geoHeil opened this issue 7 years ago
Any conclusion here?
Looks like the benchmark results posted in the README.md file are quite misleading: they claim that the current JVM version is a few orders of magnitude faster than xgboost4j, and if you run the benchmark you will indeed get similar numbers. However, if you dig deeper you will find that xgboost4j spends most of its time creating the DMatrix object, which is not in sparse format (by default) and is huge: 100x100000. I believe that using a sparse matrix format would improve its performance. I re-ran the benchmark with a DMatrix of size 80x100, which is more representative of my use case, and xgboost4j was actually faster (by 30-40%).
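For reference, here is a minimal sketch of what the sparse construction could look like in xgboost4j. It assumes the CSR constructor `DMatrix(long[], int[], float[], DMatrix.SparseType)` that was available around that time, and the toy 3x4 matrix is made up purely for illustration:

```java
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class SparseDMatrixSketch {
    public static void main(String[] args) throws XGBoostError {
        // Hypothetical tiny dataset: 3 rows, 4 features, mostly zeros.
        // CSR layout: rowHeaders[i]..rowHeaders[i+1] delimit row i's non-zero entries.
        long[] rowHeaders = {0, 2, 3, 5};                     // row offsets into colIndex/data
        int[] colIndex    = {0, 3, 1, 0, 2};                  // column index of each non-zero
        float[] data      = {1.0f, 5.5f, 2.0f, 3.0f, 4.25f};  // non-zero values only

        // Sparse CSR DMatrix: only the non-zero entries are copied to native memory.
        DMatrix sparse = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR);

        // Dense equivalent for comparison: every cell, including zeros, is materialized.
        float[] denseData = {
                1.0f, 0f,   0f,    5.5f,
                0f,   2.0f, 0f,    0f,
                3.0f, 0f,   4.25f, 0f
        };
        DMatrix dense = new DMatrix(denseData, 3, 4);

        System.out.println("sparse rows: " + sparse.rowNum());
        System.out.println("dense rows:  " + dense.rowNum());

        sparse.dispose();
        dense.dispose();
    }
}
```

With the README's 100x100000 shape, the dense path allocates and copies all ten million cells per DMatrix, so how much the sparse path helps depends on how many of those cells are actually non-zero.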
I have benchmarked some of the available libraries, among them XGBoost4j and XGBoost-Predictor; you can take a look here if you are interested.
Did you know about https://github.com/dmlc/xgboost/issues/1849#issuecomment-266716752
Apparently xgboost4j is quicker than this library for batch predictions in the current version. Do you have a test that compares predicting a single new value rather than 200k values? As described in the linked xgboost issue, xgboost4j's API only supports batch mode. What about your library?
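To make the single-value case concrete, here is a rough sketch of what such a comparison could look like, assuming a model file at the hypothetical path "model.bin", made-up feature values, and the APIs as documented in each project's README (xgboost-predictor's `FVec`/`Predictor` and xgboost4j's `Booster`/`DMatrix`); it is an illustration, not a definitive benchmark:

```java
import java.io.FileInputStream;

import biz.k11i.xgboost.Predictor;
import biz.k11i.xgboost.util.FVec;

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoost;

public class SingleRowPredictionSketch {
    public static void main(String[] args) throws Exception {
        double[] features = {1.0, 0.0, 3.5, 0.0, 2.2}; // one hypothetical observation

        // xgboost-predictor: a single feature vector is fed directly, no DMatrix needed.
        Predictor predictor = new Predictor(new FileInputStream("model.bin"));
        FVec row = FVec.Transformer.fromArray(features, /* treatsZeroAsNA */ false);
        double[] p1 = predictor.predict(row);

        // xgboost4j: even one observation has to be wrapped in a 1-row DMatrix,
        // which means a JNI call and a native allocation per prediction.
        float[] floatFeatures = new float[features.length];
        for (int i = 0; i < features.length; i++) {
            floatFeatures[i] = (float) features[i];
        }
        Booster booster = XGBoost.loadModel("model.bin");
        DMatrix single = new DMatrix(floatFeatures, 1, features.length);
        float[][] p2 = booster.predict(single);
        single.dispose();

        System.out.println("xgboost-predictor: " + p1[0]);
        System.out.println("xgboost4j:         " + p2[0][0]);
    }
}
```

Timing the two loops over many individual rows (rather than one 200k-row batch) would show whether the per-call DMatrix overhead dominates in the single-prediction scenario.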