h2oai / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow

Adds new way of defining DMatrix using off-heap memory populated in Java #51

Closed: michalkurka closed this 6 years ago

michalkurka commented 6 years ago

We can avoid the "2D" constructors in DMatrix by populating off-heap memory directly. The 2D constructors carry a lot of overhead from the flatten operation.

For large matrices this is a problem: the matrix ends up held twice in native memory and two more times in Java memory (in DKV and in arrays).

This change allows H2O to hold the matrix just once in native memory and once in DKV.
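To illustrate the general idea (not the API this PR adds), here is a minimal sketch of filling a row-major float matrix straight into off-heap memory with a direct `ByteBuffer`, so that no intermediate `float[][]` or flattened `float[]` is created on the Java heap. The class `OffHeapMatrixSketch`, the `CellReader` interface and the `fillOffHeap` method are hypothetical names invented for this example. Passing the buffer on to native code is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class OffHeapMatrixSketch {

    /** Hypothetical stand-in for a source that yields one cell at a time (e.g. data kept in DKV). */
    interface CellReader {
        float get(int row, int col);
    }

    /**
     * Allocates rows * cols floats outside the Java heap and fills them in
     * row-major order. No 2D array and no flattened heap copy is created.
     */
    static FloatBuffer fillOffHeap(int rows, int cols, CellReader reader) {
        long bytes = (long) rows * cols * Float.BYTES;
        if (bytes > Integer.MAX_VALUE) {
            // A single ByteBuffer is limited to int-sized capacity.
            throw new IllegalArgumentException("matrix too large for one direct buffer");
        }
        FloatBuffer buf = ByteBuffer.allocateDirect((int) bytes)
                .order(ByteOrder.nativeOrder()) // native code expects native byte order
                .asFloatBuffer();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                buf.put(r * cols + c, reader.get(r, c)); // absolute put at the flat index
            }
        }
        return buf;
    }

    public static void main(String[] args) {
        // Toy 2 x 3 matrix whose cell value equals its flat index.
        FloatBuffer buf = fillOffHeap(2, 3, (r, c) -> r * 3 + c);
        System.out.println("direct=" + buf.isDirect() + ", capacity=" + buf.capacity());
    }
}
```

With this layout the only Java-side copy is whatever backs `CellReader`; the flattened matrix exists once, in native memory.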

Example use: https://github.com/h2oai/h2o-3/pull/2822