yanboliang / spark-vlbfgs

Vector-free L-BFGS implementation for Spark MLlib
Apache License 2.0

Fix several bottlenecks in data pre-processing and some other improvements #30

Closed WeichenXu123 closed 7 years ago

WeichenXu123 commented 7 years ago
  1. Improve the efficiency of feature block generation; add the parameter generatingFeatureMatrixBuffer.
  2. Improve feature summary computation: reduce memory cost and avoid OOM.
  3. Improve computation of standardized feature blocks: reduce memory cost and the size of shuffled data.
  4. Add the parameter rowPartitionSplitNumOnGeneratingFeatureMatrix, which increases the number of reducers used by the feature block groupBy (this helps avoid possible OOM in some cases).
  5. Other minor updates, including updates to the examples.
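
For readers unfamiliar with why feature summarization (item 2) can be done at low memory cost, here is a minimal sketch of the general idea in plain Python. It is not the code from this PR. It shows a common technique (Welford's online update plus a pairwise merge of partial summaries), and the names FeatureSummary and summarize are hypothetical, chosen only for illustration. Each partition keeps just a count, a mean vector, and a sum of squared deviations, so memory stays proportional to the number of features rather than the number of rows.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FeatureSummary:
    """Hypothetical helper: running count, per-feature mean, and per-feature
    sum of squared deviations (m2). Not part of spark-vlbfgs."""
    n: int
    mean: List[float]
    m2: List[float]

    @staticmethod
    def empty(num_features: int) -> "FeatureSummary":
        return FeatureSummary(0, [0.0] * num_features, [0.0] * num_features)

    def add(self, row: List[float]) -> "FeatureSummary":
        # Welford's online update: consume one row, keep O(num_features) state.
        self.n += 1
        for j, x in enumerate(row):
            delta = x - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (x - self.mean[j])
        return self

    def merge(self, other: "FeatureSummary") -> "FeatureSummary":
        # Pairwise merge of two partial summaries (e.g. from two partitions).
        if other.n == 0:
            return self
        if self.n == 0:
            self.n, self.mean, self.m2 = other.n, list(other.mean), list(other.m2)
            return self
        total = self.n + other.n
        for j in range(len(self.mean)):
            delta = other.mean[j] - self.mean[j]
            self.m2[j] += other.m2[j] + delta * delta * self.n * other.n / total
            self.mean[j] += delta * other.n / total
        self.n = total
        return self

    def variance(self) -> List[float]:
        # Sample variance; defined as 0.0 when fewer than two rows were seen.
        if self.n < 2:
            return [0.0] * len(self.m2)
        return [m / (self.n - 1) for m in self.m2]


def summarize(partitions: Iterable[Iterable[List[float]]],
              num_features: int) -> FeatureSummary:
    # Summarize each partition on its own, then merge the small summaries.
    total = FeatureSummary.empty(num_features)
    for part in partitions:
        local = FeatureSummary.empty(num_features)
        for row in part:
            local.add(row)
        total.merge(local)
    return total


# Two "partitions" holding three rows of two features each.
partitions = [[[1.0, 10.0], [2.0, 20.0]], [[3.0, 30.0]]]
summary = summarize(partitions, num_features=2)
```

In a Spark job the same add/merge pair would typically serve as the two functions of an aggregate-style operation, so only the small summary objects travel between executors. Whether this PR takes that exact route is not stated above.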