yanboliang / spark-vlbfgs

Vector-free L-BFGS implementation for Spark MLlib
Apache License 2.0

Optimize mapJoinPartitions by shuffle dependency #27

Closed WeichenXu123 closed 7 years ago

WeichenXu123 commented 7 years ago

Optimize mapJoinPartitions by changing RDD2 into a shuffle dependency, so that RDD2 partitions are not re-serialized and recomputed on each iteration.

Keep the old code and add a new class, MapJoinPartitionsRDDV2, together with a Java option, vflbfgs.mapJoinPartitions.shuffleRdd2. When this option is set to true, the new implementation is used; otherwise the old one is used.
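A minimal sketch of how such a switch could be read, assuming the option is an ordinary JVM system property. Only the property name and the class name MapJoinPartitionsRDDV2 come from the description above; the object `MapJoinOptionSketch`, its helper methods, and the default of false when the option is unset are hypothetical and are not taken from the PR's code.

```scala
import java.util.Properties

// Hypothetical illustration only; not the code from this PR.
object MapJoinOptionSketch {
  // Option name as given in the PR description.
  val ShuffleRdd2Key: String = "vflbfgs.mapJoinPartitions.shuffleRdd2"

  // Returns true only when the property is set to "true" (case-insensitive).
  // Defaulting to false when the property is unset is an assumption.
  def useShuffleRdd2(props: Properties = System.getProperties): Boolean =
    java.lang.Boolean.parseBoolean(props.getProperty(ShuffleRdd2Key, "false"))

  // Returns a label for the implementation that would be selected.
  def chooseImplementation(props: Properties = System.getProperties): String =
    if (useShuffleRdd2(props)) "MapJoinPartitionsRDDV2" else "old implementation"
}
```

If the option is read this way, it would be passed to the JVM as `-Dvflbfgs.mapJoinPartitions.shuffleRdd2=true`, for example through Spark's `spark.driver.extraJavaOptions` setting. Which JVMs (driver, executors, or both) need the flag is not stated in the description.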