snuspl / cruise

Cruise: A Distributed Machine Learning Framework with Automatic System Configuration
Apache License 2.0

Performance comparison between Dolphin and other frameworks #265

Open yunseong opened 8 years ago

yunseong commented 8 years ago

We've run experiments using the same LR algorithm with the URL reputation dataset on multiple frameworks: Dolphin and Vortex. As @beomyeol mentioned at the meeting, we've seen some performance issues in areas such as vector computation and data loading. We can also take a look at Spark, because it can run the LR algorithm and its performance turned out to be much faster than Vortex's (not sure how it compares to Dolphin yet).

This issue aims to investigate the performance of both frameworks, since we can run the same algorithm on the same dataset. It would be great if we could find some points where performance can be improved.

As a first step, I'll run the experiment on the Microsoft YARN cluster, which consists of 20 machines (8-core CPU, 8GB RAM, YARN 2.7.1).
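For the Spark side of the comparison, something along the lines of the sketch below should be comparable (a rough sketch using MLlib's LogisticRegressionWithSGD; the input path, label preprocessing, and parameter values are placeholders, not the exact script we ran):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

object SparkLRBaseline {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LR-URL-reputation"))

    // URL reputation dataset in libsvm format; the path is a placeholder,
    // and labels are assumed to already be mapped to {0, 1} as MLlib expects.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///url_reputation/total").cache()

    // Plain SGD-based logistic regression, keeping the iteration count and
    // step size as close as possible to the Dolphin/Vortex runs.
    val numIterations = 20
    val stepSize = 1.0
    val model = LogisticRegressionWithSGD.train(data, numIterations, stepSize)

    // Training-set accuracy, measured the same way as in the other runs.
    val correct = data.filter(p => model.predict(p.features) == p.label).count()
    println(s"accuracy: ${100.0 * correct / data.count()}%")

    sc.stop()
  }
}
```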

yunseong commented 8 years ago

@jsjason, @johnyangk, @beomyeol, @gyeongin and I had a discussion, and it'd be good to check the following:

0-th iteration accuracy: 0%
1-th iteration accuracy: 65.96%
2-th iteration accuracy: 65.96%
3-th iteration accuracy: 65.96%

The results were almost the same in Vortex (similar accuracy values, fixed after the "1-th iteration", which is actually the 2nd), since I referred to most of the algorithm from Dolphin's. Even when using the full dataset, the result was still similar. Spark, on the other hand, behaves differently: 1) the accuracy grows as iterations proceed, and 2) the final accuracy is higher with the same number of iterations. It'd be worth taking a look at the algorithm for correctness.
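To help with the correctness check, a minimal single-machine reference loop like the sketch below (written with Breeze for illustration; it is not Dolphin's or Vortex's actual code) could be used to compare per-iteration accuracy. With a correct update, the accuracy should keep changing across iterations rather than freezing after the first one.

```scala
import breeze.linalg.DenseVector

object LRReference {
  // One labeled example: label in {0.0, 1.0}, dense features for simplicity.
  case class Example(label: Double, features: DenseVector[Double])

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def accuracy(model: DenseVector[Double], data: Seq[Example]): Double = {
    val correct = data.count { ex =>
      val predicted = if (sigmoid(model dot ex.features) >= 0.5) 1.0 else 0.0
      predicted == ex.label
    }
    100.0 * correct / data.size
  }

  def train(data: Seq[Example], dim: Int, maxIter: Int, stepSize: Double): DenseVector[Double] = {
    val model = DenseVector.zeros[Double](dim)
    for (iter <- 0 until maxIter) {
      // Full-batch gradient of the logistic loss.
      val gradient = DenseVector.zeros[Double](dim)
      data.foreach { ex =>
        val error = sigmoid(model dot ex.features) - ex.label
        gradient += ex.features * error
      }
      model -= gradient * (stepSize / data.size)
      println(s"$iter-th iteration accuracy: ${accuracy(model, data)}%")
    }
    model
  }
}
```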

If you have more things you want to check or ask, please feel free to add them. After that, I think this issue can be split into multiple items.

gyeongin commented 8 years ago

I pushed a branch named gy-lr-test which uses the Scala library Breeze instead of Mahout. I made some changes to save memory and improve performance. With the entire URL reputation dataset and the R730-02 machine (48-core CPU, 128GB memory), the job took 263 seconds. This is the command I used for the test:

./run_logistic.sh -dim 3231961 -maxIter 20 -stepSize 1.0 -lambda 0.01 -local false -split 8 -input /total -output output_logistic -maxNumEvalLocal 5 -isDense false -evalSize 1000 -timeout 1200000

The final model accuracy was 93.421%.
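For context on the -isDense false flag: with dim 3231961, a dense double vector per example would take roughly 25MB, so keeping the examples sparse (and only the model dense) is what lets the full dataset fit in memory. Below is roughly the kind of update involved, as a Breeze sketch for illustration rather than the actual code on gy-lr-test:

```scala
import breeze.linalg.{DenseVector, SparseVector}

object SparseLRStep {
  // Each URL-reputation example keeps only its non-zero features
  // (a few hundred out of 3,231,961 dimensions), so per-example memory
  // is proportional to the number of non-zeros, not the dimension.
  case class Example(label: Double, features: SparseVector[Double])

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // One gradient step against a dense model; the inner product and the
  // update only touch the active (non-zero) indices of the example.
  def update(model: DenseVector[Double], ex: Example, stepSize: Double): Unit = {
    val margin = ex.features dot model
    val error = sigmoid(margin) - ex.label
    ex.features.activeIterator.foreach { case (idx, value) =>
      model(idx) -= stepSize * error * value
    }
  }
}
```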

yunseong commented 8 years ago

@gyeongin Thanks for sharing the result. This looks awesome! Could you kindly give us a very short summary of the changes you made for better performance and memory usage?

johnyangk commented 8 years ago

@gyeongin Great!

gyeongin commented 8 years ago

Changes I made: