quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

Run profiler #18

Closed magsol closed 8 years ago

magsol commented 8 years ago

There are a lot of Python-based program profilers we can use to benchmark the performance of the Python port. Among them:

We should definitely make use of one or more of these.

MOJTABAFA commented 8 years ago

@magsol I checked the program with cProfile using the following command:

    python -m cProfile DictLearningPythonclean.py -i S.txt -d D.txt -o Z.txt -s Su.txt -l 100 -P 5 -n .2 -m 5 -e 0.01

The results are attached, but I don't know what is important in them or how to interpret them. testcprofiler.txt

MOJTABAFA commented 8 years ago

By the way, I also installed objgraph, graphviz, and xdot, but as far as I can tell objgraph mostly deals with memory rather than speed. I also tried to install line profiler but kept running into errors; I will try again.

magsol commented 8 years ago

Excellent. I'd love to hear more about this.

Have you tried pycallgraph?

MOJTABAFA commented 8 years ago

I'm working on it now; as soon as I get any results, I'll let you know.

MOJTABAFA commented 8 years ago

@magsol When I try to install RunSnakeRun through the Anaconda console with "pip install runsnakerun", the following error appears:

    Activating environment "C:\Anaconda3"...

    [Anaconda3] C:\Users\Mojtaba Fazli>pip install runsnakerun
    Collecting runsnakerun
      Downloading RunSnakeRun-2.0.4.tar.gz (447kB)
        100% |################################| 450kB 201kB/s
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 20, in <module>
          File "C:\Users\MOJTAB~1\AppData\Local\Temp\pip-build-alazkeie\runsnakerun\setup.py", line 12
            except ImportError, err:
                              ^
        SyntaxError: invalid syntax

    Command "python setup.py egg_info" failed with error code 1 in C:\Users\MOJTAB~1\AppData\Local\Temp\pip-build-alazkeie\runsnakerun

Could you please help me figure out what to do?

magsol commented 8 years ago

My first thought is that it may not be compatible with Python 3. Google the error message and check the runsnakerun website for dependency requirements.
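For context on the Python 3 suspicion: the line flagged in the traceback, `except ImportError, err:`, is the Python 2 spelling for binding a caught exception, and the Python 3 interpreter in that Anaconda environment rejects it. The sketch below shows the equivalent `as` form. The module name is hypothetical and exists only to trigger the error; this illustrates the syntax difference and is not a patch to RunSnakeRun.

```python
# Python 2 spelling, as found in the failing setup.py:
#     except ImportError, err:
# Equivalent spelling accepted by Python 3 (and by Python 2.6+):
try:
    # Hypothetical module name, chosen only so that the import fails.
    import a_module_that_does_not_exist
except ImportError as err:
    message = str(err)
    print("import failed:", message)
```

Since the package's own setup script uses the older form, installing it into a Python 2 environment is the more practical route than editing the source.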

MOJTABAFA commented 8 years ago

Thanks. From what I read, Anaconda only works with 64-bit Windows (which is what I have), while RunSnakeRun is compatible with 32-bit Windows. So I created a separate snakerun environment, installed RunSnakeRun there, and it works now. Thanks!

MOJTABAFA commented 8 years ago

@magsol By the way, I ran the following command to create test.profile and then visualized it:

    python -m cProfile -o test.profile dictlearningpythonclean.py -i ....

The output is as follows: dictlearningpythonclean2

But I still don't know where to look in this output. Which items are important, and where should we apply improvements in the program?

MOJTABAFA commented 8 years ago

@magsol I finally managed to install pycallgraph, and its output for our program is as follows: pycallgraph

But my previous question still stands. Please help me.

magsol commented 8 years ago

These are really neat visualizations; this will help immensely.

The point of most interest that we're looking at is runtime: are there any bottlenecks in the code where we spend a disproportionate amount of time spinning on the CPU? Judging from your first figure, and from the text file you posted earlier in this thread, nothing jumps out. However, that's likely because the entire program takes < 1s to run; that makes it hard to see if there are any bottlenecks.

What we'll need is a much larger dataset. @LindberghLi , can you help us out with that?
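One way to hunt for the bottlenecks described above without a GUI is the standard-library `pstats` module, which can sort profile entries by cumulative time. The following is a minimal sketch: `slow_sum` and `main` are hypothetical stand-ins for the real program, and the profile is collected in-process so the example is self-contained.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Hypothetical stand-in for a hot spot in the real program.
    return sum(i * i for i in range(n))


def main():
    return [slow_sum(20000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# For a file written by `python -m cProfile -o test.profile ...`,
# pstats.Stats("test.profile") can be used instead of the profiler object.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# Sort by cumulative time so functions that dominate the runtime appear first.
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

In the printed table, `cumtime` is the time spent in a function including everything it calls, and `tottime` excludes callees. A function with a large `tottime` relative to the total is usually the first candidate for optimization.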

MOJTABAFA commented 8 years ago

@magsol After eliminating the op_vc... function, the cProfile output and pycallgraph changed to: after optimization test.txt pycallgraph

magsol commented 8 years ago

It looks like the pycallgraph layouts are a little more simplified. But did you see the first line of the two cProfile results??

Before elimination:

         92643 function calls (90508 primitive calls) in 0.726 seconds

After elimination:

        51709 function calls (49599 primitive calls) in 0.246 seconds

Wow!!! A 66% improvement in runtime on a program that runs in < 1s is spectacular.
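The comparison can be reproduced from the two summary lines quoted above. This is a small sketch; the `parse_summary` helper is an illustrative name, not part of cProfile.

```python
import re

before = "92643 function calls (90508 primitive calls) in 0.726 seconds"
after = "51709 function calls (49599 primitive calls) in 0.246 seconds"


def parse_summary(line):
    # Illustrative helper: pull the total call count and the runtime
    # out of the first line of a cProfile report.
    match = re.search(r"(\d+) function calls.* in ([\d.]+) seconds", line)
    return int(match.group(1)), float(match.group(2))


calls_before, secs_before = parse_summary(before)
calls_after, secs_after = parse_summary(after)

runtime_reduction = (secs_before - secs_after) / secs_before
call_reduction = (calls_before - calls_after) / calls_before
speedup = secs_before / secs_after

print(f"runtime reduced by {runtime_reduction:.0%}")
print(f"function calls reduced by {call_reduction:.0%}")
print(f"speedup: {speedup:.2f}x")
```

The runtime reduction works out to (0.726 - 0.246) / 0.726, which rounds to the 66% quoted above. Both figures come from single runs of well under a second, so they are best treated as indicative until the run is repeated on larger input.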

MOJTABAFA commented 8 years ago

Thanks, that was awesome, but I think it was also the result of eliminating redundant imported libraries and unused variables. Thanks for such valuable lessons!

XiangLi-Shaun commented 8 years ago

The "much larger dataset" can be found at

https://github.com/quinngroup/pyspark-dictlearning/blob/master/testDataLocation.txt

MOJTABAFA commented 8 years ago

@magsol After running the program on the large dataset, the cProfile and pycallgraph outputs changed as follows. However, I don't know why RunSnakeRun is unable to produce the graph. cProfile test 10.txt pycallgraph

The SnakeViz output is shown below; more details are available at the link SnakeViz created for our program's output: http://127.0.0.1:8080/snakeviz/C%3A%5CAnaconda3%5CLib%5Csite-packages%5Cdlpc.profile image

MOJTABAFA commented 8 years ago

@magsol Sorry, I made a mistake! All of the data and figures above belong to the previous version of our code. The results after the modifications are as follows: cProfile test 12opt.txt pycallgraph12 http://127.0.0.1:8080/snakeviz/C%3A%5CAnaconda3%5CLib%5Csite-packages%5Cnt.profile image