pupil-labs / pupil

Open source eye tracking
https://pupil-labs.com
GNU Lesser General Public License v3.0

Parallelize #52

Closed cpicanco closed 9 years ago

cpicanco commented 9 years ago

Hey guys,

What about taking full advantage of multi-core CPUs? Please let me know if Pupil already does that.

willpatera commented 9 years ago

Pupil Capture uses separate processes for eye and world. The first few lines of capture main.py (https://github.com/pupil-labs/pupil/blob/master/pupil_src/capture/main.py#L14-L21) are where the multiprocessing module, or billiard if the system is MacOS, is imported. This allows eye and world to run independently (in parallel) at their own rates. Communication between processes is explained in more detail on the wiki (https://github.com/pupil-labs/pupil/wiki/Development-Overview#inter-process-communication).

Pupil Player makes use of processes for exporting visualizations. See the export launcher (https://github.com/pupil-labs/pupil/blob/master/pupil_src/player/export_launcher.py) and batch exporter (https://github.com/pupil-labs/pupil/blob/master/pupil_src/player/batch_exporter.py) plugins. This means one can continue interacting with Player and create new visualizations in parallel with the writing of visualizations.

Take a look at your system monitor (or Activity Monitor on MacOS) to see how your CPU utilization changes when using Pupil Capture and Pupil Player.
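
As a rough illustration of the two-process layout described above, here is a minimal sketch using only the standard multiprocessing module. It is not Pupil's actual code: the names worker and run, the message format, and the use of a single shared queue are all assumptions made for the example.

```python
import multiprocessing as mp
import os


def worker(name, queue, n_frames):
    """Hypothetical stand-in for an eye or world process."""
    for frame in range(n_frames):
        # Each message carries the sender's name, its process id and a frame index.
        queue.put((name, os.getpid(), frame))
    queue.put((name, os.getpid(), None))  # sentinel: this worker is done


def run(n_frames=3):
    """Start two independent processes and collect their messages."""
    # Prefer "fork" where it exists so the sketch also works from an
    # interactive session; otherwise fall back to the platform default.
    if "fork" in mp.get_all_start_methods():
        ctx = mp.get_context("fork")
    else:
        ctx = mp.get_context()
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(name, queue, n_frames))
        for name in ("eye", "world")
    ]
    for proc in procs:
        proc.start()

    messages, finished = [], 0
    while finished < len(procs):
        msg = queue.get()
        if msg[2] is None:
            finished += 1
        else:
            messages.append(msg)
    for proc in procs:
        proc.join()
    return messages


if __name__ == "__main__":
    for name, pid, frame in run():
        print(name, pid, frame)
```

The two workers report different process ids, which is what lets the OS schedule them on different cores.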

cpicanco commented 9 years ago

Hey Will, I have been using Pupil with an Intel i3 that has 4 virtual cores (two physical cores, I think). I could provide you with more detailed information if you wish. I am aware of the multithreaded implementation just described. I am not sure what I should expect at the multi-core level, though.

mkassner commented 9 years ago

Hi Rafael,

If you have 2 physical cores (4 HT cores), the current implementation will be able to utilise all of your CPU.

Both processes, eye and world, use about equal amounts of CPU when recording.

Python itself has a hard time multithreading due to the *Global Interpreter Lock*, but the libraries we use (OpenCV, ffmpeg, QTKit, turbojpeg) do multithreading. From my understanding, when calling into these libs the GIL is released and CPU-heavy computations are run on multiple cores.
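
To make the point about the GIL concrete, here is a small sketch that uses the standard library's zlib as a stand-in for the libraries named above. CPython's zlib releases the GIL while compressing, so threads can run on several cores at once; whether OpenCV or ffmpeg behave the same way in every call is taken from the comment above, not verified here. The function name compress_all is made up for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_all(chunks, workers=4):
    """Compress each chunk in a thread pool.

    zlib.compress is implemented in C and releases the GIL while it works,
    so the threads are not serialized by the interpreter lock.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input chunks.
        return list(pool.map(zlib.compress, chunks))


if __name__ == "__main__":
    data = [bytes([i]) * 5_000_000 for i in range(8)]
    compressed = compress_all(data)
    print([len(c) for c in compressed])
```

Watching the system monitor while this runs with large inputs should show more than one core busy, which pure-Python loops in threads would not achieve.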

There are still limits to multi-core CPU use with Pupil because of the GIL, but in the situations where Pupil did not run fast enough for me it was not because of limits in threading but rather overall CPU power (on recent CPUs with turbo boost, the TDP envelope is the limiting factor and not a lack of multithreading).

About performance in general:

I do extensive profiling on Pupil Capture, and my findings are that in world.py we spend 95% of the time in library execution (already optimised) and only 5% in Python code. (Most time is spent decompressing camera images, compressing them using ffmpeg and/or pushing them to the GPU; all of this happens outside of Python.)

eye.py's pupil detector is not as nice in this regard and could be optimised by cythonizing parts.

If you worry about performance for parts of Pupil plugins that are written in Python only, you should rather consider adding some Cython; it will give you a ~10-100x speedup.

I hope this was informative!

M


cpicanco commented 9 years ago

Thank you Moritz and Will! Indeed, very informative.

So, for Pupil Capture plugins, I think Cython is more or less "mandatory" if you want real-time detection/interaction. Since I am working on Player right now, this is not so urgent. But I am curious about what I could do with Capture in the near future, so I had better keep your tips in mind! For now, I think it is a good idea to learn how to do this "profiling". I could share the results with you. :)

willpatera commented 9 years ago

Hi @cpicanco,

Pupil Capture already has a built-in profiler that generates performance graphs. In order to run the profiler, you need to install graphviz. On MacOS, this can be done with brew install graphviz; on Ubuntu, with apt-get install graphviz.

Once installed, you can just run Pupil Capture and supply any additional argument after main.py, for example python main.py foo. After you exit Pupil Capture, a performance graph will be generated for eye and world as separate .png files in the pupil_src folder. Attached are CPU timing graphs for eye and world (using the v0.4.0 branch). The graphs are big and will probably not show up well in the preview, so download them or open the images in a new tab to read the details.

[Attached images: world_cpu_time (_world_cputime.png), eye_cpu_time (_eye_cputime.png)]

There are certainly areas that could be improved/optimized. While performance is critical, it is also important to make optimizations in such a way that the code remains readable and maintainable (accessible) so that others can still dive into the source and make changes :)

_w

cpicanco commented 9 years ago

Thank you Will. I will check this out. Well, readable code is important. However, I need to confess that I am not a professional programmer, just an enthusiastic one. So any feedback about the work I have done so far would be much appreciated. I am planning to ask some other questions on CodeReview too. :)

Long live the Pupil Team!

Best.

Rafael


Edit: Just for reference, this is my processor: http://ark.intel.com/products/49020/Intel-Core-i3-370M-Processor-3M-cache-2_40-GHz

cpicanco commented 9 years ago

Just a quick observation about how informative this profiling method is. This profile does not document overall performance. It gives details about the processing time the application itself uses. Let me explain, and please correct me if I am wrong about something. For example, running Pupil Player on my machine takes 25% of overall (total) processor power. Maybe Pupil Player is running on a single logical core determined by the OS. Making such a profile would give me a picture of this 25%. But what about the other 75%? How can I know for sure whether the 25% cap was imposed by Pupil or by the OS?

Further, regarding this local profile, I think one would be better served if some constraints were set based on the application states (idle, running, capturing, and so on), to avoid distorting the averages.

mkassner commented 9 years ago

Hi,

All the profile does is give the distribution of CPU time spent across all called functions. Regarding the application state, you are correct. I recommend letting the process run in the desired state 20-100 times longer than in any other state, so that it outweighs the undesired measurements.
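
For readers who have not used a profiler before, the idea can be sketched with the standard library's cProfile and pstats. This is only an illustration of "CPU time per called function" and of letting the state of interest dominate the run; it is not the profiler built into Pupil Capture, and busy, idle_step, active_step, session and profile_report are hypothetical names.

```python
import cProfile
import io
import pstats


def busy(n):
    """Burn some CPU time in pure Python."""
    return sum(i * i for i in range(n))


def idle_step():
    return busy(1_000)


def active_step():
    return busy(50_000)


def session(active_iters=50, idle_iters=1):
    """Run the state of interest far longer than the other state."""
    for _ in range(idle_iters):
        idle_step()
    for _ in range(active_iters):
        active_step()


def profile_report(func, *args, top=10):
    """Profile func(*args) and return a text table sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(session))
```

In the printed table, active_step should account for nearly all of the cumulative time, because the session spends far more iterations in it than in idle_step.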

To look at the distribution of CPU time across the Pupil Player process and the other running processes, you should use the OS activity monitor.

If you want to know the CPU load of the Pupil process, have a look at the new v0.4 branch and its performance graphs. We now monitor the CPU load of world, eye and player and display it in the window.

We will release the bundle soon and need to update the dependencies guide accordingly, but if you feel adventurous, please have a look yourself right away.
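
How v0.4 measures this load is not shown in this thread. As a standard-library sketch of the general idea, a process can compare the CPU time it consumed with the wall-clock time that passed. The helper names cpu_load and share_of_machine are assumptions for the example.

```python
import os
import time


def cpu_load(work, *args):
    """Run work(*args); return (result, percent of ONE core this process used)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process, all threads
    result = work(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    percent = 100.0 * cpu_used / wall_used if wall_used > 0 else 0.0
    return result, percent


def share_of_machine(percent_of_one_core, cores=None):
    """Convert a per-core percentage into a share of the whole machine."""
    if cores is None:
        cores = os.cpu_count() or 1
    return percent_of_one_core / cores


if __name__ == "__main__":
    _, load = cpu_load(sum, range(5_000_000))
    print("one-core load: %.0f%%" % load)
    print("share of machine: %.0f%%" % share_of_machine(load))
```

On a CPU with 4 logical cores, a process that saturates one core shows up as 100 / 4 = 25% of the whole machine, which matches the 25% figure reported earlier in the thread for Pupil Player.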

M

cpicanco commented 9 years ago

Let's install v0.4.

cpicanco commented 9 years ago

Well, it is not an easy task to "take full advantage of multi-core processors", and a multi-platform scenario increases the complexity even more. For reference: http://stackoverflow.com/questions/663958/how-to-control-which-core-a-process-runs-on http://www.ibm.com/developerworks/linux/library/l-affinity/index.html

More: http://xmodulo.com/run-program-process-specific-cpu-cores-linux.html
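
The links above are about CPU affinity. In Python this can be sketched with os.sched_getaffinity and os.sched_setaffinity, which exist only on some platforms (Linux in particular), so the sketch checks for them first. The function names allowed_cores and pin_to are made up for the example, and nothing here is part of Pupil.

```python
import os


def allowed_cores():
    """Return the set of cores this process may run on, or None if unsupported."""
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)  # pid 0 means "the calling process"
    return None


def pin_to(cores):
    """Restrict this process to the given cores.

    Returns the resulting affinity set, or None where the OS offers no
    such call through the os module (for example MacOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cores)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    before = allowed_cores()
    print("allowed cores:", before)
    if before is not None:
        print("pinned to:", pin_to({min(before)}))
        print("restored to:", pin_to(before))
```

This also illustrates the multi-platform point: the same request needs a different mechanism, or is not available at all, depending on the operating system.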

cpicanco commented 9 years ago

The issue about the CPU graph (#69) gives evidence of multi-core usage.

For example, when marker detection is on, overall CPU consumption (system processes + Pupil process) rises from 30-35% to 85-89%.