Analysis with streaming

nklongvessa commented 7 years ago

Hello everyone,

I've been using trackpy for about 5-6 months, and it has helped my research a lot. In the last few weeks I ran into the problem of big data; thankfully, you already provide streaming options for tracking and linking trajectories. However, for the next steps, such as filtering trajectories and further analysis (MSD, for example), I will need to write my own scripts, because most trackpy functions require the whole result file as input.

In my case, I have 4,000 images, each containing about 20,000 particles that are 11 pixels in diameter. I store my results with tp.PandasHDFStoreSingleNode.

Are you interested in extending the streaming options? My supervisor, @MathieuLeocmach, is also helping me get involved with this.

MathieuLeocmach commented 7 years ago

Hello,

One question we have is where we should put these new functions.

For example, if we have a function called filter_stubs_fromStream, should it be in filtering.py, in framewise_data.py, or in a completely new file?

Same question for msd_fromStream.
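To make the discussion concrete, here is a minimal sketch of what a streaming stub filter might look like. It is not existing trackpy code: the name filter_stubs_from_stream (a snake_case spelling of the name above) and its signature are hypothetical. It assumes the input can be iterated twice and yields one DataFrame per frame with a 'particle' column, and it keeps trajectories with at least `threshold` points, which is my understanding of what tp.filter_stubs does.

```python
from collections import Counter

import pandas as pd


def filter_stubs_from_stream(frames, threshold=100):
    """Hypothetical streaming counterpart of tp.filter_stubs.

    `frames` is traversed twice, so it must be re-iterable (a list or a
    store object, not a one-shot generator). Each item is a DataFrame
    holding one frame, with a 'particle' column.
    """
    # Pass 1: count in how many frames each particle appears.
    counts = Counter()
    for frame in frames:
        counts.update(frame['particle'].tolist())

    # Particles whose trajectories have at least `threshold` points.
    keep = {p for p, n in counts.items() if n >= threshold}

    # Pass 2: yield each frame with the short-lived particles removed.
    for frame in frames:
        yield frame[frame['particle'].isin(keep)]


# Toy data standing in for a store: particle 0 lasts three frames,
# particles 1 and 2 last one frame each.
toy_frames = [
    pd.DataFrame({'frame': [0, 0], 'particle': [0, 1], 'x': [0.0, 5.0]}),
    pd.DataFrame({'frame': [1, 1], 'particle': [0, 2], 'x': [1.0, 7.0]}),
    pd.DataFrame({'frame': [2], 'particle': [0], 'x': [2.0]}),
]
filtered = list(filter_stubs_from_stream(toy_frames, threshold=2))
```

Only the per-particle counts stay in memory between the two passes, so the cost is one extra read of the data. With real data, `frames` could presumably be the open store itself, and each yielded frame could be written to a second store with its `put` method.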

nkeim commented 7 years ago

@nklongvessa This is very exciting! I think that the other maintainers would all be very interested in these capabilities. We want to help people analyze experiments that were previously difficult, slow, or impossible. Right now trackpy works well to find and store the trajectories in large data sets (my own experiments have about 12k frames and 30k particles), but as you say, it cannot then perform the very important step of computing MSD.
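For the MSD step, one possible approach is sketched below. This is an illustration under stated assumptions, not trackpy's algorithm: the function name emsd_from_stream and its parameters are hypothetical, frames are assumed to arrive in increasing order with 'frame', 'particle', 'x' and 'y' columns, and the result is in squared pixels versus lag in frames (no mpp or fps conversion). It keeps only the most recent max_lagtime frames in memory and accumulates squared displacements per lag.

```python
from collections import deque

import numpy as np
import pandas as pd


def emsd_from_stream(frames, max_lagtime=100, pos_columns=('x', 'y')):
    """Hypothetical streaming ensemble MSD (pixels^2 vs. lag in frames)."""
    pos_columns = list(pos_columns)
    sums = np.zeros(max_lagtime + 1)               # summed squared displacements per lag
    counts = np.zeros(max_lagtime + 1, dtype=int)  # number of displacements per lag
    recent = deque(maxlen=max_lagtime)             # (frame number, positions) buffer

    for frame in frames:
        if frame.empty:
            continue
        frame_no = int(frame['frame'].iloc[0])
        pos = frame.set_index('particle')[pos_columns]
        for old_no, old_pos in recent:
            lag = frame_no - old_no
            if lag > max_lagtime:
                continue  # can happen when frame numbers have gaps
            # Index alignment pairs each particle with itself; particles
            # missing from either frame give NaN rows, which are dropped.
            disp = (pos - old_pos).dropna()
            sums[lag] += (disp.to_numpy() ** 2).sum()
            counts[lag] += len(disp)
        recent.append((frame_no, pos))

    lags = np.flatnonzero(counts)
    return pd.Series(sums[lags] / counts[lags],
                     index=pd.Index(lags, name='lagt'), name='msd')


# Toy data: particle 0 moves one pixel per frame in x, particle 1 is at rest.
toy_frames = [
    pd.DataFrame({'frame': [t, t], 'particle': [0, 1],
                  'x': [float(t), 3.0], 'y': [0.0, 3.0]})
    for t in range(3)
]
msd = emsd_from_stream(toy_frames, max_lagtime=2)
```

Memory use scales with max_lagtime times the number of particles per frame rather than with the length of the movie. The work per frame grows with max_lagtime, so for long lag times it may be worth sampling only a subset of lags.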

My own inclination is to put each streaming function in the same file as its non-streaming counterpart. My least-preferred option is to put it in framewise_data.py; I think that module should be dedicated to data storage only.

I don't know if this is part of your plans, but once an algorithm has been made suitable for streaming it may be much easier to run on many cores in parallel. There has been a lot of interest in using dask to convert the entire trackpy "stack" to be as parallel as possible, beginning with reading images from the disk (see https://github.com/soft-matter/pims/pull/254 ), and ending with trajectory analysis such as your functions.

I am really looking forward to this contribution! Let us know if you have more questions.

soft-matter / trackpy

Analysis with streaming #437