Open MMarkovetz opened 5 years ago
I would focus on the fact that it fails with import pims only. PIMS is pure Python, but its dependencies include many C libraries. Perhaps one of those packages was updated automatically to maintain compatibility. You can look at a (mostly complete) list of possible culprits in the PIMS installation instructions. Let us know what you find!
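One way to narrow down the culprit is to import each candidate in a fresh interpreter with a time limit. This is only a sketch: the helper name try_import, the 60-second limit, and the candidate list (simply the packages named in this thread) are assumptions for illustration, not something taken from the PIMS instructions. It checks a plain interpreter only, so a package that passes here may still misbehave inside a worker process.

```python
import subprocess
import sys


def try_import(module, timeout=60.0):
    """Import `module` in a fresh interpreter.

    Returns 'ok', 'error' (import failed), or 'timeout' (import hung).
    `try_import` is a hypothetical helper, not part of pims or trackpy.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    # Candidate list is an assumption: the packages mentioned in this thread.
    for name in ["matplotlib", "pandas", "pims", "trackpy"]:
        print(name, try_import(name))
```

If one package reports a timeout or an error while the others report ok, that package (or one of its own dependencies) is a reasonable place to look next.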
Following up on this: there is a new parallel batch() feature in trackpy v0.4.2. Pull request https://github.com/soft-matter/trackpy/pull/606 gets it to work better with pims. Would that by any chance meet your needs?
Also, the fact that pims typically imports matplotlib might be wreaking havoc with your worker processes, depending on how matplotlib is configured.
Just to add to @nkeim's comment: I have found that, depending on the backend you use with Matplotlib, you can run into issues when you use multiprocessing at the same time. The issue occurs, for example, when you use the graphical backend TkAgg (e.g. matplotlib.use('TkAgg')) but is resolved when you switch to a non-graphical backend for the multiprocessing part (for example matplotlib.use('Agg')).
Could you maybe try your minimal example again, while using the Agg backend?
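For reference, selecting the Agg backend in every worker could look roughly like the sketch below. The names init_worker and work are hypothetical placeholders, and the squaring task only stands in for real tracking code. The example assumes nothing about how pims itself configures matplotlib.

```python
import multiprocessing as mp
import os


def init_worker():
    """Runs once in each worker process before it receives any task."""
    # matplotlib reads MPLBACKEND when it is first imported.
    os.environ["MPLBACKEND"] = "Agg"
    try:
        import matplotlib
        matplotlib.use("Agg")  # non-graphical backend, needs no GUI toolkit
    except ImportError:
        pass  # matplotlib is not installed, so there is nothing to configure


def work(i):
    # Hypothetical placeholder task; real code would do the tracking here.
    return i * i


if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(work, range(4)))
```

Setting the backend in a pool initializer keeps the parent process free to use a graphical backend such as TkAgg for its own plots.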
Hey guys,
Thank you for looking into this and for the suggestions. I will definitely look into both options. Unfortunately, I do not have any of the videos from my lab here at home with me at the moment, but I'll get a few over the weekend and let you know what I find.
For the moment, I have transitioned our lab to a Jupyter Notebook instead of a GUI. The good news is that the newer implementations of batch() with built-in multiprocessing have sped up our tracking by about 30%. So thank you for continuing your work on this project! It is a great help and really exciting to see it get better over time.
Best, Matt
-- Matthew Markovetz, Ph.D. Postdoctoral Researcher Cystic Fibrosis and Pulmonary Disease Research Center The University of North Carolina at Chapel Hill 7119 Marsico Hall 125 Mason Farms Drive Chapel Hill, NC, 27599
Hey guys,
I had some time to look into this. Using that same code with all packages updated, I get a timeout error when importing pandas or pims. It does not seem like matplotlib is the primary culprit in this case. I imagine the pandas issue is closely related to the pims issue, but I'm not sure.
Best, Matt
Thanks for the follow-up! It sounds like you have ruled out matplotlib, which pandas and pims have in common.
Have you tried using the plain old multiprocessing module instead of ipyparallel? This is not as sophisticated but is more widely used. It seems to work well for the parallel processing functionality that is in the development version of trackpy.batch().
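As a rough illustration of the plain multiprocessing approach for a list of videos, something like the following might serve as a starting point. Everything here is an assumption for the sake of the sketch: process_video is a hypothetical stand-in for the real per-video tracking, the file names are made up, and maxtasksperchild=1 is just one possible way to keep memory from accumulating across videos, not a confirmed fix for this issue.

```python
import multiprocessing as mp


def process_video(path):
    """Hypothetical stand-in for per-video tracking; returns (path, result)."""
    # Real code might open the video with pims and run trackpy on it here.
    return path, len(path)


def process_all(paths, processes=2):
    # maxtasksperchild=1 replaces each worker after a single video, so any
    # memory that video held is released when its worker exits.
    with mp.Pool(processes=processes, maxtasksperchild=1) as pool:
        return dict(pool.imap_unordered(process_video, paths))


if __name__ == "__main__":
    videos = ["a.avi", "b.avi", "c.avi"]  # hypothetical file names
    print(process_all(videos))
```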
In addition, we found that parallel processing on Windows also caused some issues in a different package (see https://github.com/soft-matter/trackpy/issues/610).
According to this question (https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing) the problem could be this:
On Windows the subprocesses will import (i.e. execute) the main module at start. You need to insert an if __name__ == '__main__': guard in the main module to avoid creating subprocesses recursively.
@MMarkovetz in your example you have this main guard, so I'm not sure if it is at all related, but I just wanted to mention that this might cause issues.
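For readers who have not seen the guard before, a minimal sketch of the pattern follows. The function names square and main are placeholders chosen for this example.

```python
import multiprocessing as mp

# Code at module level runs in the parent process and, under the "spawn"
# start method (the only one available on Windows), again in every worker
# when it imports this module at startup.


def square(x):
    return x * x


def main():
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Workers import this module under a different name, so they skip this
    # block and do not try to create pools of their own.
    print(main())
```

Keeping pool creation inside the guard, and only definitions at module level, is the design choice that makes the same script safe on both Windows and Unix-like systems.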
Until recently, I was able to use variations of the parallel example for particle localization successfully. However, I recently updated both Trackpy and pims, hoping to improve performance and ease of use for others in my lab, and it seems to have broken parallelization of any form, not just ipyparallel.
With my tracking code I can now only process one video, and then the memory overflows. I used to be able to loop through a list of videos to process automatically, but this issue obviously prevents that. After a lot of digging, I have what I think is a pretty minimal case that shows that both Trackpy and pims, but not other packages, aren't playing well with any sort of parallelization on my machines (Windows 7 or Server 2012, 8 or 32 CPUs, Python 3.7).
This code will return a timeout error if I import either trackpy or pims or both, but all of the other packages are fine. Any ideas on why this might be?
Matt