soft-matter / pims

Python Image Sequence: Load video and sequential images in many formats with a simple, consistent interface.
http://soft-matter.github.io/pims/

Multiprocessing fails when importing Trackpy and/or pims #329

Open MMarkovetz opened 5 years ago

MMarkovetz commented 5 years ago

Until recently, I had been able to use variations of the parallel example for particle localization successfully. However, I recently updated both Trackpy and pims, hoping to improve performance and ease of use for others in my lab, and the update seems to have broken parallelization of any form, not just ipyparallel.

With my tracking code:

view.map(curr_loc, frames[:8])            # prime each engine
amr = view.map_async(curr_loc, frames)    # map the whole parallel process
amr.wait_interactive()                    # report the progress of moving through the map
results = amr.get()

I can now only process one video before the memory overflows. I used to be able to loop through a list of videos and process them automatically, but this issue obviously prevents that. After a lot of digging, I have what I think is a pretty minimal case showing that Trackpy and pims, but not other packages, don't play well with any sort of parallelization on my machines (Windows 7 or Server 2012, 8 or 32 CPUs, Python 3.7).

from __future__ import division, unicode_literals, print_function  # for compatibility with Python 2 and 3

import multiprocessing as mp
from multiprocessing import Pool

#import matplotlib as mpl
#import matplotlib.pyplot as plt
#import numpy as np
#import pandas as pd
#import scipy.io as sio

#import slicerator
#import pims
#import trackpy as tp

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))           # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))

This code returns a timeout error if I import trackpy, pims, or both, but all of the other packages are fine. Any ideas on why this might be?

Matt

nkeim commented 5 years ago

I would focus on the fact that it fails with import pims alone. PIMS is pure Python, but its dependencies include lots of C libraries and the like. Perhaps one of those packages was updated automatically to maintain compatibility. You can look at a (mostly complete) list of possible culprits in the PIMS installation instructions. Let us know what you find!
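If it helps to narrow things down, here is a rough sketch for printing the versions of the usual suspects; the package names below are just examples of common PIMS dependencies, not the authoritative list from the installation instructions:

import importlib

# Print installed versions of some common PIMS dependencies (example names only)
for name in ('numpy', 'scipy', 'matplotlib', 'imageio', 'av', 'tifffile', 'PIL'):
    try:
        mod = importlib.import_module(name)
        print(name, getattr(mod, '__version__', 'unknown'))
    except ImportError:
        print(name, 'not installed')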

nkeim commented 4 years ago

Following up on this: there is a new parallel batch() feature in trackpy v0.4.2. Pull request https://github.com/soft-matter/trackpy/pull/606 gets it to work better with pims. Would that by any chance meet your needs?
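For reference, a rough sketch of what that could look like; the filename and locate parameters are placeholders, and the processes keyword is how I understand the new feature to be exposed:

import pims
import trackpy as tp

if __name__ == '__main__':
    frames = pims.open('my_video.avi')                      # placeholder path
    features = tp.batch(frames, diameter=11, processes=4)   # parallel locate() over frames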

Also, the fact that pims typically imports matplotlib might be wreaking havoc with your worker processes, depending on how matplotlib is configured.

rbnvrw commented 4 years ago

Just to add to @nkeim's comment: I have found that, depending on the Matplotlib backend you use, you can run into issues when using multiprocessing at the same time. The issue occurs, for example, with the graphical backend TkAgg (e.g. matplotlib.use('TkAgg')) but is resolved when you switch to a non-graphical backend for the multiprocessing part (for example matplotlib.use('Agg')). Could you try your minimal example again with the Agg backend?
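For example, something along these lines at the very top of the script, before pims (and therefore matplotlib) gets imported:

import matplotlib
matplotlib.use('Agg')   # non-graphical backend, safe for multiprocessing workers

import pims             # imported after the backend is set
import trackpy as tp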

MMarkovetz commented 4 years ago

Hey guys,

Thank you for looking into this and for the suggestions. I will definitely look into both options. Unfortunately, I do not have any of the videos from my lab here at home with me at the moment, but I'll get a few over the weekend and let you know what I find.

For the moment, I have transitioned our lab to a Jupyter Notebook instead of a GUI. The good news is that the newer implementation of batch() with built-in multiprocessing has sped up our tracking by about 30%. So thank you for continuing your work on this project! It is a great help and really exciting to see it get better over time.

Best, Matt


MMarkovetz commented 4 years ago

Hey guys,

I had some time to look into this. Using that same code with all packages updated, I get a timeout error when importing pandas or pims. It does not seem like matplotlib is the primary culprit in this case. I imagine the pandas issue is closely related to the pims issue, but I'm not sure how true that is.

Best, Matt


nkeim commented 4 years ago

Thanks for the follow-up! It sounds like you have ruled out matplotlib, which pandas and pims have in common.

Have you tried using the plain old multiprocessing module instead of ipyparallel? It is not as sophisticated, but it is more widely used, and it seems to work well for the parallel processing functionality in the development version of trackpy.batch().
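As a rough sketch (the filename and locate parameters are placeholders, and I'm assuming the per-frame arrays pickle cleanly to the workers):

import multiprocessing as mp

import pims
import trackpy as tp

def locate_frame(frame):
    # tp.locate returns a DataFrame of features for a single frame
    return tp.locate(frame, diameter=11)

if __name__ == '__main__':
    frames = pims.open('my_video.avi')              # placeholder path
    with mp.Pool(processes=4) as pool:
        results = pool.map(locate_frame, frames)    # one task per frame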

rbnvrw commented 4 years ago

In addition, we found that parallel processing on Windows also caused some issues in a different package (see https://github.com/soft-matter/trackpy/issues/610).

According to this question (https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing) the problem could be this:

On Windows the subprocesses will import (i.e. execute) the main module at start. You need to insert an if __name__ == '__main__': guard in the main module to avoid creating subprocesses recursively.

@MMarkovetz, in your example you have this main guard, so I'm not sure whether it is related at all, but I wanted to mention that this can cause issues.
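To illustrate what the quoted answer means: with the spawn start method used on Windows, every worker re-imports the main module, so any module-level import of pims or trackpy runs once per worker. Here is a sketch of keeping the heavy imports out of the module level, just as a way to isolate the problem rather than a definitive fix:

from multiprocessing import Pool

def f(x):
    import trackpy as tp   # imported inside the worker only, just to trigger the import there
    return x * x

if __name__ == '__main__':
    import pims            # only the parent process executes this line
    with Pool(processes=4) as pool:
        print(pool.map(f, range(10)))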