soft-matter / trackpy

Python particle tracking toolkit
http://soft-matter.github.io/trackpy

Parallel processing of frames with batch stalls on Windows #610

Closed Jdogzz closed 1 year ago

Jdogzz commented 4 years ago

I've been having an issue with the parallel-processing feature of the batch command on Windows 10. My script is as follows (extremely simple):

import trackpy as tp
import pims
frames = pims.open('D:/images/*.png')
f = tp.batch(frames, 41, processes='auto',engine='numba')

This results in a series of error messages that look like this:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\testrun.py", line 4, in <module>
    f = tp.batch(frames, 41, processes='auto',engine='numba')
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\site-packages\trackpy\feature.py", line 552, in batch
    pool, map_func = _get_pool(processes)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\site-packages\trackpy\utils.py", line 488, in _get_pool
    pool = Pool(processes=processes)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\myusername\Anaconda3\envs\testtrackpy\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

By contrast, when running the same script on a Mac, with only the path altered to something like '/Users/myuser/images/*.png', the frames are processed in parallel as expected.

On each operating system, I tried this in a fresh conda environment (Python 3.7), with git, pip, pims, and imageio installed through conda (along with their dependencies), and with the current GitHub copy of trackpy installed through pip. I'm not seeing any cautionary notes about Windows-specific parallel processing in the trackpy documentation, but if there are, please point me to the relevant page.

I'm attaching my test dataset in case it is relevant: a series of PNG images of a white circle on a black background. images.tar.gz

nkeim commented 4 years ago

Thanks for this helpful report. What version of trackpy are you using? And are you running it from the command line?

In the development version on GitHub, parallel batch is tested on Windows, so this bug may be a little complicated.

Jdogzz commented 4 years ago

Both systems report trackpy version 0.4.2+29.g1f720ff. In each case I run the script above from the command line. (On Windows, I previously tried running it in Spyder, but no errors are printed to the terminal; it just hangs. From a command prompt launched through Anaconda Navigator, I can see the error above printed repeatedly.)

nkeim commented 4 years ago

Thanks. From the documentation, it looks like freeze_support() is only needed when your code is set up to run as a standalone .exe file (i.e. not within python.exe as usual). So it's unclear why you would get that error if you're just running from the command line.

Nonetheless, would you mind trying to add

if __name__ == '__main__':
    freeze_support()

to the top level of your script, right before the code that invokes batch?

Jdogzz commented 4 years ago

Simply copy-pasting that in before the batch line (with the appropriate import) didn't work, but rewriting the script with a main function led to the expected behavior, both in Spyder and directly from the command line:

import trackpy as tp
import pims
from multiprocessing import freeze_support

def main():
    frames = pims.open('D:/images/*.png')
    f = tp.batch(frames, 41, processes='auto', engine='numba')

if __name__ == '__main__':
    freeze_support()
    main()

As you mentioned, it is quite strange that the interpreter needed this, since the documentation says it should not be necessary, and it is of course more complex than the four lines needed on the Mac.

nkeim commented 4 years ago

Great! Thanks for providing a working example. I'd like to keep this issue open until we can add a mention of this to the batch docstring. (Or feel free to contribute text!)

rbnvrw commented 4 years ago

It is very strange indeed. Especially since we get no errors while testing on Travis. From the documentation (https://docs.python.org/3/library/multiprocessing.html#multiprocessing.freeze_support) it seems like it should not be necessary:

Calling freeze_support() has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then freeze_support() has no effect.

According to this question (https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing) the problem is this:

On Windows the subprocesses will import (i.e. execute) the main module at start. You need to insert an if __name__ == '__main__': guard in the main module to avoid creating subprocesses recursively.

So I think freeze_support() is not needed, but the guard is. @Jdogzz, could you test your script without freeze_support() to confirm whether this is the issue? Then we can add the proper advice to the docstring. Thanks!
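A minimal sketch of that Windows-safe idiom, using only the standard library (the locate_stub function is a made-up stand-in for per-frame feature finding, not trackpy code):

```python
from multiprocessing import Pool

def locate_stub(frame_number):
    # Stand-in for per-frame feature finding; tp.batch() farms frames
    # out to worker processes in much the same way.
    return frame_number * 2

def main():
    # On Windows, multiprocessing spawns children by re-importing this
    # module, so any code that starts workers must sit behind the
    # __main__ guard, or each child will try to start workers of its own.
    with Pool(processes=2) as pool:
        return pool.map(locate_stub, range(10))

if __name__ == '__main__':
    print(main())
```

Module-level code outside the guard (imports, function definitions) runs in every child, which is harmless; only the worker-spawning call needs to be guarded.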

Jdogzz commented 4 years ago

I can confirm that the script still runs after removing only the freeze_support line (tried on the Windows machine, both from the command line and within Spyder).

nkeim commented 4 years ago

That makes sense and I imagine many people run trackpy from a simple script. So now I'm worried about us having merged #606 to use multiprocessing by default — on Windows it could break a lot of existing code.

One way to resolve this is for trackpy to catch the RuntimeError when it invokes multiprocessing and, if the platform is Windows, fall back to a single process and issue a warning. That seems like a sensible compromise.

nkeim commented 4 years ago

Also, thanks @rbnvrw for the detective work! I'm surprised that detail is not in the standard library docs.

b-grimaud commented 3 years ago

Is it possible to disable multiprocessing in such a case? I'm encountering the same error, and unfortunately the if __name__ == '__main__' guard does not seem to help. I call tp.batch() inside a function, and placing the guard either above the line that invokes the function or within the function itself just skips ahead and does not execute anything.

EDIT: I encountered this problem with trackpy 0.5.0; I found a workaround by reverting to 0.4.2.

nkeim commented 3 years ago

Have you tried the processes=1 argument?

Any more information you can give, especially about your Python, Windows, and trackpy versions, would be helpful.

b-grimaud commented 3 years ago

Have you tried the processes=1 argument?

This does solve the problem; I was looking at the 0.4.2 documentation and did not notice that argument. Thank you very much! For reference, I am using Windows 10, Python 3.7.4, and (now) trackpy 0.5.0.

b-grimaud commented 3 years ago

Small update: placing an if __name__ == '__main__' guard at the very top of my script does work with processes='auto' on trackpy 0.5.0 and Windows 10. However, doing so prints a dozen 0.0 right as tp.batch is called, even though I have a tp.quiet([True]) statement right above. I have tried a few different .tif files, and so far I consistently get those twelve 0.0 prints whenever I use processes='auto'.

nkeim commented 3 years ago

Interesting! Do you have an Intel CPU with 6 cores?

b-grimaud commented 3 years ago

I do ! I use an Intel i7-10810U.

nkeim commented 3 years ago

OK. Let's see if we can isolate this to trackpy. Can you comment out batch() in your script and insert something like

from multiprocessing import Pool
with Pool() as pool:
    pool.map(round, list(range(100)))

If starting up the multiprocessing pool gets you the same unwanted output, then we at least know that the feature-finding code isn't responsible. The next step would be a process of elimination: check whether removing import trackpy, or some other module, stops the unwanted output.

b-grimaud commented 3 years ago

I ran that code a few times, and I get anywhere between one and three 0.0s, seemingly at random.

nkeim commented 3 years ago

🤪 That was not expected. Maybe I don't understand a detail of Pool (which is what batch() calls for multiprocessing).

In any case, I'm betting that one of your import statements — maybe trackpy but probably something else — is the cause.

b-grimaud commented 3 years ago

Could homemade modules be at fault here? I don't import much otherwise, at least in that specific part of the script: only trackpy, numpy, os, and imageio.

nkeim commented 3 years ago

That seems likely. It could still be an unknown problem with trackpy but we haven't had any other reports.

b-grimaud commented 3 years ago

I've tried running the script with only os, imageio and trackpy itself, and I still get the same output.

nkeim commented 3 years ago

Awesome!

Since you were able to see (a version of) the problem without calling trackpy.batch() at all, does that mean you can reproduce it with no import imageio and no data loaded? It would be great to get a minimal example that reproduces the behavior on other computers.

b-grimaud commented 3 years ago

I did call tp.batch while running the script with minimal imports. I also tried calling tp.batch with no data and no import imageio, and it failed, as expected, because there was no data to process.

I then investigated further: I loaded a file with imageio, dumped it into a numpy save file (.npy), and ran tp.batch from a script with import numpy instead of import imageio, and that did the trick! I no longer get any 0.0, just the regular trackpy.feature.batch output. It seems imageio is the issue, then; I might finally take the time to switch to pims.
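A minimal sketch of that workaround (the file name and array shape are made up for illustration; the imageio step is shown only as comments so the snippet itself needs nothing beyond numpy):

```python
import numpy as np

# One-off conversion script (the only place imageio is imported):
#     import imageio
#     stack = np.asarray(imageio.mimread('movie.tif'))
#     np.save('movie.npy', stack)
# Faked here with a synthetic 12-frame stack so the example is
# self-contained.
stack = np.zeros((12, 64, 64), dtype=np.uint8)
np.save('movie.npy', stack)

# Analysis script: load frames with numpy only, so the modules that
# Windows worker processes re-import no longer include imageio.
frames = np.load('movie.npy')
# f = tp.batch(frames, 41, processes='auto')  # as before
```

Splitting conversion and analysis into two scripts keeps the problematic import entirely out of the module that the spawned workers re-execute.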

nkeim commented 3 years ago

That's great! I'm afraid imageio is an optional dependency of pims, so you might still have to remove it or (ideally) switch to a different version.