Thanks for this helpful report. What version of trackpy are you using? And are you running it from the command line?
In the development version on GitHub, parallel batch is tested on Windows, so this bug may be a little complicated.
Both systems report trackpy version 0.4.2+29.g1f720ff. I run the above script from the command line in each case. (On Windows, I previously tried running it in Spyder, but no errors are printed to the terminal when doing that; it just hangs. From a command prompt launched from Anaconda Navigator, I can see the above error being printed out repeatedly.)
Thanks. From the documentation, it looks like `freeze_support()` is only needed when your code is set up to run as a standalone .exe file (i.e. not within `python.exe` as usual). So it's unclear why you would get that error if you're just running from the command line. Nonetheless, would you mind trying to add

```python
if __name__ == '__main__':
    freeze_support()
```

to the top level of your script, right before the code that invokes `batch`?
Simply copy-pasting that in before the `batch` line (with the appropriate import) didn't work, but rewriting with a main function led to the expected behavior in Spyder and directly from the command line:

```python
import trackpy as tp
import pims
from multiprocessing import freeze_support

def main():
    frames = pims.open('D:/images/*.png')
    f = tp.batch(frames, 41, processes='auto', engine='numba')

if __name__ == '__main__':
    freeze_support()
    main()
```
As you mentioned, it is quite strange that the interpreter needed this to work when the documentation says it should not, and it is of course more complex than the four lines needed on the Mac.
Great! Thanks for providing a working example. I'd like to keep this issue open until we can add a mention of this to the `batch` docstring. (Or feel free to contribute text!)
It is very strange indeed. Especially since we get no errors while testing on Travis. From the documentation (https://docs.python.org/3/library/multiprocessing.html#multiprocessing.freeze_support) it seems like it should not be necessary:

> Calling freeze_support() has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then freeze_support() has no effect.
According to this question (https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing) the problem is this:

> On Windows the subprocesses will import (i.e. execute) the main module at start. You need to insert an `if __name__ == '__main__'` guard in the main module to avoid creating subprocesses recursively.
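To illustrate why that guard matters, here is a minimal, self-contained sketch (the `work` function and pool size are made up for the example). On Windows, each child process re-imports the main module at startup, so any code that creates a `Pool` at module top level would spawn children recursively unless it is protected by the guard:

```python
import multiprocessing as mp

def work(x):
    # trivial stand-in for per-frame feature finding
    return x * x

def parallel_squares(n):
    # creating the Pool inside a function (called only under the guard)
    # keeps Windows child processes from re-running it on import
    with mp.Pool(2) as pool:
        return pool.map(work, range(n))

if __name__ == '__main__':
    # on Windows, child processes re-import this module; the guard
    # ensures only the parent process actually starts the pool
    print(parallel_squares(4))
```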
So I think the `freeze_support` is not needed but the guard is. @Jdogzz could you test your script without the `freeze_support` to confirm whether this is the issue? Then we can add the proper advice to the docstring. Thanks!
I can confirm that removing only the `freeze_support` line still allows the script to run (tried on the Windows machine, both from the command line and within Spyder).
That makes sense and I imagine many people run trackpy from a simple script. So now I'm worried about us having merged #606 to use multiprocessing by default — on Windows it could break a lot of existing code.
One way to resolve this is that trackpy could catch a `RuntimeError` when it invokes `multiprocessing`, and if the platform is Windows, fall back to a single process and issue a warning. That seems like a sensible compromise.
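A rough sketch of that fallback idea, with a generic `safe_map` standing in for trackpy's internals (the function name and warning text are hypothetical, not trackpy's actual API):

```python
import platform
import warnings
from multiprocessing import Pool

def safe_map(func, items):
    # hypothetical sketch: try a multiprocessing Pool, and on the
    # RuntimeError raised by a missing __main__ guard on Windows,
    # fall back to an ordinary serial map with a warning
    try:
        with Pool() as pool:
            return pool.map(func, items)
    except RuntimeError:
        if platform.system() == 'Windows':
            warnings.warn("multiprocessing failed (missing __main__ "
                          "guard?); falling back to a single process")
            return [func(x) for x in items]
        raise
```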
Also, thanks @rbnvrw for the detective work! I'm surprised that detail is not in the standard library docs.
Is it possible to disable multiprocessing in such a case?
I encounter the same error, and unfortunately using the `if __name__ == '__main__'` guard does not seem to work. I use `tp.batch()` in a function, and placing the guard both above the line which invokes the function and within the function itself just skips ahead and does not execute anything.

EDIT: I encountered this problem using trackpy 0.5.0; I found a workaround by reverting to 0.4.2.
Have you tried the `processes=1` argument? Any more information you can give, especially about your Python, Windows, and trackpy versions, would be helpful.
> Have you tried the `processes=1` argument?

This does solve the problem; I was looking at the 0.4.2 documentation and did not notice that argument. Thank you very much! For reference, I am using Windows 10, Python 3.7.4, and (now) trackpy 0.5.0.
Small update: placing an `if __name__ == '__main__'` guard at the very top of my script does work with `processes='auto'` using trackpy 0.5.0 and Windows 10.

However, doing so prints a dozen `0.0` right as `tp.batch` is called. For what it's worth, I have a `tp.quiet([True])` statement right above. I have tried with a few different .tif files, and so far I consistently get those 12 `0.0` prints whenever I use `processes='auto'`.
Interesting! Do you have an Intel CPU with 6 cores?
I do! I use an Intel i7-10810U.
OK. Let's see if we can isolate this to trackpy. Can you comment out `batch()` in your script and insert something like

```python
from multiprocessing import Pool

with Pool() as pool:
    pool.map(round, list(range(100)))
```

If starting up the multiprocessing pool gets you the same unwanted output, then we at least know that the feature-finding code isn't responsible. The next step would be a process of elimination to check whether you can stop the unwanted output by removing `import trackpy` or some other module.
I ran that code a few times, and I get anywhere between one and three `0.0`, seemingly at random.
🤪 That was not expected. Maybe I don't understand a detail of `Pool` (this is what is called by `batch()` for multiprocessing). In any case, I'm betting that one of your `import` statements (maybe trackpy, but probably something else) is the cause.
Could homemade modules be at fault here? I don't import much otherwise, at least in that specific part of the script: only `trackpy`, `numpy`, `os` and `imageio`.
That seems likely. It could still be an unknown problem with trackpy but we haven't had any other reports.
I've tried running the script with only `os`, `imageio` and `trackpy` itself, and I still get the same output.
Awesome!
Since you were able to see (a version of) the problem without calling `trackpy.batch()` at all, does that mean you can reproduce it if there is no `import imageio` and no data loaded? It would be great to get to a minimal example that can reproduce the behavior on other computers.
I did call `tp.batch` while running the script with minimal imports. I also tried calling `tp.batch` with no data and no `import imageio`, and it failed, as expected, because there was no data to process.

I investigated further by loading a file with `imageio`, dumping it into a numpy save file (.npy), then running `tp.batch` in a script with no `import imageio` but `import numpy` instead, and that did the trick! I don't get any `0.0`, just the regular `trackpy.feature.batch` output.

Seems like `imageio` is the issue then; I might finally take the time to switch to `pims`.
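That two-step workaround can be sketched roughly as follows. The filenames are hypothetical, the imageio step is shown only in comments so the tracking script itself never imports it, and stand-in frames are fabricated here to keep the sketch self-contained:

```python
import numpy as np

# step 1 (separate conversion script, where imageio IS imported):
#     import imageio
#     frames = np.asarray(imageio.mimread('movie.tif'))
#     np.save('frames.npy', frames)

# step 2 (tracking script: numpy only, no imageio import).
import os
import tempfile

frames = np.zeros((12, 64, 64), dtype=np.uint8)   # fake image stack
path = os.path.join(tempfile.mkdtemp(), 'frames.npy')
np.save(path, frames)

loaded = np.load(path)   # this array can then be passed to tp.batch(...)
```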
That's great! I'm afraid imageio is an optional dependency of pims, so you might still have to remove it or (ideally) switch to a different version.
I've been having an issue with the parallel processing feature of the batch command on Windows 10. My script is as follows, extremely simple:
This results in a series of error messages that look like this:
By contrast, when running the same script on a Mac, with only the path altered to something like '/Users/myuser/images/*.png', I get the expected ability to process frames in parallel.
For each operating system, this has been tried in a fresh conda environment, Python 3.7, with git, pip, pims, and imageio all installed through conda (along with their dependencies), and installing the current github copy of trackpy through pip. I'm not seeing any cautionary notes about parallel processing specific to Windows in the trackpy documentation, but if there are please point me to the relevant page.
I'm attaching my test dataset in case it is relevant, a series of png images with a white circle on a black background. images.tar.gz