labscript-suite / lyse

𝗹𝘆𝘀𝗲 is a data analysis framework for experiments controlled by the 𝘭𝘢𝘣𝘴𝘤𝘳𝘪𝘱𝘵 𝘴𝘶𝘪𝘵𝘦. It coordinates online analysis of live experiment data, by automatically running Python analysis scripts.
http://labscriptsuite.org
Other
3 stars 45 forks source link

Unable to fork a process from a lyse script with multiprocessing #48

Closed philipstarkey closed 5 years ago

philipstarkey commented 5 years ago

Original report (archived issue) by Lincoln Turner (Bitbucket: lincolnturner_monash, GitHub: lincolnturner).


I am unable to fork a a process from a lyse script with the multiprocessing module and encounter a PicklingError. The same script runs without error in a console outside of lyse.

The following code in script.py:

import multiprocessing
import lyse

def farm():
print('Print something')
if __name__ == __'main'__:
sml= multiprocessing.Process(target=farm)
sml.start()

Returns the following error:

Traceback (most recent call last):

File "script.py", line 8, in <module>

sml.start()

File "C:\Program Files\Anaconda3\envs\python37\lib\multiprocessing\process.py", line 112, in start

self._popen = self._Popen(self)

File "C:\Program Files\Anaconda3\envs\python37\lib\multiprocessing\context.py", line 223, in _Popen

return _default_context.get_context().Process._Popen(process_obj)

File "C:\Program Files\Anaconda3\envs\python37\lib\multiprocessing\context.py", line 322, in _Popen

return Popen(process_obj)

File "C:\Program Files\Anaconda3\envs\python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__

reduction.dump(process_obj, to_child)

File "C:\Program Files\Anaconda3\envs\python37\lib\multiprocessing\reduction.py", line 60, in dump

ForkingPickler(file, protocol).dump(obj)

_pickle.PicklingError: Can't pickle <function farm at 0x0000000009C989D8>: attribute lookup farm on __main__ failed

Traceback (most recent call last):

File "<string>", line 1, in <module>

File "C:\Program Files\Anaconda3\envs\python37\lib\multiprocessing\spawn.py", line 105, in spawn_main

exitcode = _main(fd)

File "C:\Program Files\Anaconda3\envs\python37\lib\multiprocessing\spawn.py", line 115, in _main

self = reduction.pickle.load(from_parent)

EOFError: Ran out of input

philipstarkey commented 5 years ago

Original comment by Philip Starkey (Bitbucket: pstarkey, GitHub: philipstarkey).


(bleh…bitbucket locked up my browser last time I wrote this out (and so I lost the whole post) so apologies for the brevity….)

Shaun and I have worked around this before by creating a wrapper lyse script which launches the original script (that uses multiprocessing) with subprocess.Popen() and passes in the lyse shot file path via a command line argument, and extracts results either through stdout or the h5 file (for large datasets). @chrisjbillington will know if there is an easy way to fix it properly (but I suspect not because multiprocessing libraries are inherently messy in how they fork processes, which is why we wrote zprocess).

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Ah, tricky.

The problem is that lyse does not actually run the analysis script as the __main__ module. It adds a global variable __name__ = '__main__' so that analysis code thinks it's in the __main__ module, but if you look up sys.modules['__main__'], you get analysis_subprocess.py, which of course is the actual initial script being run by Python.

And in Windows since multiprocessing is not a fork at all, it expects to be able to run the file corresponding to the __main__ module in the subprocess and that it will find farm() there. Of course farm is not in analysis_subprocess.py. (Also analysis_subprocess.py hangs at startup trying to connect to lyse, but that's another issue).

So I can fix lyse to be more convincing about running the analysis routine as the __main__ module, so that multiprocessing works. I'll have a look at this, I'm sure it can be done!

But in the meantime, you should be able to make it work by defining farm() in a separate module and importing it into your analysis script. That way the multiprocessing module will be able to find it in the subprocess.

Zprocess actually has exactly the same issue with its Process class, so it will not be of any help! Being on Linux doesn't help either since Qt GUIs and zmq sockets do not seem to survive forking (or, something doesn't it just seems to hang indefinitely). You need to tell the multiprocessing module to 'spawn' processes the Windows way instead of forking.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


So this is working for me on Linux:

import multiprocessing
import lyse

from farm_module import farm

if __name__ == '__main__':
    if multiprocessing.get_start_method(True) != 'spawn':
        multiprocessing.set_start_method('spawn')

    sml= multiprocessing.Process(target=farm)
    sml.start()

It should work on Windows as well, though the checking/setting the start method should be unnecessary on Windows since 'spawn' is the default there, it is just to tell multiprocessing not to fork() on Linux.

philipstarkey commented 5 years ago

Original comment by Lincoln Turner (Bitbucket: lincolnturner_monash, GitHub: lincolnturner).


Super! Many thanks both of you for the quick response. This is a student project and needs to get unstuck, so thanks for responding to the ‘urgent’ issue. We’ll try putting the code in a separate file and importing it… this is a harmless solution.

I’ve got no problem with code in multiple files, but want to avoid multiple actors that need to be separately started or maybe go off an become zombies. Our previous solution was a three-actor nightmare.

philipstarkey commented 5 years ago

Original comment by Lincoln Turner (Bitbucket: lincolnturner_monash, GitHub: lincolnturner).


So it runs the first time, the subprocess runs, it even displays “Print something” (I wasn’t expecting stdout to find its way home, but good stuff). And lyse shows the process as completed. (no eggtimer).

However running it a second time produces the below traceback. It should be idempotent at this stage, not least because the subprocess prints something and then exits, but even if the subprocess was long-lived, I should be able to kick off a few of them at least! Of course this is not the ultimate aim, but something else is broken here (and Steven Li has got this KeyError a few times now in several different way (multithread and multiprocess) making me thing it is a lyse bug…

Traceback (most recent call last):
File "C:\Program Files\Anaconda3\envs\python37\lib\threading.py", line 865, in run
self.target(*self.args, **self.kwargs)
File "C:\labscript_suite\lyse_main.py", line 721, in analysis_loop
self.do_analysis(filepath)
File "C:\labscript_suite\lyse_main.py", line 745, in do_analysis
success, updated_data = routine.do_analysis(filepath)
File "C:\labscript_suite\lyse_main.py", line 272, in do_analysis
signal, data = self.from_worker.get()
File "C:\Program Files\Anaconda3\envs\python37\lib\site-packages\zprocess\process_tree.py", line 313, in get
obj = self.sock.recv_pyobj()
File "C:\Program Files\Anaconda3\envs\python37\lib\site-packages\zmq\sugar\socket.py", line 624, in recv_pyobj
return self._deserialize(msg, pickle.loads)
File "C:\Program Files\Anaconda3\envs\python37\lib\site-packages\zmq\sugar\socket.py", line 491, in _deserialize
return load(recvd)
KeyError: 101

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Ah, my mistake - I did not test my code in an isolated way, I had made changes to analysis_subprocess.py that it turns out are necessary too.

It looks like the subprocess executes the __main__ module unconditionally, so the fact that analysis_subprocess.py has import side-effects (i.e. making zmq connections to the parent process) is messing with things. I have no idea why KeyError: 101 within unpickling is the result specifically, but probably zeromq ends up with multiple connections to lyse and is round-robining the messages, which is absolutely not as designed!

I've made a patch to move all the side-effects of analysis_subprocess.py into its __main__ block, such that they will not be executed in the subprocess.

It seems to fix the issue, at least, so long as the objects needed by the child process are defined in a separate module.

You (or Steven) can test by:

hg pull --branch bugfix #!/cbillington/lyse
hg update bugfix

in the lyse repo.

But I'll also look into a more complete fix where Python treats your analysis routine as the __main__ module for the purposes of multiprocessing.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Further to my last comment, I've gone ahead and done the 'full blown' fix that should make your original code work without the need for moving code into an external module. The only caveats that apply are the usual ones for multiprocessing (guard the code you don't want to run in the subprocess behind a if __name__ == '__main__': check).

This wasn't hard to do, and is cleaner than what we were doing before, but nonethless is a change in the semantics of how lyse code runs and so has the potential to mess with things. It should be tested on real setups before merging.

philipstarkey commented 5 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


The patch fixes the issue. Some observations:

File "C:\ProgramData\Anaconda3\lib\site-packages\win32\lib\pywintypes.py", line 129, in import_pywin32_system_module
  assert sys.modules[modname] is not old_mod
AssertionError

I’ll try to produce a minimum example of this.

philipstarkey commented 5 years ago

Original comment by Lincoln Turner (Bitbucket: lincolnturner_monash, GitHub: lincolnturner).


Great, thanks for testing Russ. I’ll try in spinor lab shortly.

I’m not surprised that stdout isn’t returned… although I was surprised when it was, so it is odd that it now isn’t.

i’m not too concerned about modulewatcher. With something like this, having to reload the long-lived worker process manually, if the code is changed, doesn’t seem too much hassle.

philipstarkey commented 5 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


i’m not too concerned about modulewatcher. With something like this, having to reload the long-lived worker process manually, if the code is changed, doesn’t seem too much hassle.

Sure, but presently this pathology (resulting from any modification to the script or its dependencies) brings down lyse entirely.

philipstarkey commented 5 years ago

Original comment by Lincoln Turner (Bitbucket: lincolnturner_monash, GitHub: lincolnturner).


Oh! I see. That’s not cool.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


I suspect pywin32 modules aren't being whitelisted by ModuleWatcher, since they are not loaded until needed and so aren't present when ModuleWatcher is first instantiated (which is what determined the whitelist). The reason they aren't loaded until used is that they don't exist on linux, so their import is conditional on the functions actually getting called. But I can modify things so that they make it into the whitelist on Windows.

Having said that, I don't really expect long-lived threads to play well with reloading modules generally, since the thread can hold references to 'old' modules and then you can have two copies of the same module in the interpreter, which can lead to unpredictable results.

I can make lyse not crash unrecoverably though, I'll look into it.

stdout not making it surprises me. Output redirection in zprocess is pretty sophisticated! It does low-level hooking in to file descriptors to catch anything that would be written to stdout or stderr whether it's a subprocess or a C extension or whatever. I see the output on Linux Python 3.7, and Windows Python 3.7, but not Windows Python 2.7. Zprocess has a test suite that checks this, I'll run it in Python 2.7 and see what's up.

philipstarkey commented 5 years ago

Original comment by Lincoln Turner (Bitbucket: lincolnturner_monash, GitHub: lincolnturner).


Don’t follow the pywin32 stuff but making lyse not crash sounds like a plan :slight_smile:

We are not using zprocess to spawn the subprocess at present, at least the MnWE above doesn’t…. so I wouldn’t actually expect the subprocess’s stdout to go anywhere. Of course maybe plain old multiprocessing also redirects spawn’d process’s stdout…

I wouldn’t bother testing on py2.7 apart from for completeness because we are certainly not using it for this stuff - tensorflow needs py3.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


The lyse analysis subprocess has output redirection turned on, and this redirection should apply to everything in the current process and all subprocesses connected to the same file descriptors, regardless of whether they are started using zprocess or not. I'm curious why it isn't working on Python 2, but if it's not trivial to fix I won't bother since Python 2 is close to end of life and I don't care enough.

philipstarkey commented 5 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


The print statement output does not appear in the OutputBox when run as a single script or when 'farm' is imported to your example __main__ method above.

This was observed using Python 3 for me.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Ah, so actually for me it seems to not depend on the Python version, but on whether Python was run from the command line.

Damn, I don't think I ever tested that. It's probably that pythonw.exe doesn't even have a stdout filehandle to hook into.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


It looks like I specifically catered for this situation of output redirection of a non-zprocess subprocess from pythonw.exe, so it's a shame it's not working now. I don't think I would have put that in there if it was untested.

philipstarkey commented 5 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


I can confirm that I see the output when starting lyse via python but not when starting lyse via pythonw.

philipstarkey commented 5 years ago

Original comment by Lincoln Turner (Bitbucket: lincolnturner_monash, GitHub: lincolnturner).


Ah! Yes, likewise. I have been using python -m lyse from py37 env to start lyse under python 3, so that’s why my python3-lyse has been showing the subprocess stdout.

It may be moot as Russ’s skeleton code just committed to analysislib uses, I think, threads not subprocesses. We were only driven to subprocessing because we thought that multithreading in m-loop was interfering with lyse worker threads, but this was likely an incorrect debugging assumption.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


For what it's worth, the zprocess way of running a subprocess to do something similar is like the following:

from zprocess import Process, ProcessTree

class FarmProcess(Process):
    def run(self, x, y):
        print('Print something', x, y)
        self.to_parent.put(x + y)

if __name__ == '__main__':
    farm_process = FarmProcess(ProcessTree.instance())
    to_farm, from_farm = farm_process.start(1, 2)
    result = from_farm.get()
    print('result was:', result)

A little more complicated, but the redirection works in pythonw.exe! There are more arguments to Process and ProcessTree for configuring security and output redirection and such, but since lyse analysis routines are already zprocess subprocesses, ProcessTree.instance() gives one that has already been configured to redirect to the lyse output box. But the default ProcessTree.instance() means this will work fine in a standalone script too.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Fixed in PR #61

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


I have a fix for the output redirection issue, will be forthcoming. Closing this as the reported issue is fixed.