uqfoundation / dill

serialize all of Python
http://dill.rtfd.io
Other
2.27k stars 182 forks source link

regression from 0.3.4 to 0.3.5.x - class decorators #483

Closed bitranox closed 2 years ago

bitranox commented 2 years ago

dear @anivegesana ,

I am the author of the wrapt_timeout_decorator

since version 0.3.5 of dill, two of our tests are failing under windows.

In those tests we use our decorator to decorate some Class Methods, which are then called in a process via multiprocess (which is using implicitely dill). (see this test )

Those tests are just fine with version 0.3.4 on windows/linux/macos, or 0.3.5.x on linux/macos, but the fail with dill > 0.3.4 on windows.

What seems to happen is, that the decorator recursively seem to call itself, creating a chain of subprocesses. In order to debug that behaviour You might set the timeout to 3600 seconds in that tests and print some debug information to stderr.

I really struggle to make a minimal version to show that regression, therefore I would please ask You for help, in order to be able to use dill 0.3.5 upwards for my community.

you can find the failed jobs with dill 0.3.5.1 here and the same tests passing with dill 0.3.4.0 here

yours sincerely

Robert, Vienna

anivegesana commented 2 years ago

I don't have a Windows computer and I am a little bit hesitant about running a potential fork bomb on a computer that isn't mine. Looking into VMs.

In the meantime, can you send the output of the dumps line in

dill.detect.trace(True)
c = dill.dumps(ClassTest1.f)
dill.loads(c)

with both dill 0.3.4 and 0.3.5.1 and whether dumps or loads is causing the infinite loop.

If dumps succeeds, can you share c as well?

bitranox commented 2 years ago

Dear anivegesana Oh, its not a fork bomb - the timeout is like 3 seconds on the original test, so after 4-5 forks (well on windows there is no real fork, therefore its very slow ...) the timeout kicks in. I guess the dill dump will be difficult, because it happens in a subprocess - kind of difficult to get it out there ? but I will try to share the requested data for windows/linux/0.3.4/0.3.5.x .... thanks for Your effort greetings from Vienna, Austria Robert

bitranox commented 2 years ago

Dear anivegesana, it is as I thought - I can not even pickle the decorated class method. And I usually dont do that - I just use multiprocess which is using dill.

with dill 0.3.4 on windows - I cant pickle the function, but multiprocess is working like intended
def test_dill():
    """
    >>> test_dill()
    """
    import dill
    dill.detect.trace(True)
    c = dill.dumps(ClassTest1.f)
    dill.loads(c)
    test_dill()
Exception raised:
    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm 2022.1.1\plugins\python\helpers\pycharm\docrunner.py", line 138, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest test_dill[0]>", line 1, in <module>
        test_dill()
      File "C:\opt/rotek-apps/lib/bitranox/wrapt_timeout_decorator/tests\wrapt_timeout_test_helper.py", line 35, in test_dill
        c = dill.dumps(ClassTest1.f)
      File "C:\opt\rotek-apps\lib\bitranox\wrapt_timeout_decorator\venv\lib\site-packages\dill\_dill.py", line 304, in dumps
        dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
      File "C:\opt\rotek-apps\lib\bitranox\wrapt_timeout_decorator\venv\lib\site-packages\dill\_dill.py", line 276, in dump
        Pickler(file, protocol, **_kwds).dump(obj)
      File "C:\opt\rotek-apps\lib\bitranox\wrapt_timeout_decorator\venv\lib\site-packages\dill\_dill.py", line 498, in dump
        StockPickler.dump(self, obj)
      File "C:\Program Files\Python310\lib\pickle.py", line 487, in dump
        self.save(obj)
      File "C:\Program Files\Python310\lib\pickle.py", line 578, in save
        rv = reduce(self.proto)
    NotImplementedError: object proxy must define __reduce_ex__()

thanks for Your effort greetings from Vienna, Austria Robert

mmckerns commented 2 years ago

@anivegesana: my build and test boxes are (docker instances inside) VMs. FYI.

bitranox commented 2 years ago

Dear anivegesana, here the next attempt to shed some light on the issue:

def test_timeout_class_method(use_signals: bool) -> None:
    """
    >>> test_timeout_class_method(False)
    """
    import dill.detect
    dill.detect.trace(True)
    assert ClassTest1().f(dec_timeout="instance.x", dec_allow_eval=True, use_signals=use_signals) is None

under dill 0.3.4 - it works. under dill 0.3.5.1, the same output, but looping until timeout (of course with different object adresses in each loop)

D2: <dict object at 0x000002BE74FBF680>
T4: <class 'multiprocess.process.AuthenticationString'>
# T4
# D2
T4: <class 'multiprocess.context.Process'>
# T4
D2: <dict object at 0x000002BE74FD7480>
D2: <dict object at 0x000002BE72CE6140>
T4: <class 'multiprocess.process.AuthenticationString'>
# T4
# D2
F2: <function _target at 0x000002BE74FB84C0>
# F2
T4: <class 'wrapt_timeout_decorator.wrap_helper.WrapHelper'>
# T4
D2: <dict object at 0x000002BE74FBD600>
T1: <class 'TimeoutError'>
F2: <function _load_type at 0x000002BE74A243A0>
# F2
# T1
Me: <bound method ClassTest1.f of <wrapt_timeout_test_helper.ClassTest1 object at 0x000002BE74FC1F00>>
T1: <class 'method'>
# T1
F1: <function ClassTest1.f at 0x000002BE74FB8D30>
F2: <function _create_function at 0x000002BE74A244C0>
# F2
Co: <code object f at 0x000002BE74FAF050, file "C:\opt/rotek-apps/lib/bitranox/wrapt_timeout_decorator/tests\wrapt_timeout_test_helper.py", line 13>
F2: <function _create_code at 0x000002BE74A24550>
# F2
# Co
D4: <dict object at 0x000002BE74A2AC40>
# D4
D2: <dict object at 0x000002BE74FF0C40>
# D2
# F1
T4: <class 'wrapt_timeout_test_helper.ClassTest1'>
# T4
D2: <dict object at 0x000002BE72C9AC40>
# D2
# Me
D2: <dict object at 0x000002BE74FBE580>
# D2
F2: <function rebuild_pipe_connection at 0x000002BE74FEAA70>
# F2
T4: <class 'multiprocess.reduction.DupHandle'>
# T4
D2: <dict object at 0x000002BE74FF1E80>
# D2
# D2
D2: <dict object at 0x000002BE72CA00C0>
# D2
# D2

thanks for Your effort greetings from Vienna, Austria Robert

anivegesana commented 2 years ago

That output is very strange. It is matching, which means that dill is not going through infinite regress, multiprocess is just executing the same code over and over. It is even more bizarre that it only effects dill.

bitranox commented 2 years ago

Dear anivegesana, I guess it is the combination of some black magic from wrapt decorator, together with multiprocess and dill. Just guessing. The thing is, it worked fine until (including) Dill Version 0.3.4. Its a pitty, because my users are using that timeout decorator exactly because it worked well with windows.

I pinned down the Dill Version to 0.3.4 in the requirements.txt on Windows for the time beeing, but I hope You can find the bug ...

Yours sincerely Robert

anivegesana commented 2 years ago

I am still having trouble reproducing this bug. I am getting the error you described when you pickled the function. I'll try other Windows computers, but it might be dependent on the environment and I might need to try it on a CI which will be incredibly painful to debug. ~First, I think I should make a small patch to the issue with pickling wrapt classes which hopefully will resolve the issue without resorting to that.~

```pycon Python 3.7.11 [MSC v.1927 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license()" for more information. >>> from wrapt_timeout_decorator import timeout >>> class ClassTest1(object): ... def __init__(self) -> None: ... self.x = 3.0 ... @timeout("instance.x/10", dec_allow_eval=True) # type: ignore ... def f(self) -> None: ... time.sleep(0.5) >>> ClassTest1().f() Traceback (most recent call last): File "", line 1, in ClassTest1().f() File "wrapt_timeout_decorator-1.3.12-py3-none-any.whl\wrapt_timeout_decorator\wrapt_timeout_decorator.py", line 139, in wrapper return wrapped_with_timeout(wrap_helper) File "wrapt_timeout_decorator-1.3.12-py3-none-any.whl\wrapt_timeout_decorator\wrapt_timeout_decorator.py", line 148, in wrapped_with_timeout return wrapped_with_timeout_process(wrap_helper) File "wrapt_timeout_decorator-1.3.12-py3-none-any.whl\wrapt_timeout_decorator\wrapt_timeout_decorator.py", line 162, in wrapped_with_timeout_process return timeout_wrapper() File "wrapt_timeout_decorator-1.3.12-py3-none-any.whl\wrapt_timeout_decorator\wrap_function_multiprocess.py", line 41, in __call__ self.__process.start() File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\process.py", line 112, in start self._popen = self._Popen(self) File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\context.py", line 223, in _Popen return _default_context.get_context().Process._Popen(process_obj) File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\context.py", line 322, in _Popen return Popen(process_obj) File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\popen_spawn_win32.py", line 89, in __init__ reduction.dump(process_obj, to_child) File "multiprocess-0.70.13-py37-none-any.whl\multiprocess\reduction.py", line 63, in dump ForkingPickler(file, protocol, *args, **kwds).dump(obj) File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 620, in dump StockPickler.dump(self, obj) File "C:\Program Files\Python\lib\pickle.py", line 437, in dump self.save(obj) File "C:\Program Files\Python\lib\pickle.py", line 549, in save self.save_reduce(obj=obj, *rv) File "C:\Program Files\Python\lib\pickle.py", line 662, in save_reduce save(state) File "C:\Program Files\Python\lib\pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1251, in save_module_dict StockPickler.save_dict(pickler, obj) File "C:\Program Files\Python\lib\pickle.py", line 859, in save_dict self._batch_setitems(obj.items()) File "C:\Program Files\Python\lib\pickle.py", line 885, in _batch_setitems save(v) File "C:\Program Files\Python\lib\pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "C:\Program Files\Python\lib\pickle.py", line 774, in save_tuple save(element) File "C:\Program Files\Python\lib\pickle.py", line 549, in save self.save_reduce(obj=obj, *rv) File "C:\Program Files\Python\lib\pickle.py", line 662, in save_reduce save(state) File "C:\Program Files\Python\lib\pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1251, in save_module_dict StockPickler.save_dict(pickler, obj) File "C:\Program Files\Python\lib\pickle.py", line 859, in save_dict self._batch_setitems(obj.items()) File "C:\Program Files\Python\lib\pickle.py", line 885, in _batch_setitems save(v) File "C:\Program Files\Python\lib\pickle.py", line 549, in save self.save_reduce(obj=obj, *rv) File "C:\Program Files\Python\lib\pickle.py", line 638, in save_reduce save(args) File "C:\Program Files\Python\lib\pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "C:\Program Files\Python\lib\pickle.py", line 774, in save_tuple save(element) File "C:\Program Files\Python\lib\pickle.py", line 549, in save self.save_reduce(obj=obj, *rv) File "C:\Program Files\Python\lib\pickle.py", line 633, in save_reduce save(cls) File "C:\Program Files\Python\lib\pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1840, in save_type )), obj=obj, postproc_list=postproc_list) File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1140, in _save_with_postproc pickler.save_reduce(*reduction, obj=obj) File "C:\Program Files\Python\lib\pickle.py", line 638, in save_reduce save(args) File "C:\Program Files\Python\lib\pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "C:\Program Files\Python\lib\pickle.py", line 789, in save_tuple save(element) File "C:\Program Files\Python\lib\pickle.py", line 504, in save f(self, obj) # Call unbound method with explicit self File "dill-0.3.5.1-py2.py3-none-any.whl\dill\_dill.py", line 1251, in save_module_dict StockPickler.save_dict(pickler, obj) File "C:\Program Files\Python\lib\pickle.py", line 859, in save_dict self._batch_setitems(obj.items()) File "C:\Program Files\Python\lib\pickle.py", line 885, in _batch_setitems save(v) File "C:\Program Files\Python\lib\pickle.py", line 524, in save rv = reduce(self.proto) NotImplementedError: object proxy must define __reduce_ex__() ```
anivegesana commented 2 years ago

Based on GrahamDumpleton/wrapt#158, I believe this is likely mostly a conscious choice by the wrapt author as opposed to a problem in dill or multiprocess. multiprocess should not spin into an infinite loop and silently fail in very strange corner cases, so I believe that this issue should still be left up or moved over to multiprocess in case we can find out what is causing that strange behavior.

They created the change to wrapt that disabled pickling of ObjectProxy objects in this commit: GrahamDumpleton/wrapt/13c55faf7d3e83e184b8a150f0df3de294698b25

So it is still a mystery how the new version of dill changed behavior for this error to bubble up now and why it was ignored earlier. Also, what was the monkey patch you referenced in https://github.com/bitranox/wrapt_timeout_decorator/issues/13#issuecomment-483225115?

bitranox commented 2 years ago

Dear anivegesana,

I am still having trouble reproducing this bug

which trouble ? Its kind of hard to debug in a subprocesses, but I think You should concentrate to debug multiprocess & pickle together, instead of only pickle the function itself. This also did not work in the past.

I am getting the error you described when you pickled the function

1 Yes, this pickle error happens also with dill 0.3.4 - but multiprocess is working with my wrapt_timeout_decorator and dill 0.3.4. So it seems it does NOT have anything to do with that NotImplementedError: object proxy must define __reduce_ex__()

2 it works under linux, also with dill '0.3.5' - so You might concentrate on the differences between linux and win32 inplementation ? There should be some if sys.platform == 'win32' or similar ? ...

3 You migh also check which change between 0.3.4 and 0.3.5 is causing the trouble - maybe You can git bisect those changes

4 the error NotImplementedError: object proxy must define __reduce_ex__() does NOT pop up in the trace when using multiprocess

5

Also, what was the monkey patch you referenced in https://github.com/bitranox/wrapt_timeout_decorator/issues/13#issuecomment-483225115?

I dont remember - I did not tinker around with dill or multiprocess

I really dont like mysteries ;-), it means there is a hidden bug in the code .... which should be found, even if it is "hard to debug"

You have all the tools on hand to find that bug - just by replay the changes between 0.3.4 and 0.3.5 - Your argument "I dont have a windows machine" seems somehow a little bit odd in times of virtual machines nowadays ...

please be so kind and consider to help here, because a number of astronomers rely on that package.

yours sincerely

Robert

mmckerns commented 2 years ago

Well, the mystery is indeed what is causing the strange behavior. The root cause is unknown, and it could be a bug in multiprocess, dill, or wrapt. It being a Windows-only bug makes it more difficult, even in the day of VMs, as one needs a copy of Windows and potentially some time investment to set up the VM. Let's keep it civil, and figure it out together.

So step (1) is make sure whatever issue is being reported can be reproduced. Then step (2) is try to use diagnostic tools to find what the root cause is. dill has dll.detect.trace and some other tools in the dill.detect module. You can see if it shows up with serialization variants in dill.settings. You can check if it shows up in multiprocess when the settings are changed as well -- such as trying the spawn multiprocess.context on a mac/linux box, or similarly trying to pickle with ForkingPickler from multiprocess. multiprocess also has a number of loggers, including a debugger and a subprocess debugger. pathos has subprocess profilers in pathos.profille that can help see what functions are called that might not show up in the traceback or on the debugger.

anivegesana commented 2 years ago

I did not intend to make it seem that your problem is not a problem. In fact, I wanted to imply that there are likely multiple problems in multiple libraries that makes it very hard to diagnose and we should try to break it up into smaller pieces.

All I meant by not being able to reproduce it was that on some Windows computers it has one behavior and on other Windows computers it has another, which is very bad and is the first problem that should be tackled. It is impossible to get anywhere else without consistency.

I do have access to VMs and am trying it on different VM platforms, hoping to find a difference that is causing the behavior.

pathos.profiler will probably help out a lot and I'll use it once I find a VM that causes the issue.

The most likely reason was that a change in dill broke a monkey patch that multiprocess made for Windows, but it breaks it differently based on other properties of the environment depending on what that patch does. I think it is in multiprocess because the other libraries are platform agnostic and there shouldn't be a reason that it performs differently on different machines with the same OS.

anivegesana commented 2 years ago

One thing I noticed was that you ran your test that passed on Windows with Python 3.9 and the one that failed on Windows with Python 3.10. I tried it out on Python 3.7. That might be a confounding factor here. When I find time, I'll try it out on different Python versions and see if that is the cause of the varying results.

bitranox commented 2 years ago

Dear,mmckerns, Dear anivegesana,

Well, the mystery is indeed what is causing the strange behavior. The root cause is unknown, and it could be a bug in multiprocess, dill, or wrapt.

I did not intend to make it seem that your problem is not a problem. In fact, I wanted to imply that there are likely multiple problems in multiple libraries that makes it very hard to diagnose and we should try to break it up into smaller pieces.

Everything absolutely True, but dill is the only component that changed - and we know the bug was introduced between 0.3.4 and 0.3.5 - and it only occurs on windows - so this should help to find the issue.

It being a Windows-only bug makes it more difficult, even in the day of VMs, as one needs a copy of Windows and potentially some time investment to set up the VM.

I disagree here - tests on windows are absolutely necessary, because many things work different on windows - especially forking for multiprocess. Therefore a windows test vm sould always be in Your toolbox ...

One thing I noticed was that you ran your test that passed on Windows with Python 3.9 and the one that failed on Windows with Python 3.10. I tried it out on Python 3.7. That might be a confounding factor here.

I just updated the tests from 3.9 to 3.10 AFTER working on that issue. It makes no difference if 3.9 or 3.10

Let's keep it civil, and figure it out together

Sure - if I came across rude, please accept my sincere apologies - please consider the German/Austrian mentality, which is abolutely straight forward. (well Austrians a bit less the Germans ...)

The most likely reason was that a change in dill broke a monkey patch that multiprocess made for Windows, but it breaks it differently based on other properties of the environment depending on what that patch does. I think it is in multiprocess because the other libraries are platform agnostic and there shouldn't be a reason that it performs differently on different machines with the same OS.

Sounds good. You can let all the components be the same, just change dill from 0.3.4 to 0.3.5 I totally agree with Your opinion, in fact I mentioned it before :

https://github.com/uqfoundation/dill/issues/2 it works under linux, also with dill '0.3.5' - so You might concentrate on the differences between linux and win32 inplementation ? There should be some if sys.platform == 'win32' or similar ? ...

greetings from Vienna, Austria, and thanks for Your help !

yours sincerely

Robert

anivegesana commented 2 years ago

If the only difference is the dill version and it works on some Windows computers and not others, I would assume that the wait has something to do with it. I tried it out on a big VM with 10 CPU cores and GitHub actions probably has just 2. I was to throw a wild guess, I would assume that it is related to #482 where a bug fix to dill's handling of recursive functions (#443) caused deeper searches for objects to pickle. Does adding

dill.settings['recurse'] = True

to the beginning of the test help? I am at a loss though, and the only way that I could hope to help is by recreating the entire setup on GitHub Actions (as opposed to Azure/others like I am trying now) and run your tests there.

mmckerns commented 2 years ago

The most likely reason was that a change in dill broke a monkey patch that multiprocess made for Windows, but it breaks it differently based on other properties of the environment depending on what that patch does. I think it is in multiprocess because the other libraries are platform agnostic and there shouldn't be a reason that it performs differently on different machines with the same OS.

What specifically are you referring to? I know the multiprocess code pretty well... and I should be aware of any patches I've made for Windows. One thing to note about multiprocess is that the source code is different for each version of python, as well as there are portions that are different for each platform.

anivegesana commented 2 years ago

I haven't look into the multiprocess source code yet. That was more of a list of places that I would plan to look next rather than a particular fix that I was suggesting.

bitranox commented 2 years ago

Dear anivegesana,

adding dill.settings['recurse'] = True does not make any difference. Windows Version seems not to matter - I have exactly the same behaviour on github actions and on my local machine with 8 cores and windows11 ... yours sincerely Robert

mmckerns commented 2 years ago

The mods to multiprocess from multiprocessing are pretty small, so generally if there's an issue in multiprocess, it's also in the STL multiprocessing as well.

@bitranox: Can you specify a matrix of versions of python, dill, multiprocess and wrapt that you see the issue in, and where you don't?

bitranox commented 2 years ago

Dear,mmckerns, that matrix seems easy - it worked all the way down since 2017, until now, an Windows7 / Windows10 / Windows 11, with Python 3.7/3.8/3.9/3.10 , o the difference is really only the new dill version ...

I guess to roll back the changes gradually from 3.0.5 to 3.0.4 should be the easiest way !

yours sincerely Robert

anivegesana commented 2 years ago

443 was definitely the beefiest change, so testing 04aba03432955f64c05441ffe0553af987c087ff vs 0392e141995684f426fd344724c6f66999fbd916 would give a half-way point between 0.3.4 and 0.3.5.

Thank you for your patience with this bug.

mmckerns commented 2 years ago

@bitranox: can you generate a matrix, even if it's easy. I haven't seen a quote on the versions of multiprocess or wrapt that show the issue. I think we are all assuming that a change to dill is what triggered the issue. However, if you can explore version combinations a bit, it might help diagnose what's going on. @anivegesana: stepping through the commits to dill (starting with the two quoted above) is probably the right thing to do.

bitranox commented 2 years ago

Dear anivegesana, https://github.com/uqfoundation/dill/commit/04aba03432955f64c05441ffe0553af987c087ff is working

Dear,mmckerns, creating the matrix would make a lot of work and no benefit. I found no combination until now, working with 0.3.5, and I havent found any combination NOT working with 0.3.4

I really try to make it easy for You folks, but lets concentrate on something that looks more beneficial:

stepping through the commits to dill (starting with the two quoted above) is probably the right thing to do.

Yours sincerely Robert

bitranox commented 2 years ago

Dear anivegesana, https://github.com/uqfoundation/dill/commit/0392e141995684f426fd344724c6f66999fbd916 is also working

anivegesana commented 2 years ago

Ok. Not expected. Most other issues from this week were concerning that PR. That narrows it down significantly. How about e2831d04ae64d6dc8686a3c5ed1eb703dc748d43?

bitranox commented 2 years ago

Dear anivegesana, https://github.com/uqfoundation/dill/tree/e2831d04ae64d6dc8686a3c5ed1eb703dc748d43 is also working

anivegesana commented 2 years ago

d2f916c539967150ae21240b7c2922023cd215ac?

bitranox commented 2 years ago

https://github.com/uqfoundation/dill/commit/d2f916c539967150ae21240b7c2922023cd215ac NOT working

anivegesana commented 2 years ago

Ok. It is somewhere between. 23c47455da62d4cb8582d8f98f1de9fc6e0971ad?

bitranox commented 2 years ago

https://github.com/uqfoundation/dill/commit/23c47455da62d4cb8582d8f98f1de9fc6e0971ad NOT working

anivegesana commented 2 years ago

5bd56a8e0096c3f45f4c37ff0b67ac6aa341f22b?

bitranox commented 2 years ago

https://github.com/uqfoundation/dill/commit/5bd56a8e0096c3f45f4c37ff0b67ac6aa341f22b NOT working

anivegesana commented 2 years ago

Last one: 914d47f7f2c2d7fb0d91ff7c3e48a9c6a22a2b0a? Then we know for sure.

bitranox commented 2 years ago

https://github.com/uqfoundation/dill/commit/914d47f7f2c2d7fb0d91ff7c3e48a9c6a22a2b0a NOT Working lets double check - which one was the commit before that ?

anivegesana commented 2 years ago

The parent of that is e2831d04ae64d6dc8686a3c5ed1eb703dc748d43. So I know what to look for now. Will get back to you with a fix. Thank you for your patience and help with debugging this!

bitranox commented 2 years ago

Yes, double checked, https://github.com/uqfoundation/dill/commit/e2831d04ae64d6dc8686a3c5ed1eb703dc748d43 IS working.

anivegesana commented 2 years ago

Ok. I found out that running the test suite with PyTest instead of just running the test manually lets me reproduce your error. This is not an issue with your library or multiprocess, it is an issue between dill and PyTest like #482. The error message I was getting back up here was because the console was implicitly __main__, which would break multiprocess anyway.

Now that I understand multiprocess and this issue, I can be more useful. ~It is probably not exactly the same as #482, so I'll do some more poking to see if it is solved by #486 or not.~ As far as I can tell, it is exactly the same as #482 and the tests pass. Rest assured, a fix will be available in dill 0.3.5.2 (if that is still on @mmckerns' radar in June, else dill 0.3.6).

bitranox commented 2 years ago

Dear anivegesana, that sounds great, thank You for Your hard work. Yours sincerely Robert

mmckerns commented 2 years ago

Yes, we should expect dill to emit a new release in June or soon thereafter. The milestone is tagged as dill-0.3.6. If you want to apply it to this and other PRs, go ahead. @anivegesana if you are sure it's the case, then we should tag and close this issue as a duplicate of #482.

anivegesana commented 2 years ago

And thank you both for being patient. This was a very frustrating bug that kept slipping out from underneath my nose due to all of the moving parts and because I kept searching in the wrong places.

anivegesana commented 2 years ago

I don't think I have permissions to do that. If it would be more convenient for me to do that because of regular contributions, I believe that the role is triage:

https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role

It is hopefully broad enough to allow contributors to help with managing the repo while not being expressive enough to cause a dominictarr/event-stream situation because there is no way to change the code on master without going through you.