kquick / Thespian

Python Actor concurrency library
MIT License

multiprocTCPBase/multiprocUDPBase not working on a Mac #92

Open tomerd opened 4 months ago

tomerd commented 4 months ago

multiprocTCPBase and multiprocUDPBase fail in the same way on a Mac

tracked it down to multiprocessCommon::_startAdmin where the response after starting the admin comes back null

...
self.transport.connectEndpoint(endpointPrep)

response = self.transport.run(None, MAX_ADMIN_STARTUP_DELAY)
^^

exact same setup works fine on linux, but this makes developing on a Mac difficult. triple checked nothing else is listening on the port, so not sure what is going on

startup log:

2024-04-21 22:32:58,518 ++++ Actor System gen (3, 10) started, admin @ ActorAddr-(UDP|:7070)
2024-04-21 22:32:58,518 Thespian source: <redacted>/lib/python3.12/site-packages/thespian/__init__.py
None
2024-04-21 22:33:03,520 startup failed: ActorAddr-(UDP|:7070) is not a valid ActorSystem admin
2024-04-21 22:41:23,604 ++++ Actor System gen (3, 10) started, admin @ ActorAddr-(T|:7070)
2024-04-21 22:41:23,605 Thespian source: <redacted>/lib/python3.12/site-packages/thespian/__init__.py
None
2024-04-21 22:41:28,606 startup failed: ActorAddr-(T|:7070) is not a valid ActorSystem admin
kquick commented 4 months ago

Thespian has worked previously on Mac systems, but I haven't had a Mac for several years, so it's possible there have been changes in the OS releases in the interim.

Can you provide the Mac OS release info, and also run $ python3 thespian/diagnose.py and provide the results?

tomerd commented 4 months ago
❯ python ./thespian/diagnose.py
Initiating diagnostics
   [......Python] : namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0), hexversion=51119088, _multiarch='darwin')
   [.........(t)] : sys.thread_info(name='pthread', lock='mutex+cond', version=None)
   [.........(p)] : darwin
   [........(mp)] : [<apple_certifi._vendor.wrapt.importer.ImportHookFinder object at 0x102cb8e90>, <_distutils_hack.DistutilsMetaFinder object at 0x102cb97c0>, <class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
# checking imports -> verified ok
# checking thespian internal system imports -> verified ok
# checking existing running actors was skipped - please install psutils python package to support this
# checking hostname -> verified ok
# checking fqdn -> verified ok
# checking addr info proto=UDP desc=default usage=0 -> verified ok
# checking addr info proto=UDP desc=default usage=passive -> verified ok
# checking addr info addr=tombp-5.local proto=UDP desc=hostname usage=0 -> verified ok
# checking addr info addr=tombp-5.local proto=UDP desc=hostname usage=passive -> verified ok
# checking addr info addr=1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa proto=UDP desc=fqdn usage=0 -> verified ok
# checking addr info addr=1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa proto=UDP desc=fqdn usage=passive -> verified ok
# checking addr info proto=TCP desc=default usage=0 -> verified ok
# checking addr info proto=TCP desc=default usage=passive -> verified ok
# checking addr info addr=tombp-5.local proto=TCP desc=hostname usage=0 -> verified ok
# checking addr info addr=tombp-5.local proto=TCP desc=hostname usage=passive -> verified ok
# checking addr info addr=1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa proto=TCP desc=fqdn usage=0 -> verified ok
# checking addr info addr=1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa proto=TCP desc=fqdn usage=passive -> verified ok
# checking IP addresses ... Got 6 IP addresses

    None
    127.0.0.1
    10.0.0.120
    localhost
    0.0.0.0
# checking IP addresses -> verified ok
kquick commented 4 months ago

That indicates that diagnose.py found no issues it was able to detect. Do any of the following work?

$ python3 examples/hellogoodbye.py
$ python3 examples/hellogoodbye.py multiprocQueueBase
$ python3 examples/hellogoodbye.py multiprocUDPBase
$ python3 examples/hellogoodbye.py multiprocTCPBase

These should be run from the top-level checkout of the thespian repository, where the thespian and examples directories exist. Either this directory must be in PYTHONPATH, or thespian must be installed so that import thespian works from python3.
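To confirm which copy of thespian the interpreter actually picks up (the repository checkout or an installed package), a small check along these lines may help. It is only a sketch using the standard library; the helper name module_origin is made up for illustration and is not part of Thespian.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    None means the module cannot be found on the current import path
    (namespace packages also report no single origin file).
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Shows whether "import thespian" resolves to the checkout
    # or to a site-packages install.
    print("thespian resolves to:", module_origin("thespian"))
```

If the printed path points into site-packages rather than the checkout, the examples are exercising the installed release, not the repository source.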

tomerd commented 4 months ago
❯ python3 examples/hellogoodbye.py multiprocTCPBase

None
Traceback (most recent call last):
  File "/Users/tomerd/code/other/Thespian/examples/hellogoodbye.py", line 45, in <module>
    run_example(sys.argv[1] if len(sys.argv) > 1 else None)
  File "/Users/tomerd/code/other/Thespian/examples/hellogoodbye.py", line 33, in run_example
    asys = ActorSystem(systembase, logDefs=logcfg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tomerd/miniconda/envs/portola/lib/python3.12/site-packages/thespian/actors.py", line 637, in __init__
    systemBase = self._startupActorSys(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tomerd/miniconda/envs/portola/lib/python3.12/site-packages/thespian/actors.py", line 678, in _startupActorSys
    systemBase = sbc(self, logDefs=logDefs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tomerd/miniconda/envs/portola/lib/python3.12/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
    super(ActorSystemBase, self).__init__(system, logDefs)
  File "/Users/tomerd/miniconda/envs/portola/lib/python3.12/site-packages/thespian/system/multiprocCommon.py", line 83, in __init__
    super(multiprocessCommon, self).__init__(system, logDefs)
  File "/Users/tomerd/miniconda/envs/portola/lib/python3.12/site-packages/thespian/system/systemBase.py", line 326, in __init__
    self._startAdmin(self.adminAddr,
  File "/Users/tomerd/miniconda/envs/portola/lib/python3.12/site-packages/thespian/system/multiprocCommon.py", line 116, in _startAdmin
    raise InvalidActorAddress(adminAddr,
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid ActorSystem admin
tomerd commented 4 months ago

the rest work okay

kquick commented 4 months ago

This looks like everything is in reasonable shape for the Mac and that it's primarily just a port issue for the multiprocTCPBase. There are three things I can think of that might be occurring here:

  1. There is already something else running on port 1900. You can select a different port by modifying the "Admin Port" when starting the actor system (see https://thespianpy.com/doc/using#hH-a17e6c70-5592-4d06-b818-bd25350c4c53 and https://thespianpy.com/doc/using.html#hH-9d33a877-b4f0-4012-9510-442d81b0837c for more information).
  2. The multiprocTCPBase itself is persistent: it will normally stay running (i.e. a "daemon" or "system service") after the application that started it exits (see Base Persistence at https://thespianpy.com/doc/in_depth.html#hH-b2414e9c-4cec-46e7-8d53-80008f2c9498). A subsequent application issuing a startup can simply connect to that long-running multiprocTCPBase. However, any code loaded in the original base is still running in that service, including any errors, so if there was a failure, the process might still be running but be unable to support additional connections; you would use ps and kill or similar techniques to find and stop this process.
  3. Port 1900 is a relatively low-numbered port. Your system might disallow binding to a low-numbered port, or you may have some sort of firewall or virus protection (perhaps built in to newer Mac OS versions) that is preventing this. There may be additional information in the thespian_diagnostics.log that was generated when you ran diagnose.py above. Try a higher-numbered port by setting the "Admin Port" discussed in the first item above.
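A rough way to test the first and third possibilities is to try binding the port directly and then start the actor system on a port that binds cleanly. The sketch below makes some assumptions: the helper names and the default port 17070 are invented for illustration, and a successful bind does not prove Thespian itself will start. The "Admin Port" capability is the one mentioned in the first item.

```python
import socket


def port_is_free(port, proto="tcp", host=""):
    """Return True if a fresh socket can bind (host, port) right now.

    Any OSError (address in use, permission denied, ...) counts as "not free".
    """
    kind = socket.SOCK_STREAM if proto == "tcp" else socket.SOCK_DGRAM
    with socket.socket(socket.AF_INET, kind) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(candidates, proto="tcp"):
    """Return the first candidate port that can be bound, or None."""
    for port in candidates:
        if port_is_free(port, proto):
            return port
    return None


def start_system(base="multiprocTCPBase", port=17070):
    """Start an actor system on a chosen admin port (17070 is arbitrary)."""
    # Imported lazily so the port helpers above work without thespian installed.
    from thespian.actors import ActorSystem
    return ActorSystem(base, capabilities={"Admin Port": port})
```

For example, first_free_port([1900, 7070, 17070], "tcp") shows which of those ports can currently be bound; passing "udp" checks the same ports for multiprocUDPBase.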

Let me know if one of these seems to fix the issue or if you are still having problems.

tomerd commented 4 months ago

thanks @kquick. I actually checked all three several times before reporting the issue. it consistently does not work on macOS, even with nothing else on the port. I suspect some other macOS behavior is getting in the way. I am using docker for now to work around this issue

skunath commented 4 weeks ago

I can confirm that I have seen this as well, at least for Python 3.11 on macOS Sonoma. I ended up playing around with the Gatekeeper settings, as I assumed there might be some odd security issue related to it (this seemed interesting: https://medium.com/@ansonliao.xiao/how-to-enable-openanywhere-security-option-in-mac-09e1570aa9ac ). I also ended up trying with Python 3.9 and then installing Python 3.12. In both 3.9 and 3.12 it seems like I can get thespian to launch correctly with multiprocTCPBase.

Not sure if adjusting Gatekeeper was the solution, but it seems to be working.

kquick commented 4 weeks ago

Thank you for the update! Hopefully other Mac users can corroborate this and we can identify the exact solution going forward.

tomerd commented 3 weeks ago

interesting. tried python 3.12 and disabling gatekeeper without success. @skunath do you recall the exact steps you took?