kquick / Thespian

Python Actor concurrency library
MIT License

Issues on OS X - thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid or useable ActorSystem Admin #54

Closed jchrisweaver closed 4 years ago

jchrisweaver commented 4 years ago

I come by way of trying to get ESRally to run to test various Elasticsearch configs.

I'm unable to run the simple example in the docs. Any suggestions on how to debug would be greatly appreciated.

Full trace

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/actors.py", line 638, in __init__
    systemBase, capabilities, logDefs)
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/actors.py", line 676, in _startupActorSys
    systemBase = sbc(self, logDefs=logDefs)
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
    super(ActorSystemBase, self).__init__(system, logDefs)
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/system/multiprocCommon.py", line 86, in __init__
    super(multiprocessCommon, self).__init__(system, logDefs)
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/system/systemBase.py", line 336, in __init__
    'not a valid or useable ActorSystem Admin')
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid or useable ActorSystem Admin

Steps to reproduce

  1. pip install thespian
  2. Run the following code (found in the QuickStart):
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from thespian.actors import *
>>> ActorSystem("multiprocTCPBase")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/actors.py", line 638, in __init__
    systemBase, capabilities, logDefs)
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/actors.py", line 676, in _startupActorSys
    systemBase = sbc(self, logDefs=logDefs)
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
    super(ActorSystemBase, self).__init__(system, logDefs)
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/system/multiprocCommon.py", line 86, in __init__
    super(multiprocessCommon, self).__init__(system, logDefs)
  File "/Users/chris/code/esrally/thesbian/venv/lib/python3.7/site-packages/thespian/system/systemBase.py", line 336, in __init__
    'not a valid or useable ActorSystem Admin')
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid or useable ActorSystem Admin

Here's the additional network-related info you've asked for in other discussions:

>>> import socket
>>> hb = socket.gethostname()
>>> print( str(hb) )
Murdock.local
>>> fqdn = socket.getfqdn()
>>> print( str( fqdn ) )
Murdock.local
>>> socket.getaddrinfo(None, 0, socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 0))]
>>> socket.getaddrinfo(hb, 0, socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('10.1.2.125', 0))]
>>> socket.getaddrinfo(fqdn, 0, socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('10.1.2.125', 0))]

Thanks for any assistance.

kquick commented 4 years ago

Hi @jchrisweaver ,

The most likely scenario is that port 1900 is in use, either by a different service or by a previous instance of Thespian that didn't fully shut down.

On the mac, try:

$ netstat -vatn | grep 1900
tcp4      0     0   *.1900      *.*     LISTEN  131072 131072   404    0
$ ps -awfx | grep 404
   0    404    1   0   5:38PM  ?? 0:00:03 python
$

The LISTEN on port 1900 shows something is active, and the 2nd to last column is the PID, which I then display to see what it is. In this case, it corresponds to the Thespian instance I currently have running.
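
The same check can be approximated from Python. This is only a minimal sketch: the helper name port_in_use is made up for illustration, and a failed bind is treated as "in use" even though it can also fail for other reasons (for example lingering connections on that port or insufficient privileges).

```python
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if a TCP socket cannot be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            # No SO_REUSEADDR here, so an existing listener makes bind() fail.
            s.bind((host, port))
        except OSError:
            return True
        return False


if __name__ == "__main__":
    # 1900 is the port from the error message in this issue.
    print("port 1900 in use:", port_in_use(1900))
```

Unlike netstat or lsof, this does not say which process holds the port, so it is a quick yes/no check rather than a replacement for the commands above.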

If you do have something on port 1900 and it's Thespian, you can try $ python -m thespian.director stop or alternatively python -c 'from thespian.actors import *; ActorSystem("multiprocTCPBase").shutdown()' to stop it. If the previous instance is hung for some reason, you may need to kill the process (possibly kill -9 PID).

Let me know if this doesn't help and we can dig deeper.

jchrisweaver commented 4 years ago

Thanks for the quick response!

I rebooted just for good measure. ;) After that, I verified the port wasn't in use and fired up lsof -P -n -r 1 -i :1900. Following the same steps again, I got the following:

=======
COMMAND  PID  USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
python3 2030 chris    9u  IPv4 0x8e543c41ae576a07      0t0  TCP *:1900 (LISTEN)
python3 2031 chris    9u  IPv4 0x8e543c41ae576a07      0t0  TCP *:1900 (LISTEN)
python3 2031 chris   13u  IPv4 0x8e543c41a7d6cf87      0t0  TCP 10.34.10.6:49929->10.34.10.6:1900 (SYN_SENT)
=======

The 10.34.10.6 is notable since my local network is 10.1.2.*. After some poking around, I turned off my VPN software and tried the test again. The previously failing test succeeded, and lsof showed the following:

=======
COMMAND  PID  USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
python3 2568 chris    9u  IPv4 0x8e543c41afe5aa07      0t0  TCP *:1900 (LISTEN)
python3 2568 chris   11u  IPv4 0x8e543c41a3dea627      0t0  TCP 10.1.2.125:1900->10.1.2.125:50338 (ESTABLISHED)
python3 2569 chris    9u  IPv4 0x8e543c41afe5aa07      0t0  TCP *:1900 (LISTEN)
python3 2569 chris   13u  IPv4 0x8e543c41b16f7367      0t0  TCP 10.1.2.125:50338->10.1.2.125:1900 (ESTABLISHED)

Apparently, the code is grabbing the IP from the tunnel interface that the VPN sets up. Here's an ifconfig snippet showing utun2:

utun2: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
    inet 10.34.10.6 --> 10.34.10.5 netmask 0xffffffff
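
The SYN_SENT line in the first lsof output suggests that a process cannot open a TCP connection to its own listener through the tunnel address. A rough way to test that for any local address is sketched below; the helper name can_reach_self and the 2-second timeout are assumptions made for illustration, not anything taken from Thespian.

```python
import socket


def can_reach_self(addr, timeout=2.0):
    """Listen on all interfaces, then try to connect to that listener via addr."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("0.0.0.0", 0))   # port 0: let the OS pick a free port
        srv.listen(1)
        port = srv.getsockname()[1]
        try:
            # The kernel completes the handshake for a listening socket,
            # so no accept() call is needed for the connect to succeed.
            with socket.create_connection((addr, port), timeout=timeout):
                return True
        except OSError:
            # Covers timeouts, refused connections, and unreachable addresses.
            return False


if __name__ == "__main__":
    # Addresses taken from this thread; substitute your own.
    for candidate in ("127.0.0.1", "10.1.2.125", "10.34.10.6"):
        print(candidate, can_reach_self(candidate))
```

An address that reports False here is probably a poor choice for a listener that local processes need to connect to.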

So I can move forward, but obviously I'd prefer to have my VPN enabled. ;) Is there a way to set the interface used in Thespian?

kquick commented 4 years ago

That's very interesting.

On startup, Thespian attempts to determine the best address to use to allow communication with possibly remote Thespian Actor System instances. There is a startup process that attempts to determine the set of local addresses, and then find a good candidate public address (basically, which address can reach 8.8.8.8) and use that local public address for the Actors. While there are potential issues with this, it has generally worked pretty well, but it appears that the tunnel interface doesn't allow local communications on the local tunnel address.
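
To illustrate the "which address can reach 8.8.8.8" idea, here is a common standalone technique for asking the OS which local address it would route from. It is a sketch only and not necessarily how Thespian implements its own check; the function name, the probe port 53, and the loopback fallback are all assumptions.

```python
import socket


def guess_public_local_addr(probe=("8.8.8.8", 53), fallback="127.0.0.1"):
    """Return the local IPv4 address the OS would use to reach probe."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # connect() on a UDP socket sends no packets; it only makes the
            # kernel choose a route and therefore a local source address.
            s.connect(probe)
            return s.getsockname()[0]
        except OSError:
            # No usable route (for example, no network): fall back to loopback.
            return fallback


if __name__ == "__main__":
    print(guess_public_local_addr())
```

With a VPN active, the default route may go through the tunnel, in which case this kind of probe would return the tunnel address (10.34.10.6 in this issue) instead of the LAN address.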

I've created an override_addr branch that will let you set a THESPIAN_BASE_IPADDR environment variable before startup to override the local public address used. If you are not communicating externally then it should be fine to set this to 127.0.0.1, although you could also use 10.1.2.125.
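
If exporting the variable in the shell is inconvenient, it can presumably also be set from Python before the Actor System is started. The sketch below makes two assumptions: the helper name set_thespian_base_addr is hypothetical, and setting the variable before importing thespian is simply the safest ordering rather than a documented requirement.

```python
import ipaddress
import os


def set_thespian_base_addr(addr="127.0.0.1"):
    """Validate addr and export it as THESPIAN_BASE_IPADDR for this process."""
    ipaddress.ip_address(addr)  # raises ValueError if addr is malformed
    os.environ["THESPIAN_BASE_IPADDR"] = addr
    return addr


set_thespian_base_addr("127.0.0.1")

# Then, with a Thespian build that supports the variable:
# from thespian.actors import ActorSystem
# asys = ActorSystem("multiprocTCPBase")
```

Child processes inherit the environment of the process that starts them, so setting it once in the starting process should be enough.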

Please give this branch a try and let me know if this fixes your issue. If so, I will add documentation for this and try to make a new Thespian release within a day or so.

To use this branch:

$ git clone -b override_addr https://github.com/kquick/thespian thespian_oa
$ export PYTHONPATH=$(pwd)/thespian_oa:$PYTHONPATH
$ export THESPIAN_BASE_IPADDR='127.0.0.1'
$ ... startup thespian or esrally ...

kquick commented 4 years ago

Hi @jchrisweaver, any luck validating the override_addr branch?

kquick commented 4 years ago

Just to follow up @jchrisweaver, I've just released version 3.10.0 of Thespian, which includes the functionality described above (THESPIAN_BASE_IPADDR).

I'm going to close this issue based on this release, but please feel free to re-open it (or open a new issue) if this has not resolved your issues.