meshtastic / python

The Python CLI and API for talking to Meshtastic devices
https://meshtastic.org
339 stars 150 forks source link

Tunnel not working -- bad file descriptor #96

Closed jdstroy closed 2 years ago

jdstroy commented 3 years ago

On Meshtastic 1.2.35 with Python 3.8.8, --tunnel appears to be broken:

macmini:~$ python3 --version
Python 3.8.8
macmini:~$ sudo meshtastic --host 192.168.29.58 --tunnel
Connected to radio
INFO:root:Starting IP to mesh tunnel (you must be root for this *pre-alpha* feature to work).  Mesh members:
INFO:root:Node !xxxxxxxx has IP address 10.115.xx.yy
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
macmini:~$ uname -a
Linux macmini 5.4.109-0-lts #1-Alpine SMP Wed, 31 Mar 2021 10:17:30 UTC i686 Linux
macmini:~$ meshtastic --version
1.2.35

Debugging this problem, it appears that the spawned thread at Tunnel.__tunReader throws an uncaught exception at tap.read() with EBADF:

19362 read(4, 0xb72ae570, 200)          = -1 EBADF (Bad file descriptor)

Searching backwards, file descriptor 4 was opened by the parent process:

19328 open("/dev/net/tun", O_RDWR|O_LARGEFILE|O_CLOEXEC) = 4

It seems tun does not work correctly when its file descriptor is copied from the parent process to the child process.

Patching Meshtastic so that the (Python) thread that initializes TUN is also the same (Python) thread that calls tap.read() seems to work, although doing so makes tunnel.py a bit less useful as library code.

KiwiHC16 commented 2 years ago

@jdstroy could you share the patch for

Patching Meshtastic so that the (Python) thread that initializes TUN is also the same (Python) thread that calls tap.read() seems to work, although doing so makes tunnel.py a bit less useful as library code.
KiwiHC16 commented 2 years ago

In my case:

root@abeilleDev:~# meshtastic --version
1.2.40
jdstroy commented 2 years ago

@KiwiHC16 Looks like you figured it out yourself. It seems you also figured out why I didn't submit a patch to "fix" it by removing the threading. :)

Turns out, my initial hunch was incorrect. I incorrectly assumed that the close() on fd 4 was happening because of file descriptor inheritance wackiness, but that's actually not the case; instead, it's a race condition, and I have a fix for that which I'll send in a PR in short order.