meshtastic / python

The Python CLI and API for talking to Meshtastic devices
https://meshtastic.org
371 stars 157 forks source link

Administrating node via TCP/IP after firmware .53 causes meshtastic script not to return to shell #267

Closed ernax78 closed 2 years ago

ernax78 commented 2 years ago

Administrating node via TCP/IP after firmware .53 causes meshtastic script not to return to shell. After upgrading to the .53 version of firmware (also verified on .54 and now .55) the python "meshtastic" script never returns to shell after doing remote administration of a node.

The problem has also been present in all python releases up until 1.2.83

How to reproduce:

meshtastic --host 172.16.10.22 --info

It will hang and never return to shell.

meshtastic --host 172.16.10.22 --export-config --info
# start of Meshtastic configure yaml
owner: EB1

owner_short: EB1

channel_url: https://www.meshtastic.org/d/#redacted

user_prefs:
  phone_timeout_secs: 900
  ls_secs: 300
  wifi_ssid: "MikroTik-redacted"
  wifi_password: "sekrit"
  region: RU
  location_share: LocEnabled
  gps_operation: GpsOpStationary
  mqtt_server: "10.127.100.110"
  range_test_plugin_enabled: 'true'
  range_test_plugin_save: 'true'
  position_flags: 35
  is_always_powered: 'true'
  auto_screen_carousel_secs: 10
  mqtt_username: "EB1"

This hangs...

Sometimes it works, the --debug looks like this:

owner: EB1

owner_short: EB1

channel_url: https://www.meshtastic.org/d/#redacted

user_prefs:
  phone_timeout_secs: 900
  ls_secs: 300
  wifi_ssid: "MikroTik-redacted"
  wifi_password: "sekrit"
  region: RU
  location_share: LocEnabled
  gps_operation: GpsOpStationary
  mqtt_server: "10.127.100.110"
  range_test_plugin_enabled: 'true'
  range_test_plugin_save: 'true'
  position_flags: 35
  is_always_powered: 'true'
  auto_screen_carousel_secs: 10
  mqtt_username: "EB1"

DEBUG file:tcp_interface.py close line:50 Closing TCP stream
DEBUG file:stream_interface.py close line:114 Closing stream
DEBUG file:stream_interface.py _sendToRadioImpl line:104 Sending: disconnect: true
DEBUG file:stream_interface.py _sendToRadioImpl line:104 Sending: 
DEBUG file:stream_interface.py _sendToRadioImpl line:109 sending header:b'\x94\xc3\x00\x03' b:b'\xc0\x06\x01'
DEBUG file:stream_interface.py _sendToRadioImpl line:109 sending header:b'\x94\xc3\x00\x00' b:b''
DEBUG file:mesh_interface.py _handlePacketFromRadio line:687 Publishing meshtastic.receive.admin: packet={'from': 2475256972, 'to': 2475256972, 'decoded': {'portnum': 'ADMIN_APP', 'payload': b':\x04\x08\x01\x12\x00', 'requestId': 579296371, 'admin': {'getChannelResponse': {'index': 1, 'settings': {}}, 'raw': get_channel_response { index: 1 settings { } } }}, 'id': 728926315, 'rxTime': 1644738337, 'hopLimit': 3, 'priority': 'RELIABLE', 'raw': from: 2475256972 to: 2475256972 decoded { portnum: ADMIN_APP payload: ":\004\010\001\022\000" request_id: 579296371 } id: 728926315 rx_time: 1644738337 hop_limit: 3 priority: RELIABLE , 'fromId': '!93896c8c', 'toId': '!93896c8c'} 
DEBUG file:stream_interface.py __reader line:181 reader is exiting
DEBUG file:__main__.py onReceive line:29 in onReceive() d:{'portnum': 'ADMIN_APP', 'payload': b':\x04\x08\x01\x12\x00', 'requestId': 579296371, 'admin': {'getChannelResponse': {'index': 1, 'settings': {}}, 'raw': get_channel_response {
  index: 1
  settings {
  }
}
}}
DEBUG file:stream_interface.py _disconnected line:78 Closing our port

And it returns back to shell..

But usually it becomes like this:

DEBUG file:stream_interface.py __reader line:129 reading character
DEBUG file:__main__.py onReceive line:29 in onReceive() d:{'portnum': 'ADMIN_APP', 'payload': b':\x04\x08\x01\x12\x00', 'requestId': 1340883869, 'admin': {'getChannelResponse': {'index': 1, 'settings': {}}, 'raw': get_channel_response {
  index: 1
  settings {
  }
}
}}

And it never hangs up.

This goes for both local connections (same ether) or via vpn so I think I can rule out any IDS.

From windows I can't even ctrl-c the meshtastic.exe.

This goes for Eltec, T-Beam and Lora32 with firmwares from .53 so I think it might be some issue in the firmware rather than in Python, but the symptom shows in python...

ernax78 commented 2 years ago

Some serial debugging show following:

When doing no command, just --node IP

Python:
DEBUG file:stream_interface.py _sendToRadioImpl line:109 sending header:b'\x94\xc3\x00\x19' b:b'\x12\x17\x15\xff\xff\xff\xff"\t\x08\x03\x12\x05M\xcf\xc7\x08b5$\xe0anP\x03'

Serial:
[D][WiFiClient.cpp:511] connected(): Disconnected: RES: 0, ERR: 128
08:59:11 442 [StreamAPI] Lost phone connection
08:59:11 442 [StreamAPI] Info: ClosedClient dropped connection, suspending API service

It seems the script just cuts the connection brutally.

However, when issuing a command, like --info or --nodes:

DEBUG file:tcp_interface.py close line:50 Closing TCP stream
DEBUG file:stream_interface.py close line:114 Closing stream
DEBUG file:stream_interface.py _sendToRadioImpl line:104 Sending: disconnect: true
DEBUG file:stream_interface.py _sendToRadioImpl line:109 sending header:b'\x94\xc3\x00\x03' b:b'\xc0\x06\x01'

09:00:37 528 [StreamAPI] Info: got disconnect
09:00:37 528 [StreamAPI] Info: ClosedIgnoring downlink message we originally sent.

Then it hangs until I do ctrl-c on python and I get:

[D][WiFiClient.cpp:511] connected(): Disconnected: RES: 0, ERR: 128
09:01:48 599 [StreamAPI] Client dropped connection, suspending API service

Here it seems like it wants the device to disconnect? I dont know, but somehow it hangs forever... More interesting is if I kill the connection totally (Pull the plug on the Eltec) the python just hangs on...

This is on Debian Linux.

ernax78 commented 2 years ago

After upgrading to python 3.10.2 and some more testing, I think I have found where it "hangs":

stream_interface.py row 120 (wait for it to exit):

    def close(self):
        """Close a connection to the device"""
        logging.debug("Closing stream 1")
        MeshInterface.close(self)
        # pyserial cancel_read doesn't seem to work, therefore we ask the
        # reader thread to close things for us
        self._wantExit = True
        if self._rxThread != threading.current_thread():
            self._rxThread.join()  # wait for it to exit

My ugly fix to get it to work was simply commenting out the last two rows in the function, but that is not the way.

garthvh commented 2 years ago

Try turning the carousel way down or off 10 seconds is too frequent. Are you also connected in a browser to the WebUI when having the issue?

ernax78 commented 2 years ago

Carousel now disabled, but it did not do any change. No browser connected.

garthvh commented 2 years ago

What OS?

garthvh commented 2 years ago

Another similar report Linux

https://github.com/meshtastic/Meshtastic-python/issues/268

May need a hosts file record

ernax78 commented 2 years ago

Linux and Windows 11...


When adding an issue, be sure to include the following info:
 System: Linux
   Platform: Linux-4.15.3-141-generic-x86_64-with-glibc2.24
   Release: 4.15.3-141-generic
   Machine: x86_64
   Encoding (stdin): utf-8
   Encoding (stdout): utf-8
 meshtastic: v1.2.83
 Executable: /home/ern/.local/bin/meshtastic
 Python: 3.10.2 CPython GCC 6.3.0 20170516

AND

When adding an issue, be sure to include the following info:
 System: Windows
   Platform: Windows-10-10.0.22000-SP0
   Release: 10
   Machine: AMD64
   Encoding (stdin): utf-8
   Encoding (stdout): utf-8
 meshtastic: v1.2.82
 Executable: C:\Users\ern\AppData\Roaming\Python\Python39\Scripts\meshtastic
 Python: 3.9.2 CPython MSC v.1928 64 bit (AMD64)

Please add the output from the command: meshtastic --info

Same issue on both.

Also, I use IP addresses, not hostnames when I speak to my devices. They are all different models on different networks. I was first suspecting my IPS, but it also occurs when I am on the same ether without IPS.

Also hangs on MacOS X

dfloer commented 2 years ago

I can confirm that I'm having similar issues on both Linux (Arch, fully up-to-date as of posting) and Windows 10 (19044.1503). Both using Python 3.10.2, meshtastic 1.2.83, with1.2.55 on a v2.1 Heltec connected to WiFi with no routers or any packet filtering between it and my test machines. I tried both IP address and hostname.

Command run was: meshtastic --host {host|IP} --info, and it successfully completed.

I am not having the issue happen consistently. When I was first testing, it was happening every time I tried, but as I poke at it more, it stopped happening. Thinking I may have changed some code, I created a new pipenv to make sure that I was still seeing the hang, and I was only able to make it happen 4/36 attempts.

Maybe this is a race condition, but related to something with the device, rather than the library/host directly? While I didn't change anything on the node, it is part of a mesh, so there is data flowing through it, and it should be logging with the rangetest plugin as well.

mkinney commented 2 years ago

Good news! I think I found the problem. I have a fix building now.

See https://github.com/meshtastic/Meshtastic-device/pull/1266

mkinney commented 2 years ago

Please try https://github.com/meshtastic/Meshtastic-device/releases/tag/v1.2.57.f7c6955

mkinney commented 2 years ago

Per Discord message, "seems to work" - "the hang with Python".