markqvist / Reticulum

The cryptography-based networking stack for building unstoppable networks with LoRa, Packet Radio, WiFi and everything in between.
https://reticulum.network
MIT License
2k stars 124 forks source link

RNodeConf doesn't disable Bluetooth on command / Windows DTR causes timing/configuration issues #498

Closed faragher closed 4 months ago

faragher commented 5 months ago

Describe the Bug When using RNodeConf to disable Bluetooth, the configuration is not set. The board still responds to Bluetooth.

To Reproduce Run rnodeconf (port) -B and attempt to attach via Bluetooth. Restarting board optional.

Expected Behavior RNode would no longer connect via Bluetooth

Logs & Screenshots Linux:

[00:58:47] Opening serial port /dev/ttyACM0...
[00:58:51] Device connected
[00:58:51] Current firmware version: 1.70
[00:58:51] Reading EEPROM...
[00:58:51] EEPROM checksum correct
[00:58:52] Device signature validated
[00:58:52] Disabling Bluetooth...

Windows:

[18:43:22] Opening serial port com15...
[18:43:26] Device connected
[18:43:26] Current firmware version: 1.59
[18:43:26] Reading EEPROM...
[18:43:26] EEPROM checksum correct
[18:43:26] Device signature validation failed

     WARNING! This device is NOT verifiable and should NOT be trusted.
     Someone could have added privacy-breaking or malicious code to it.

     Please verify the signing key is present on this machine.
     Autogenerated keys will not match another machine's signature.

     Proceed at your own risk and responsibility! If you created this
     device yourself, please read the documentation on how to sign your
     device to avoid this warning.

     Always use a firmware downloaded as binaries or compiled from source
     from one of the following locations:

        https://unsigned.io/rnode
        https://github.com/markqvist/rnode_firmware

     You can reflash and bootstrap this device to a verifiable state
     by using this utility. It is recommended to do so NOW!

     To initialise this device to a verifiable state, please run:

              rnodeconf com15 --autoinstall

[18:43:26] Disabling Bluetooth...

System Information See above RNodeConf 2.1.3

Additional context I believe this is either an RNodeConf issue or a long-standing firmware problem based on the versions presenting this issue. FW Version 1.70 and 1.59 tested on both Linux and Windows.

faragher commented 5 months ago

Okay, Linux appears to work as intended (issue appears related to #497 on Linux) but Windows is a legitimate problem. I cannot disable or enable Bluetooth. In addition, attempting to enable Bluetooth (from the disabled state at least) either resets the display or reboots the MCU. Since this appears to be a Windows issue, I'll see what I can do.

faragher commented 5 months ago

It's intermittent. Sometimes it works, sometimes it doesn't, sometimes it does a power cycle. Since I can't monitor the serial port and call commands at the same time, I'll have to write a test program. It won't be soon. I'll let you know.

faragher commented 5 months ago

Okay, so I lied. I stole a bunch of code from RNodeConf and got a test system running. Pushing the bytes across the com port works perfectly. Reboot occurs after Opening serial port but before Device connected

Not much news, but ruled out RNode and serial issues.

faragher commented 5 months ago

I have found a fundamental difference between Linux and Windows behavior that will need to be addressed and causes the reset. Windows doesn't disconnect cleanly somehow. This can be seen on the RNode itself, as on rnodeconf's termination, the connection icon does not revert like it does on Linux. On a fresh boot, the RNode functions as intended.

At this time it appears that this occurs when data is read from the serial port. This needs significantly more research, but I've narrowed it down to a python serial difference under Windows, and not something fundamentally RNode/Reticulum related.

No idea if this has anything to do with Bluetooth yet.

faragher commented 5 months ago

Yes. It has something to do with Bluetooth, I'm sure. It actually locks up the RNode at some point, because the reset button doesn't work. I'm out of energy for the day.

markqvist commented 5 months ago

Thanks for investigating this! If it is indeed related to specific behaviour on Windows, and you can figure out a root cause, we can most likely implement necessary workarounds easy enough.

faragher commented 5 months ago

I am convinced the issue is that something locks up the RNode and prevents the command from even being processed (not sent, or RNodeConf would detect the failure). I need to check function by function, since a raw serial connection or -P doesn't trigger it, but -i or -b/B does.

The rebooting issue turns out to be entirely unrelated, but a Windows specific issue. By default, connecting to an Arduino reboots the MCU, and this legacy behavior exists on the ESP32, apparently. It can be disabled, on paper, with dsrdtr = False in the serial command, as RNodeConf does. Windows does not respect this setting. SerialObject.setDTR(False) seems to work on Windows if called before open(). I am uncertain if there's a negative effect on Linux, and will not make the change in the code even for testing until the base problem is resolved.

faragher commented 5 months ago

Okay, important findings.

Enabling TNC was a great idea, although I can't disable it. This allows me real-time identification of when the MCU locks up, as the waterfall functions until then. I have a pretty good sample of things that cause the issue and can hopefully use timing and the -e and -P commands to whitelist functions.

The absolutely random retention of BT and display intensity settings makes me believe the failure is inside the EEPROM routines, but I won't assume at this time. Especially since that doesn't make a lick of sense if it works on Linux, and it sure does.

I have some more tests to do.

By call:

Command Locks up? Notes
-h No No RNode access, shouldn't matter
-i Yes
-e No
-N Yes Cannot return to Normal mode
-T Yes Good until "Radio Reporting Frequency" - Enables TNC
-b/-B Yes Sometimes maybe works?
-p Yes Took me like 20 tries to pair the device. I assume it's part of the issue
-D Yes Display intensity only occasionally saved
-P No
faragher commented 5 months ago

Okay, this is a good news bad news situation. First, this is a Windows-specific (and possibly driver specific!) issue. It can be treated like a timing issue, and I have, in the interim, "solved" it by adding a sleep(1) after the serial commands in the RNode.leave() function.

Core problem: Windows. No, seriously. Remember how I said Windows ignores DTR (Data Terminal Ready)? It's not entirely true. Windows IMMEDIATELY pulls DTR low when the serial port is closed. This causes the MCU to halt. Linux does not.

The implication here is that on Linux the MCU finishes its work after the serial connection is broken. On Windows, it always resets on connection (edge case where DTR is already pulled high, such as first connection) and it always halts when disconnected. Meaning unless there's an active connection or the port has never been closed, an RNode will not work on Windows, since a closed connection will halt the system.

In Reticulum use this doesn't matter; the serial connection is active when attached to the RNS. In RNodeConf use, it causes timing issues. If someone is setting up a portable RNode and thinks it'll work for powering the node, it will fail inexplicably. Even with the modifications I've done, once you enable bluetooth, it will halt on disconnect and need to be unplugged and plugged back in to function without an active serial connection.

I don't know if this is all Windows or just specific driver batches, but it also can be terminal specific, as some terminals (like PuTTY) have been reported to show this behavior, but that could still be OS or driver related.

@markqvist Please advise where you'd like the timing issue addressed. Leave seems appropriate enough, but we could also replace all quit() commands with a graceful quit function that also checks for stray RNodes and serial connections, then does all the timing and cleanup work in one place. You know the code far better than I.

markqvist commented 4 months ago

Phew, thanks for hunting this one down. I'm relieved it's solvable with a bit of timing modification.

Core problem: Windows. No, seriously. Remember how I said Windows ignores DTR (Data Terminal Ready)? It's not entirely true. Windows IMMEDIATELY pulls DTR low when the serial port is closed. This causes the MCU to halt. Linux does not.

Ah, that makes a lot of sense then. We can use RNS.vendor.platformutils.is_windows() to check whether RNodeConf is running on Windows, and add specific code if that is the case.

Please advise where you'd like the timing issue addressed.

The most reliable thing would probably be to have a central quit function that gracefully cleans everything up, but if simply adding a conditional sleep to leave on Windows solves it all, that's probably just as valid a solution. I'm happy with both.

markqvist commented 4 months ago

Closing this as fixed in #500