meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0

[Bug]: Node set name and device info from other node in network #2754

Closed mrekin closed 1 year ago

mrekin commented 1 year ago

Category

Other

Hardware

Heltec V3

Firmware Version

2.2.3.282cc0b

Description

This is a very strange story. I don't have full info and can't reproduce it right now. I don't know whether it is a firmware bug or an Android app bug.

Short version: at some point my node took the short/long name and device info of another node in the network. My mate's node took the name/device info of my node (and the GPS positions may have been swapped too).

Long story:

My node: heltec wsl, latest firmware (2.2.3.282cc0b), latest Android app (2.2.3), node name: narm_ws_solar
Mate's node: DIYv1, latest firmware (2.2.3.282cc0b), node name: spap25e

The nodes have no direct connection (there are 2 nodes between them on the path); both use fixed GPS and a Wi-Fi connection.

On Sunday we tried switching from the LF preset to the LM preset for testing (7 nodes in total), ran some tests, and returned to the LongFast preset at the end of the day. On Monday I found that my node had the name "spap25e" and device info "diy_v1" in the Android app (in the nodes tab and in the device list tab). My second node also showed the incorrect name for the first node. I opened Radio configuration, user info: the name there was correct. After I changed the name in user info to another value and saved, the node name was correct in the Android app for approx. 30 min and then switched back to "spap25e". I set the name again and tried resetting the node DB; the situation repeated. I tried to set the name once more and the node stopped loading. On serial I got:

INFO | ??:??:?? 1 Loading /prefs/channels.proto
INFO | ??:??:?? 1 Loaded saved channelFile version 22
[ 1016][E][vfs_api.cpp:105] open(): /littlefs/oem/oem.proto does not exist, no permits for creation
INFO | ??:??:?? 1 No /oem/oem.proto preferences found
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x75cec040 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4

After this I flashed the node with a full erase (latest firmware again), and now everything is correct.

Later my mate said his node showed the same behavior: it took my node's name and info. The other 5 nodes had no problems; most of them are on older firmware.

Here is a screen recording from my node and some screenshots:

https://github.com/meshtastic/firmware/assets/8645868/5dd6a2ff-1635-4708-a208-8ee15fee4b14

Both 'nodes' in the list show the same info (position, battery level) but different RSSI: one is my node, the other is the remote one.
(screenshot: photo_2023-08-29_09-26-43)

Maybe we'll try to reproduce it later, but that requires several people to reconfigure the network.

Relevant log output

No response

andrekir commented 1 year ago

WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x75cec040 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4

you have nodes with the same nodenum, which must be unique. the cause seems to be that NodeDB::pickNewNodeNum() always returns 0x4 after a collision is detected.

similar to https://github.com/meshtastic/firmware/issues/2572
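To illustrate why a retry that always proposes the same value cannot resolve a collision, here is a minimal Python model of a collision-retry loop. It is a sketch, not the firmware's code: `pick_node_num`, `NUM_RESERVED`, `NODENUM_BROADCAST`, and the attempt limit are assumptions chosen to mirror the log above, where the retry candidate stays at 0x4.

```python
import random

NUM_RESERVED = 4                 # assumption: lowest usable node number (the 0x4 in the log)
NODENUM_BROADCAST = 0xFFFFFFFF   # assumption: all-ones broadcast address


def pick_node_num(desired, in_use, next_candidate, max_attempts=8):
    """Return (node_num, retries), or (None, max_attempts) if every attempt collided."""
    current = desired
    for attempt in range(max_attempts):
        if current not in in_use:
            return current, attempt
        # Collision: ask the generator for another candidate.
        current = next_candidate()
    return None, max_attempts


# Node numbers taken from the log: both are reported as "in use".
in_use = {0x75CEC040, 0x4}

# Broken generator: always proposes the same value, like the repeated log line.
stuck = pick_node_num(0x75CEC040, in_use, lambda: NUM_RESERVED)

# Working generator: draws from the whole non-reserved range.
rng = random.Random(0)
ok = pick_node_num(
    0x75CEC040, in_use, lambda: rng.randrange(NUM_RESERVED, NODENUM_BROADCAST)
)

print("constant candidate:", stuck)
print("random candidate:  ", ok)
```

With the constant generator every attempt collides and the function gives up, while the random generator finds a free number on the first retry in this model.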

mrekin commented 1 year ago

Hi @andrekir, thanks for your reply! Can you explain a little more? Is nodeNum generated from the device macAddr and then checked against the nodeDB for collisions? Why doesn't a nodeDB reset help? Why didn't the problem occur before, or again since? Both nodes have been in the same network for a long time.

andrekir commented 1 year ago

Is nodeNum generated from the device macAddr and then checked against the nodeDB for collisions?

correct.

reset won't clear the nodenum AFAIK. nodes with the short name on a black background (like in your image) have the 0x4 nodenum issue; it may be difficult to notice otherwise.
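As a rough illustration of a MAC-derived node number, the sketch below assumes the number is the last four bytes of the 6-byte MAC address read as a big-endian integer. The thread only confirms that the nodenum is based on the MAC address; the exact byte selection, the function name `node_num_from_mac`, and the MAC values are hypothetical.

```python
def node_num_from_mac(mac: bytes) -> int:
    """Assumed derivation: the last four bytes of a 6-byte MAC, big-endian."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return int.from_bytes(mac[2:], "big")


# Hypothetical MAC addresses, chosen so the first one maps to the nodenum in the log.
mac_a = bytes.fromhex("aabb75cec040")
mac_b = bytes.fromhex("ccdd75cec040")  # different MAC, same last four bytes
mac_c = bytes.fromhex("aabb75cec041")  # differs in the last byte

num_a = node_num_from_mac(mac_a)
num_b = node_num_from_mac(mac_b)
num_c = node_num_from_mac(mac_c)

print(hex(num_a), hex(num_b), hex(num_c))
```

Under this assumption two devices collide only when their last four MAC bytes match, which is why a collision check against the nodeDB is still needed even though the number is derived from hardware.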

mrekin commented 1 year ago

nodes with the short name on a black background

Sounds like a 'black spot' :) Anyway, my node and my mate's node are not 'black' in normal operation (but my other node is black; the mesh looks like this: myProblemNode - myBlackNode - 3rdNode - mateProblemNode).

So I can imagine that the strange behavior with names and device info is the result of a nodenum collision. But if nodenum is based only on macaddr, the problem must happen every time both nodes are in the same network (they both work in a large mesh of >60 nodes). Something causes the nodenum to change (on my node or my mate's node, though as far as I know my mate didn't get the "No /oem/oem.proto preferences found" error): memory overlap, a filesystem problem, etc.? I think we'll try the preset-change test again.

GUVWAF commented 1 year ago

But if nodenum is based only on macaddr, the problem must happen every time both nodes are in the same network

Yes, I also don’t think they actually have the same NodeNum as that is highly unlikely, since it’s a 32-bit number.

We’ve had this report before, but I’ve not seen it after PR #2576 was merged. I’m wondering if it could be caused by the nodes in between that are on older firmware as you mentioned.

Since it seems to occur only very infrequently it is hard to debug, but I hope you can get a way to reproduce.

mrekin commented 1 year ago

Hi @GUVWAF !

I’m wondering if it could be caused by the nodes in between that are on older firmware as you mentioned

I don't think so, because the nodes in between are on older firmware most of the time. I have only one connection to the network, and that node has been on older firmware for the last two to three weeks.

I hope we can reproduce it, but I don't know how to debug this. Do I need to save the full console output? What else would help?

GUVWAF commented 1 year ago

Yes, mainly full console output would help. If you did something specific (e.g. NodeDB reset, requesting position, rebooting, etc.) after which it happened that would be good to know as well.
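Once a console log has been saved, a small script can pull out the nodenum collision warnings. The sketch below assumes the message format seen earlier in this thread ("Our desired nodenum X is in use, so trying for Y"); `summarize_collisions` is a hypothetical helper, not part of any Meshtastic tool.

```python
import re
from collections import Counter

# Matches the warning text as it appears in the serial output quoted in this issue.
COLLISION = re.compile(
    r"Our desired nodenum (0x[0-9a-fA-F]+) is in use, so trying for (0x[0-9a-fA-F]+)"
)


def summarize_collisions(log_text):
    """Count (desired, retry) pairs and list nodenums whose retry repeats the desired value."""
    pairs = Counter(
        (int(m.group(1), 16), int(m.group(2), 16))
        for m in COLLISION.finditer(log_text)
    )
    stuck = sorted({desired for (desired, retry) in pairs if desired == retry})
    return pairs, stuck


sample = """\
INFO | ??:??:?? 1 No /oem/oem.proto preferences found
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x75cec040 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x4 is in use, so trying for 0x4
"""

pairs, stuck = summarize_collisions(sample)
for (desired, retry), count in sorted(pairs.items()):
    print(f"desired {desired:#x} -> retry {retry:#x}: {count} time(s)")
print("retry equals desired for:", [hex(n) for n in stuck])
```

Because the pattern is matched anywhere in the text, it also works on logs where line breaks were lost when pasting.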

mverch67 commented 1 year ago

The wrong assignment of nodeIds (mainly due to nodeDB reset) is fixed by #2798.

mrekin commented 1 year ago

I can't reproduce the problem anymore (on either old or new firmware), so I think we can close the bug until it happens again.