Closed ianmcorvidae closed 1 week ago
I'm not entirely sure why it needs to be added to the RouteDiscovery
if you can already deduce it from the information in the MeshPacket (hopStart
- hopLimit
)?
Not back at node A -- this is proposing sending back the data derived from hopStart
/ hopLimit
of the outbound traceroute, not of the reply. The traceroute from A to E may have one missing link, but the packet getting back from E to A after the traceroute has completed could go a totally different way (and often will, from what I've seen), so the hopStart
and hopLimit
of the packet when it gets back to A aren't meaningful in interpreting the path
I see, makes sense.
So there's already a ticket on the 3.0 board to also add the route back to the traceroute, and the SNR per link. I first wanted it to do for 3.0 only, because it would give wrong information if not all nodes update. However, since we now can derive the actual hops it took, we could at least show the number of unknown hops on the way back. In this way we can gradually introduce it (and maybe don't show it yet in the phone clients). Downside is that it increases the packet size, but currently a traceroute is pretty small and I think SNR can be encoded with 1 byte. What do you think?
That seems neat too, though I wasn't even thinking of reverse traceroutes -- just detecting missing hops on the current single-direction ones (the reverse direction being mentioned in my prior response because, since it can take a different route, it can't be used for calculating missing hops in the original traceroute)
It also occurs to me that my original request might be better served by each node along the way calculating missing hops, since it would allow more information about where exactly the missing hops are.
I might try to draw up a diagram with the various pieces of information being proposed in different places to clarify what exactly we're all talking about :sweat_smile:
First, how traces currently work:
A does not know (and can't know, with the information it has on the response) about the missing hop (C) between B & D
As proposed here:
Here, node E detects the missing hop and adds it to the RouteDiscovery, so A knows that there was a missing hop somewhere on the traceroute.
An even better version, maybe:
Here, node D (the first node after the missing hop) detects the missing hop and notes it somewhere associated with itself in the RouteDiscovery. Node E doesn't need to add anything, because all hops are accounted for, and node A knows that there is a missing hop and that it's somewhere before node D (with all updated firmware, immediately before D).
If the SNR per link is added, these missing hops would presumably be added in a similar place in this last option, since they're per-hop information. Another thing that could be added per-hop is whether a given hop was via MQTT or not, so traceroutes could identify MQTT gateway nodes.
Hopefully those diagrams are helpful at all, they're pretty slapped-together obviously. The core observation is that nodes on the outbound path are able to detect if the trace went by way of any nodes that couldn't add themselves to the list, but the originating node can't know that after the traceroute returns unless the outbound nodes or traceroute destination add it to the payload in some way, particularly since the response might go via a different path. SNR and MQTT are similarly things that can't be determined by the originating node unless they're added by the outbound-path nodes.
Bidirectional traceroutes would be another addition to this, so that nodes F and G are also eventually included in the list, but that's likely a bigger change.
The other complication is whether to add a "total hops" indicator or just a "missing hops" one.
If it's "total hops", then the field would always be present with updated firmware, which would make it easy to tell if nodes D or E are adding information or not, in the proposed changes. But it'd increase packet size. Having only a "missing hops" field would mean that each missing hop would only need to be accounted for once (by D or E, in the example), but if a node doesn't have updated firmware and doesn't add the indication, it would appear that there were no missing hops, even if there actually were any.
My feeling is that adding only missing-hops information is better since nodes tend to eventually update and folks are used to (or at least somewhat used to) the current behavior where those hops are totally absent already anyway.
Adding SNR for each hop would greatly increase the ability to troubleshoot unstable connections.
@ianmcorvidae Thanks for the detailed explanation. So what if a node that detects there is a missing hop adds a dummy node number 0xFFFFFF to the route? Then we don't need to add a new field and it might be more efficient (depends a bit on the overhead of the repeated
protobuf field versus a new field).
I think this will immediately work for apps as well, because they don't know a node with node number 0xFFFFFF.
That does seem like a good way to do it, and it's easily detectable by clients that want to mark it as a missing hop (vs. a known hop from an unknown node). I guess there remains the potential for the missing hops to get added the wrong place (in my example, if D has too-old firmware, E would add the missing hop after D rather than before), but that's something that would correct itself over time and it's still better information than we currently have!
The other improvements like bidirectional traceroutes and SNR would still be good of course, but I like that idea for covering just the original topic of this ticket.
Implemented in the firmware by #4056.
Platform
NRF52, ESP32, RP2040, Linux Native
Description
(this would presumably also require a corresponding protobuf change)
Right now, it's possible to get a somewhat misleading traceroute result if the outbound traceroute goes via a node that doesn't, or can't, decode the message. Since that node (or nodes) can't decode the message, it also can't add itself to the
RouteDiscovery
, so its hop is excluded from the final result.While we can't know the identity of these nodes, with the existence of
hopStart
we could be aware that some number of hops are not represented in the returned value and let the requesting client know.For a path like this:
the traceroute would show
B
andD
only, but if the hop limit was 3, thenE
will see the message withhopStart
of 3 andhopLimit
of 0 and could see that there's one missing hop in the traceroute.I'd propose we add either the count of total hops (in the example, 3) or the count of missing hops (in the example, 1) to the
RouteDiscovery
. Adding the missing hops would probably usually be 0, keeping the size of the packet a little smaller, while adding the total hops would mean the field is present in any traceroute that isn't to a direct neighbor, but would allow distinguishing between a traceroute where E is on older firmware that excludes the new field from a traceroute where E is actually aware of all hops.EDIT: After some discussion below, it seems like a good option here might be to have firmware detect missing hops via this method at each hop, and if any node detects a missing link, add a dummy hop to the existing structure with a nodeid of 0xFFFFFFFF. Existing clients will simply see it as an unknown nodes, but anyone who wants to can understand that value to mean a node detected missing hops instead.