wb2osz / direwolf

Dire Wolf is a software "soundcard" AX.25 packet modem/TNC and APRS encoder/decoder. It can be used stand-alone to observe APRS traffic, as a tracker, digipeater, APRStt gateway, or Internet Gateway (IGate). For more information, look at the bottom 1/4 of this page and in https://github.com/wb2osz/direwolf/blob/dev/doc/README.md
GNU General Public License v2.0

Stuck on previous digipeater #534

Open jmkristian opened 1 month ago

jmkristian commented 1 month ago

If an attempt to connect via a digipeater fails, a new attempt to connect via a different digipeater still uses the previous digipeater (and consequently fails). Direwolf should use the digipeater specified in the new 'v' request.

Two files are attached, copied from the output of Direwolf with the command line option '-d au'. The file via-n0call.txt shows two consecutive connection attempts, which both fail because there is no digipeater with call sign N0CALL. The file via-kjohn.txt shows a successful connection, using the same 'v' request as in via-n0call.txt.

I used Windows 8.1, Direwolf version 1.7 and a DRA-65 audio adapter.

via-n0call.txt via-kjohn.txt

dranch commented 4 weeks ago

Hello John, This does seem like a bug per the via-n0call.txt file, but I see you're using an AGW client. Which one and which version? Instead of using the KJOHN alias for the digipeater, can you try using, say, KF6ANX-4 (the callsign rather than an alias) and see if that changes anything?

jmkristian commented 4 weeks ago

I see the same problem when the second connect attempt is via KF6ANX-8. The problem stops when I disconnect the TCP connection from my client to Direwolf and then start a new TCP connection. These behaviors are shown in the attached file via-kf6anx-8.txt.

My AGW client is chatter in agwpe-tools. I'm using a new version, which I'm developing but have not released. I plan to release it soon. I can give you a pre-release copy of chatter.exe for Windows, if you like. The released version of chatter doesn't trigger this problem, presumably because it creates a new TCP connection for each AX.25 connection.

via-kf6anx-8.txt

mfncooper commented 4 weeks ago

I can confirm the same behaviour when using Paracon, my AGWPE packet terminal, with Direwolf.

As an additional experiment, after the failed use of N0CALL, I made a connection attempt using 2 digipeaters, just in case changing the count might nudge Direwolf into the correct behaviour, but the result was the same - Direwolf continued to use N0CALL and only N0CALL.

A very quick look at the code seems to show that the correct list of digipeaters gets at least as far as dl_connect_request, but I have not looked deeper than that.

mfncooper commented 4 weeks ago

There is actually a wider problem here. If an AGWPE client makes a connection from owncall O to peercall P via digipeater D1, successful or otherwise, and then later attempts to make another connection from O to P, but this time via digipeater D2, Direwolf will still try to use D1 to make the connection. In fact, even if the client subsequently attempts to make a connection directly from O to P without specifying a digipeater, Direwolf will still try to use D1. Conversely, if the client attempts to make a connection directly from O to P, and then later attempts to make a connection from O to P using digipeater D1, Direwolf will not use D1. In short, whatever the state of digipeaters is for a first connection between O and P will remain in place for the remainder of the AGWPE client session.

The problem is with the maintenance of list_head, the "list of current state machines for each link", in ax25_link.c. An entry is added to this list by get_link_handle when a new connection is being created by dl_connect_request. The bug is that entries are only ever removed from this list through a call to dl_client_cleanup, which happens either when an AGWPE client connection error occurs or when that client goes away.

This means that, when the attempt to connect through an invalid digipeater fails (or in fact when any connection attempt has completed, successfully or otherwise), the corresponding entry remains in the list_head list, and won't be removed until the AGWPE client disconnects from Direwolf.

When dl_connect_request calls get_link_handle to find or create an entry in the list, the digipeaters are not included in the lookup criteria. This is correct, per the AX.25 spec and the semantics of the lookup function. However, it means that when another request is made to connect from the same "owncall" to the same "peercall", the entry for the previous (failed or otherwise) attempt to connect via the previous digipeater is found and reused, because it was never removed from the list.
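To make the reuse concrete, here is a minimal sketch of a find-or-create lookup keyed only on the two callsigns. It is a toy model, not Direwolf's code: link_t, links, copy_call and find_or_create_link are hypothetical names that stand in for the real state machine struct, list_head and get_link_handle, and the real code may differ in many details.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CALL_LEN 12
#define MAX_DIGIS 8

/* Hypothetical, simplified link record; not Direwolf's real struct. */
typedef struct link_s {
    struct link_s *next;
    char own[CALL_LEN];
    char peer[CALL_LEN];
    char digis[MAX_DIGIS][CALL_LEN];
    int num_digis;
} link_t;

/* Stands in for list_head in this toy model. */
static link_t *links = NULL;

/* Bounded copy that always null-terminates. */
static void copy_call(char *dst, const char *src)
{
    snprintf(dst, CALL_LEN, "%s", src);
}

/*
 * Find a link by (own, peer) only, creating one if none exists.
 * The digipeater path is stored only when the entry is created, so an
 * entry left over from an earlier attempt is returned with its old path.
 */
static link_t *find_or_create_link(const char *own, const char *peer,
                                   const char *const *digis, int num_digis)
{
    assert(num_digis >= 0 && num_digis <= MAX_DIGIS);

    for (link_t *p = links; p != NULL; p = p->next) {
        if (strcmp(p->own, own) == 0 && strcmp(p->peer, peer) == 0) {
            return p;   /* found: the requested digis are ignored */
        }
    }

    link_t *n = calloc(1, sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    copy_call(n->own, own);
    copy_call(n->peer, peer);
    for (int i = 0; i < num_digis; i++) {
        copy_call(n->digis[i], digis[i]);
    }
    n->num_digis = num_digis;
    n->next = links;
    links = n;
    return n;
}
```

In this model, calling find_or_create_link("O", "P", ...) first with N0CALL and then with KJOHN returns the same entry both times, and its stored path still names N0CALL.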

It would seem that some cleanup of the list_head list needs to happen whenever a connection is "finished with", whether that be successfully closed or never opened due to a failure.
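As a rough illustration of that kind of cleanup, the sketch below unlinks and frees an entry once its connection is finished, so that the next request for the same pair of callsigns creates a fresh entry with the new path. Again this is a toy model under stated assumptions: conn_t, conn_list, conn_get and conn_release are hypothetical names, only one digipeater is modelled, and where such a call could safely be made in ax25_link.c (given timers, queued frames and so on) has not been investigated here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CALL_LEN 12

/* Hypothetical, simplified entry; a single digipeater, "" for direct. */
typedef struct conn_s {
    struct conn_s *next;
    char own[CALL_LEN];
    char peer[CALL_LEN];
    char via[CALL_LEN];
} conn_t;

/* Stands in for list_head in this toy model. */
static conn_t *conn_list = NULL;

/* Find by (own, peer) only; create with the requested path if absent. */
static conn_t *conn_get(const char *own, const char *peer, const char *via)
{
    assert(own != NULL && peer != NULL && via != NULL);

    for (conn_t *p = conn_list; p != NULL; p = p->next) {
        if (strcmp(p->own, own) == 0 && strcmp(p->peer, peer) == 0) {
            return p;
        }
    }

    conn_t *n = calloc(1, sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    snprintf(n->own, CALL_LEN, "%s", own);
    snprintf(n->peer, CALL_LEN, "%s", peer);
    snprintf(n->via, CALL_LEN, "%s", via);
    n->next = conn_list;
    conn_list = n;
    return n;
}

/*
 * Unlink and free an entry whose connection has finished, whether it
 * was closed normally or never opened. Returns 1 if the entry was in
 * the list and has been removed, 0 otherwise.
 */
static int conn_release(conn_t *done)
{
    for (conn_t **pp = &conn_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == done) {
            *pp = done->next;
            free(done);
            return 1;
        }
    }
    return 0;
}
```

With the release step in place, a failed attempt via N0CALL followed by conn_release leaves the list empty, and a later conn_get for the same callsigns via KJOHN produces an entry whose path is KJOHN.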