Closed geeksville closed 2 years ago
This issue has been mentioned on Meshtastic. There might be relevant details there:
https://meshtastic.discourse.group/t/lost-messages-while-testing/2455/3
hmm - stress test shows no dropped packets at the radio or python api level:
~/development/meshtastic/Meshtastic-python$ bin/run.sh --test
INFO:root:Writing serial debugging to log_dev_ttyUSB0
Connection changed: meshtastic.connection.established
INFO:root:Writing serial debugging to log_dev_ttyUSB1
Connection changed: meshtastic.connection.established
INFO:root:Ports opened, starting test
INFO:root:Found devices, starting tests...
INFO:root:Sending test packet from 2885173400 to 4294967295
From /dev/ttyUSB1: {'from': 681021528, 'to': 4294967295, 'decoded': {'data': {'portnum': 'POSITION_APP', 'payload': b'', 'position': {}}, 'wantResponse': True}, 'id': 1601631314, 'hopLimit': 3, 'fromId': '!246f28979058', 'toId': '^all'}
From /dev/ttyUSB1: {'from': 681021528, 'to': 681021528, 'decoded': {'data': {'portnum': 'POSITION_APP', 'payload': b'', 'position': {}}}, 'id': 1601631315, 'hopLimit': 3, 'fromId': '!246f28979058', 'toId': '!246f28979058'}
From /dev/ttyUSB1: {'from': 681021528, 'to': 681021528, 'decoded': {'data': {'portnum': 'POSITION_APP', 'payload': b'', 'position': {}}}, 'id': 1601631316, 'hopLimit': 3, 'fromId': '!246f28979058', 'toId': '!246f28979058'}
From /dev/ttyUSB1: {'from': 2885173400, 'to': 681021528, 'decoded': {'data': {'portnum': 'POSITION_APP', 'payload': b' dM\xe0\x8b#`', 'position': {'batteryLevel': 100, 'time': 1612942304}}}, 'id': 153691730, 'rxSnr': 9.5, 'hopLimit': 3, 'fromId': '!2462abf84098', 'toId': '!246f28979058'}
From /dev/ttyUSB1: {'from': 681021528, 'to': 4294967295, 'decoded': {'data': {'portnum': 'NODEINFO_APP', 'payload': b'\n\r!246f28979058\x12\x0cUnknown 9058\x1a\x03?58"\x06$o(\x97\x90X', 'user': {'id': '!246f28979058', 'longName': 'Unknown 9058', 'shortName': '?58', 'macaddr': 'JG8ol5BY'}}, 'wantResponse': True}, 'id': 1601631317, 'rxTime': 1612942314, 'hopLimit': 3, 'fromId': '!246f28979058', 'toId': '^all'}
From /dev/ttyUSB1: {'from': 681021528, 'to': 681021528, 'decoded': {'data': {'portnum': 'NODEINFO_APP', 'payload': b'\n\r!246f28979058\x12\x0cUnknown 9058\x1a\x03?58"\x06$o(\x97\x90X', 'user': {'id': '!246f28979058', 'longName': 'Unknown 9058', 'shortName': '?58', 'macaddr': 'JG8ol5BY'}}}, 'id': 1601631318, 'rxTime': 1612942314, 'hopLimit': 3, 'fromId': '!246f28979058', 'toId': '!246f28979058'}
From /dev/ttyUSB1: {'from': 681021528, 'to': 681021528, 'decoded': {'data': {'portnum': 'NODEINFO_APP', 'payload': b'\n\r!246f28979058\x12\x0cUnknown 9058\x1a\x03?58"\x06$o(\x97\x90X', 'user': {'id': '!246f28979058', 'longName': 'Unknown 9058', 'shortName': '?58', 'macaddr': 'JG8ol5BY'}}}, 'id': 1601631319, 'rxTime': 1612942314, 'hopLimit': 3, 'fromId': '!246f28979058', 'toId': '!246f28979058'}
Ignoring sending interface
From /dev/ttyUSB1: {'from': 2885173400, 'to': 681021528, 'decoded': {'data': {'portnum': 'NODEINFO_APP', 'payload': b'\n\r!2462abf84098\x12\x07Node 98\x1a\x02N9"\x06$b\xab\xf8@\x98', 'user': {'id': '!2462abf84098', 'longName': 'Node 98', 'shortName': 'N9', 'macaddr': 'JGKr+ECY'}}}, 'id': 153691731, 'rxSnr': 9.0, 'rxTime': 1612942330, 'hopLimit': 3, 'fromId': '!2462abf84098', 'toId': '!246f28979058'}
From /dev/ttyUSB1: {'from': 2885173400, 'to': 4294967295, 'decoded': {'data': {'portnum': 'TEXT_MESSAGE_APP', 'payload': b'Test 1', 'text': 'Test 1'}}, 'id': 2650926549, 'rxSnr': 10.0, 'rxTime': 1612942340, 'fromId': '!2462abf84098', 'toId': '^all'}
INFO:root:Test succeeded (1 successes so far)
INFO:root:Sending test packet from 2885173400 to 4294967295
Ignoring sending interface
From /dev/ttyUSB1: {'from': 2885173400, 'to': 4294967295, 'decoded': {'data': {'portnum': 'TEXT_MESSAGE_APP', 'payload': b'Test 2', 'text': 'Test 2'}}, 'id': 2650926550, 'rxSnr': 9.0, 'rxTime': 1612942345, 'fromId': '!2462abf84098', 'toId': '^all'}
INFO:root:Test succeeded (2 successes so far)
INFO:root:Sending test packet from 2885173400 to 4294967295
Ignoring sending interface
From /dev/ttyUSB1: {'from': 2885173400, 'to': 4294967295, 'decoded': {'data': {'portnum': 'TEXT_MESSAGE_APP', 'payload': b'Test 3', 'text': 'Test 3'}}, 'id': 2650926551, 'rxSnr': 9.25, 'rxTime': 1612942350, 'fromId': '!2462abf84098', 'toId': '^all'}
INFO:root:Test succeeded (3 successes so far)
INFO:root:Sending test packet from 2885173400 to 4294967295
Ignoring sending interface
From /dev/ttyUSB1: {'from': 2885173400, 'to': 4294967295, 'decoded': {'data': {'portnum': 'TEXT_MESSAGE_APP', 'payload': b'Test 4', 'text': 'Test 4'}}, 'id': 2650926552, 'rxSnr': 9.25, 'rxTime': 1612942356, 'fromId': '!2462abf84098', 'toId': '^all'}
INFO:root:Test succeeded (4 successes so far)
INFO:root:Sending test packet from 2885173400 to 4294967295
Ignoring sending interface
From /dev/ttyUSB1: {'from': 2885173400, 'to': 4294967295, 'decoded': {'data': {'portnum': 'TEXT_MESSAGE_APP', 'payload': b'Test 5', 'text': 'Test 5'}}, 'id': 2650926553, 'rxSnr': 8.75, 'rxTime': 1612942362, 'fromId': '!2462abf84098', 'toId': '^all'}
INFO:root:Test succeeded (5 successes so far)
INFO:root:Sending test packet from 2885173400 to 4294967295
Ignoring sending interface
From /dev/ttyUSB1: {'from': 2885173400, 'to': 4294967295, 'decoded': {'data': {'portnum': 'TEXT_MESSAGE_APP', 'payload': b'Test 6', 'text': 'Test 6'}}, 'id': 2650926554, 'rxSnr': 10.25, 'rxTime': 1612942368, 'fromId': '!2462abf84098', 'toId': '^all'}
INFO:root:Test succeeded (6 successes so far)
INFO:root:Sending test packet from 2885173400 to 4294967295
Ignoring sending interface
...
based on report from @vfurman-gh changed test config to the following for both nodes:
bin/run.sh --port /dev/ttyUSB1 --set position_broadcast_secs 30 --set ls_secs 15
Still no failures with "meshtastic --test"
@vfurman-gh - how many nodes in your mesh? I'll test with android next (using your tool)
@vfurman-gh ooh thanks! that helped a lot - can repro easily with your android based test
ooh I see! yes - with 30 secs between each ping and those position broadcasts I think the acks are getting starved from delivery to the API client device. I'll change the python test to also test this case and then debug/fix the underling device bug.
Also interesting "--set position_broadcast_secs 0" (using the default interval) causes the problem to not be visible. Though I can see that the retransmission delays for reliable messages are way too long...
Also: I bet the fix will involve changing the outgoing message queue in the router to be a priority queue, and add 'priority' to messages. With ack/naks having the highest possible priority and background position broadcasts the lowest. Also the position broadcaster should discard any old (unsent) positions when it tries to queue a new position.
@vfurman-gh thanks for this report. Now fixed I think. Can you confirm when you get the release I'm doing later today?
This issue has been mentioned on Meshtastic. There might be relevant details there:
Trying it now with 1.1.46. Unfortunately, doesn't seem the issue is solved...
On Thu, Feb 11, 2021 at 5:14 PM Kevin Hester notifications@github.com wrote:
@vfurman-gh https://github.com/vfurman-gh thanks for this report. Now fixed I think. Can you confirm when you get the release I'm doing later today?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/meshtastic/Meshtastic-device/issues/681#issuecomment-777902663, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASTB466UKZN6EM4BKEPB4I3S6R6I7ANCNFSM4XKUDZIA .
BTW, I use the slowest channel settings.
On Thu, Feb 11, 2021 at 7:02 PM Vadim Furman vfurman@gmail.com wrote:
Trying it now with 1.1.46. Unfortunately, doesn't seem the issue is solved...
On Thu, Feb 11, 2021 at 5:14 PM Kevin Hester notifications@github.com wrote:
@vfurman-gh https://github.com/vfurman-gh thanks for this report. Now fixed I think. Can you confirm when you get the release I'm doing later today?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/meshtastic/Meshtastic-device/issues/681#issuecomment-777902663, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASTB466UKZN6EM4BKEPB4I3S6R6I7ANCNFSM4XKUDZIA .
And here are the logs from both devices, in case you need them
On Thu, Feb 11, 2021 at 7:04 PM Vadim Furman vfurman@gmail.com wrote:
BTW, I use the slowest channel settings.
On Thu, Feb 11, 2021 at 7:02 PM Vadim Furman vfurman@gmail.com wrote:
Trying it now with 1.1.46. Unfortunately, doesn't seem the issue is solved...
On Thu, Feb 11, 2021 at 5:14 PM Kevin Hester notifications@github.com wrote:
@vfurman-gh https://github.com/vfurman-gh thanks for this report. Now fixed I think. Can you confirm when you get the release I'm doing later today?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/meshtastic/Meshtastic-device/issues/681#issuecomment-777902663, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASTB466UKZN6EM4BKEPB4I3S6R6I7ANCNFSM4XKUDZIA .
@vfurman-gh interesting. how many nodes in your mesh? All with the "--set position_broadcast_secs 30 --set ls_secs 15" setting?
I just did a long test with your ping client and had 100% delivery. Though I was only using a two node mesh, prior to this change I would have had many non acked packets.
I have two nodes.
On Thu, Feb 11, 2021, 10:04 PM Kevin Hester notifications@github.com wrote:
@vfurman-gh https://github.com/vfurman-gh interesting. how many nodes in your mesh? All with the "--set position_broadcast_secs 30 --set ls_secs 15" setting?
I just did a long test with your ping client and had 100% delivery. Though I was only using a two node mesh, prior to this change I would have had many non acked packets.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/meshtastic/Meshtastic-device/issues/681#issuecomment-777996134, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASTB466TCA2TOKUJ6SLFTH3S6TAHVANCNFSM4XKUDZIA .
And yes, same setting for all nodes. Anything to glean from the device logs?
On Thu, Feb 11, 2021, 10:05 PM Vadim Furman vfurman@gmail.com wrote:
I have two nodes.
On Thu, Feb 11, 2021, 10:04 PM Kevin Hester notifications@github.com wrote:
@vfurman-gh https://github.com/vfurman-gh interesting. how many nodes in your mesh? All with the "--set position_broadcast_secs 30 --set ls_secs 15" setting?
I just did a long test with your ping client and had 100% delivery. Though I was only using a two node mesh, prior to this change I would have had many non acked packets.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/meshtastic/Meshtastic-device/issues/681#issuecomment-777996134, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASTB466TCA2TOKUJ6SLFTH3S6TAHVANCNFSM4XKUDZIA .
oops - I think the logs are missing (did you intend to attach to the bug)?
one more question: what model of radios?
Both radios are LILYGO® TTGO T-Beam V1.1 SX1262. Just to make sure, you are firing tests from both phones? Looks like github e-mail interface doesn't pass attachments through. Attaching them from web interface: log2.txt log1.txt
Do we do ALOHA for access control? We are probably operating at around 60% utilization with this test, I would not be surprised if aloha starts to fall apart...
oh that helps! I bet running the ping on each end will help me stress things even further and see the same problem. In the next couple of days I'll update the python test so it can do this (so I can then make it nice and repeatable as an integration test).
re: aloha no - the implementation is basically just naive flood routing with (optional) acks and DSR (not used for broadcasts). Though I did use aloha a long time ago when I developed a (pre wifi) wireless protocol at Apple.
Closing this. Should no longer be an issue with the channel tx utilization based throttling.
new behavior since 1.0.
pretty high drop rate (about 30%?) - usually hidden by the retry mechanism.