Closed hayden-ernst closed 1 year ago
Hi Hayden (@hayden-ernst),
Thank you for reporting this. Excellent detective work!
I will investigate and test this tomorrow, but - having read through the code - I do agree with your conclusion and fix. If I understand correctly, checkChecksum will return the correct value if the backlog is empty. If the backlog contains another message, then, yes, the index is not correct and the wrong checksum is calculated.
Wow. Great find!
Until tomorrow, Paul
Hi Hayden,
I wasn't able to fully replicate the issue you were seeing as I did not try to queue messages, but I was able to see responseDest and _swarmBacklog become 'out of sync'. The checksum was indeed being calculated on the wrong data. I've corrected this in v1.1.10.
Thanks again for reporting this.
Very best wishes, Paul
Hi,
I've been digging into checksum errors that we have been getting while attempting to queue messages. The issue showed up while we were receiving unsolicited messages at a fairly high rate (one every 5 seconds) and queuing messages at the same time. It was intermittent, so it was difficult to track down. However, the error was caught during debugging:
'checksum' and 'expectedChecksum' were variables printed from the 'checkChecksum()' function in the 'SparkFun_Swarm_Satellite_Arduino_Library.cpp' file. 'checksum' is the value calculated from the string and 'expectedChecksum' is the value parsed from the string. In this example, the checksum on the response is 2C. The 'expectedChecksum' matches this, but the calculated 'checksum' does not.
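For context, Swarm modem messages use NMEA-style framing: the two hex digits after '*' are the XOR of every character between '$' and '*'. The sketch below shows that comparison in isolation. It is not the library's 'checkChecksum()'; 'nmeaChecksumMatches' is a hypothetical helper written for illustration, and its two output parameters simply mirror the two printed values.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (NOT the library's checkChecksum()).
// Computes the XOR of the bytes between '$' and '*' and compares it
// with the two hex digits that follow '*'.
// Returns true only when the message is well formed and the values match.
bool nmeaChecksumMatches(const char *msg, uint8_t *checksum, uint8_t *expectedChecksum)
{
  const char *dollar = strchr(msg, '$');
  if (dollar == nullptr)
    return false; // no start delimiter

  const char *star = strchr(dollar, '*');
  if (star == nullptr || strlen(star) < 3)
    return false; // no '*' or fewer than two characters after it

  if (!isxdigit((unsigned char)star[1]) || !isxdigit((unsigned char)star[2]))
    return false; // checksum field is not two hex digits

  // Value calculated from the string
  uint8_t calculated = 0;
  for (const char *p = dollar + 1; p < star; ++p)
    calculated ^= (uint8_t)*p;

  // Value parsed from the string
  const char hex[3] = {star[1], star[2], '\0'};
  uint8_t expected = (uint8_t)strtoul(hex, nullptr, 16);

  *checksum = calculated;
  *expectedChecksum = expected;
  return calculated == expected;
}
```

For example, the command $CS*10 passes this check: 'C' (0x43) XOR 'S' (0x53) is 0x10. The check only means something if the pointer passed in really points at the message of interest, which is the crux of this issue.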
Looking at a larger number of communications also helps to give us a picture of what may be happening:
Here, we can see that the checksum for the first receive test unsolicited message is calculated correctly. Then, both the checksum and expectedChecksum values do not match up with the checksum from the return after the queue message is sent. The received string shows a checksum of 27 but they both stay as 15. The same is true for the response to the get unsent messages count command. It has a checksum of 08 but again we are stuck at 15. The values do still match, so no error is detected. Then when the backlog is being pruned, we find that the checksums begin to match up with the message strings again.
This looked to me like the string being passed into the 'checkChecksum()' function was not being updated. So I looked into where this was and found it in the function below. If we look at the bottom where the checksum is actually checked, we see that the '_swarmBacklog' buffer is passed into the function at the index 'responseStartedAt'. However, when we look at where that index is found, we see that the 'responseDest' buffer is the one being iterated through to find our response. This means that the index we are finding probably won't match with the location of the response in the _swarmBacklog, especially if we are receiving unsolicited messages which are put in there. It is most likely corresponding to the first message in the _swarmBacklog which in this case is probably a receive test message. This will usually result in the wrong message string being checked for a correct checksum which will normally still find that the checksums match. However, it can cause unexpected results.
As a temporary fix, I changed the calls to the checkChecksum function here to look like this:
err = checkChecksum((char *)&responseDest[responseStartedAt]);
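To illustrate why the buffer matters, here is a small standalone sketch, not taken from the library. The buffer contents are made up, the "XX" and "YY" checksum digits are placeholders, and 'findResponseStart', 'exampleBacklog' and 'exampleResponseDest' are hypothetical names. It assumes, as described above, that the backlog holds an unsolicited message ahead of the response.

```cpp
#include <cstring>

// Illustrative buffers only; "XX" and "YY" stand in for real checksum digits.
// The backlog holds an unsolicited message first, then the command response.
static const char exampleBacklog[] = "$RT RSSI=-100*XX\n$TD OK*YY\n";
// The destination buffer holds only the command response.
static const char exampleResponseDest[] = "$TD OK*YY\n";

// Hypothetical helper: offset of the first occurrence of `prefix` in `buffer`,
// or -1 if it is not present.
long findResponseStart(const char *buffer, const char *prefix)
{
  const char *hit = strstr(buffer, prefix);
  return (hit != nullptr) ? (long)(hit - buffer) : -1;
}

// The index is found by searching the destination buffer...
static const long exampleStartedAt = findResponseStart(exampleResponseDest, "$TD");

// ...so it is only valid for that buffer. Applied to the backlog, the same
// index lands on the unsolicited "$RT" message instead of the "$TD" response.
static const char *const wrongTarget = &exampleBacklog[exampleStartedAt];
static const char *const rightTarget = &exampleResponseDest[exampleStartedAt];
```

Here 'wrongTarget' points at a complete, valid message, so a checksum test on it would still pass even though it is the wrong message. That is consistent with the mismatch going undetected most of the time.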
After implementing this change, the checksums matched those on the received messages, and after testing with a large number of messages we have not seen the error again. Anyway, that's just what I found. Please take a look and see if you guys end up with similar conclusions.
Best, Hayden