Closed adamdewey closed 8 months ago
Likely example on a client:
Client reported bug; https://support.msupply.org.nz/cerb6/profiles/ticket/DB-4271-BZ
So after investigating:
I think we would need to add extra logging to the App to zero in on the issue, the more data the better, basically all data for each ble operation
What follows is my understanding of the issue. As this PR stands, it provides a fix for the Blue Maestro sensor, the BT510 requires further changes (noted below).
I'm using the current state
to represent what is currently on the main branch and proposed state
as to what is in the PR.
Based on the current state, when logs are downloaded from a sensor, the connectAndDiscoverServices method (from the msupply-ble-service) is called first. This method cancels the current connection if one exists and then reconnects. The connectAndDiscoverServices method is called when downloading logs and when retrieving the battery level. As these two operations happen concurrently, it is possible for them to overlap. This is most likely to occur when the app first starts logging. There exists the selector isSensorDownloading which is used to prevent the battery being updated if a log download is occurring. Previously, there was no check for the opposite case, where the battery is updating and logging begins. The isSensorUpdating has been added and is used to prevent a download from occurring.
@andreievg rightly asked "Wouldn't app try to re-sync the logs in the next schedule ?"
.
Blue Maestro The following shows the logs relating to a Blue Maestro sensor.
Red - write and monitor error Orange - the app determines based on last recorded time that there are two logs to be saved Purple - due to the error, 0 logs are downloaded Green - battery reading log (note the odd placement) Pink - the interval is set, clearing the logs
The pink operation of setting the interval to clear the logs should not have happened as no logs had been downloaded. But as it did happen, this indicates a bug.
Considering the code here: https://github.com/openmsupply/msupply-cold-chain/blob/0e351a7cb9422ce8c976a8facbcd2642860c184d/src/features/Bluetooth/Download/DownloadSlice.ts#LL158C5-L158C5
This code clears the interval based on numberOfLogsToSave
, which may be a non-zero number even if the download failed.
So to answer @andreievg's question, due to checking numberOfLogsToSave
we set the interval even when downloading has failed and as such, there are no logs to re-sync on the next schedule.
The PR proposes using sensorLogs.length
instead. This shall ensure that the interval is set (and logs cleared) only if logs have been downloaded and saved successfully.
BT510
A fix for the BT510 is more complicated due to the way the logs are downloaded in batches. The downloading "prepares" a batch of logs, downloads and then acknowledges. Once acknowledged, the logs are removed from the sensor.
The current state builds up an array as the logs are downloaded and once all are downloaded, saves to the database.
The code relating to this is from the msupply-ble-service
project:
https://github.com/openmsupply/msupply-ble-service/blob/0fe001f66f669d0e9fd4d59f32a54bf238cab980/src/Bluetooth/BleService.ts#L346
The following shows an image a BT510 failure, note that there is partial data present but it is out of sync with the other sensors. This indicates that there were batches of logs that were successfully downloaded, but some also failed.
A robust fix here would be to store the logs after each download, prior to acknowledgement. This would ensure that if a download fails, or if the app is closed, the logs would remain in the sensor and could be re-read on the next schedule.
Please let me know your thoughts on the above. If it makes sense, I can look to add a BT510 fix to the PR as well.
Brilliant, nicely laid out @camroots.
The BT510 is proving less reliable than we were hoping (I've had a few units inexplicably die on me) and they don't meet the WHO's minimum requirements for PQS.
So recommend we don't spend time trying to fix those as they seem to be a dead end.
@camroots, that's great, it sounds like you've identified a very possible issue of log download failure, I think the fix in PR is probably good overall, but I think we should be dealing with the core issue. When reading tryDownloadForSensor
, it's really hard to know where error exists are, I kind of assume that try
and catch
would capture all of the errors but it looks like they are not:
I think the core issue is with error propagation added here: https://github.com/openmsupply/msupply-cold-chain/pull/222, I think it's mean to have throw
The PR proposes using sensorLogs.length instead.
Do you mean you proposing it in this issue, sorry I didn't quite see it in PR. I agree this is a good solution, but I think the deeper issue with capturing, logging but ignoring an error. So I think a throw statement should be added in the above code snippet an I think we should go through the PR above and make sure we are propagating errors up the stack.
I think we should also consider numberOfLogsToSave, it should also be non zero
Happy to say this has been resolved 🥳
Describe the bug
Have noticed that we get pretty frequent gaps in our temperature logging - a serious issue as users need to be able to trust that they can look up reliable historic temperature data when needed.
I think this is sensor agnostic, happens when using the Basic Blue Maestro, the Maxi Blue Maestro and the Laird BT510.
Easiest to spot on the dashboard:
The logs do not exist either in the cold chain app or mSupply desktop.
Suspicion is that the logs are sometimes being deleted from the sensors before being saved?
To Reproduce Steps to reproduce the behavior:
Expected behavior Temperature logs should never be missing
Smartphone (please complete the following information):
Additional context Add any other context about the problem here.