Gaps in temperature logs

adamdewey commented 1 year ago

Describe the bug

Have noticed that we get pretty frequent gaps in our temperature logging - a serious issue as users need to be able to trust that they can look up reliable historic temperature data when needed.

I think this is sensor agnostic, happens when using the Basic Blue Maestro, the Maxi Blue Maestro and the Laird BT510.

Easiest to spot on the dashboard:

The logs do not exist either in the cold chain app or mSupply desktop.

Suspicion is that the logs are sometimes being deleted from the sensors before being saved?

To Reproduce Steps to reproduce the behavior:

Set up a cold chain standalone app that syncs data to a central server
Add 6 sensors logging at 1 min intervals
Leave for a few days (managed to see it after 24 hours in one case, other times it may take longer)
Trawl through dashboards to spot long flat spots where logs are missing

Expected behavior Temperature logs should never be missing

Smartphone (please complete the following information):

Device: Samsung Galaxy Tab A8
OS: Android 12
Version 0.5.0
Sensors: BM Basic, BM Maxi, Laird BT510

Additional context Add any other context about the problem here.

adamdewey commented 1 year ago

Likely example on a client:

adamdewey commented 1 year ago

Client reported bug; https://support.msupply.org.nz/cerb6/profiles/ticket/DB-4271-BZ

anildahalsussol commented 1 year ago

Another bug: https://support.msupply.org.nz/cerb6/profiles/ticket/XV-2913-GZ/conversation

anildahalsussol commented 1 year ago

So after investigating:

Tablet was disconnected for a few days from the sensor and after trying to connect it was not able to connect.
tried removing and connecting which works but was unable to fetch any logs so wondered are there any logs in the sensor.
To make sure downloaded blue maestro apps and it shows 6000 logs I downloaded and it seems to be no data loss at all.
And thought about whether it will get reset after download but it still remains in that sensor which is good.
So again switched to msupply cold chain apps and after connecting able to get only single logs of recent.
So I think we have records in the sensor but we need to have a feature like a download as blue maestro to fetch the existing data.

andreievg commented 1 year ago

I think we would need to add extra logging to the App to zero in on the issue, the more data the better, basically all data for each ble operation

camroots commented 1 year ago

What follows is my understanding of the issue. As this PR stands, it provides a fix for the Blue Maestro sensor, the BT510 requires further changes (noted below).

I'm using the current state to represent what is currently on the main branch and proposed state as to what is in the PR.

Based on the current state, when logs are downloaded from a sensor, the connectAndDiscoverServices method (from the msupply-ble-service) is called first. This method cancels the current connection if one exists and then reconnects. The connectAndDiscoverServices method is called when downloading logs and when retrieving the battery level. As these two operations happen concurrently, it is possible for them to overlap. This is most likely to occur when the app first starts logging. There exists the selector isSensorDownloading which is used to prevent the battery being updated if a log download is occurring. Previously, there was no check for the opposite case, where the battery is updating and logging begins. The isSensorUpdating has been added and is used to prevent a download from occurring.

@andreievg rightly asked "Wouldn't app try to re-sync the logs in the next schedule ?".

Blue Maestro The following shows the logs relating to a Blue Maestro sensor.

Red - write and monitor error Orange - the app determines based on last recorded time that there are two logs to be saved Purple - due to the error, 0 logs are downloaded Green - battery reading log (note the odd placement) Pink - the interval is set, clearing the logs

The pink operation of setting the interval to clear the logs should not have happened as no logs had been downloaded. But as it did happen, this indicates a bug.

Considering the code here: https://github.com/openmsupply/msupply-cold-chain/blob/0e351a7cb9422ce8c976a8facbcd2642860c184d/src/features/Bluetooth/Download/DownloadSlice.ts#LL158C5-L158C5

This code clears the interval based on numberOfLogsToSave, which may be a non-zero number even if the download failed. So to answer @andreievg's question, due to checking numberOfLogsToSave we set the interval even when downloading has failed and as such, there are no logs to re-sync on the next schedule. The PR proposes using sensorLogs.length instead. This shall ensure that the interval is set (and logs cleared) only if logs have been downloaded and saved successfully.

BT510 A fix for the BT510 is more complicated due to the way the logs are downloaded in batches. The downloading "prepares" a batch of logs, downloads and then acknowledges. Once acknowledged, the logs are removed from the sensor. The current state builds up an array as the logs are downloaded and once all are downloaded, saves to the database. The code relating to this is from the msupply-ble-service project: https://github.com/openmsupply/msupply-ble-service/blob/0fe001f66f669d0e9fd4d59f32a54bf238cab980/src/Bluetooth/BleService.ts#L346

The following shows an image a BT510 failure, note that there is partial data present but it is out of sync with the other sensors. This indicates that there were batches of logs that were successfully downloaded, but some also failed.

A robust fix here would be to store the logs after each download, prior to acknowledgement. This would ensure that if a download fails, or if the app is closed, the logs would remain in the sensor and could be re-read on the next schedule.

Please let me know your thoughts on the above. If it makes sense, I can look to add a BT510 fix to the PR as well.

adamdewey commented 1 year ago

Brilliant, nicely laid out @camroots.

The BT510 is proving less reliable than we were hoping (I've had a few units inexplicably die on me) and they don't meet the WHO's minimum requirements for PQS.

So recommend we don't spend time trying to fix those as they seem to be a dead end.

andreievg commented 1 year ago

@camroots, that's great, it sounds like you've identified a very possible issue of log download failure, I think the fix in PR is probably good overall, but I think we should be dealing with the core issue. When reading tryDownloadForSensor, it's really hard to know where error exists are, I kind of assume that try and catch would capture all of the errors but it looks like they are not:

https://github.com/openmsupply/msupply-cold-chain/blob/0e351a7cb9422ce8c976a8facbcd2642860c184d/.yalc/msupply-ble-service/src/Bluetooth/BleService.ts#L400-L402

I think the core issue is with error propagation added here: https://github.com/openmsupply/msupply-cold-chain/pull/222, I think it's mean to have throw

The PR proposes using sensorLogs.length instead.

Do you mean you proposing it in this issue, sorry I didn't quite see it in PR. I agree this is a good solution, but I think the deeper issue with capturing, logging but ignoring an error. So I think a throw statement should be added in the above code snippet an I think we should go through the PR above and make sure we are propagating errors up the stack.

I think we should also consider numberOfLogsToSave, it should also be non zero

adamdewey commented 8 months ago

Happy to say this has been resolved 🥳

msupply-foundation / msupply-cold-chain

Gaps in temperature logs #229