[3.1.0.3] NSClient synchronisation stops loop and is extremely slow

Root-Core commented 1 year ago

Hello there!

I am helping someone with her loop. For some reason the upload via NSClient was paused in AAPS (for quite some time, I guess). It had ~180k entries in the queue at the point the upload was started again.

After 4.5 hours it was down to ~160k, but after that it pretty much stagnated at ~90k. At that rate it would have taken many months to complete the sync. Restarting the NSClient, AAPS, NS or the phone didn't help.

After two days with little progress a full synchronization was started, which began with ~200k entries and was down to ~100k after half an hour. It stagnated again at ~90k and is still there. Even worse: the count temporarily increased to > 100k, but then decreased quickly again.

The problem is that as long as the NSClient isn't paused and has internet access, the loop is not working. The only "solution" that comes to mind is to delete all the data.

Nightscout version 14.2.6, AndroidAPS version 3.1.0.3 (master)

The NS instance is self-hosted and supports HTTP/2 behind an nginx proxy.

My naive expectation is that the loop should work independently of the NSClient upload. It also feels like the sync should be way faster.

Might be related to #619 / 516630215c3adfb32577a27bea2d9a09be58b03b

Root-Core commented 1 year ago

Okay, I finally had the time and access to the device / logs and I observed some patterns.

Each time I restart the NSClient, the upload works okay at first. The log scrolls more slowly, but the queue count decreases at a steady pace. This lasts until a PING is received; then it waits for some time and does the next "wave". After that there are recurring ids, which increase with each PING received. It might not be correlated though, as I don't know how to actually interpret the log messages.

For example, the entry DBADD Acked TemporaryBasal 63de797ef9566a015cadecdf occurred 13 times in a 10-minute test. At that point the logs scroll fast, but the queue count no longer decreases.

I tried to filter the relevant logs (AndroidAPS_filtered.log) with:

grep "DBADD Acked\|PING received" AndroidAPS_new.log
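A script along these lines could make the repeated acknowledgements easier to spot than scrolling through the grep output. It is only a sketch: it assumes every acknowledgement line contains text of the form "DBADD Acked <type> <id>" with a 24-character hex id, as in the example above. The function name repeated_acks and the second id in the sample are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the one example quoted in this thread:
#   "... DBADD Acked TemporaryBasal 63de797ef9566a015cadecdf ..."
ACK = re.compile(r"DBADD Acked (\w+) ([0-9a-f]{24})")


def repeated_acks(lines):
    """Return {(type, id): count} for every id acknowledged more than once."""
    counts = Counter()
    for line in lines:
        match = ACK.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Tiny sample; the second id is hypothetical.
sample = [
    "DBADD Acked TemporaryBasal 63de797ef9566a015cadecdf",
    "PING received",
    "DBADD Acked TemporaryBasal 63de797ef9566a015cadecdf",
    "DBADD Acked TemporaryBasal 63de797ef9566a015cadece0",
]
duplicates = repeated_acks(sample)
```

On a real log, passing the open file object (for example open("AndroidAPS_new.log")) instead of the sample list should work the same way, since the function only iterates over lines.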

@MilosKozak Could this be some kind of race condition? I sent the full logs via email.

EDIT:

By coincidence the phone is still with me and the patient is out of BT range. The upload works now and I can't spot duplicated ids. It might have to do with the missing xDrip+ local broadcast, or with xDrip+ not uploading BG values to NS.

MilosKozak commented 1 year ago

Sync has been improved a lot in dev. The duplicated entries you mention can be related to NS responding too slowly.

Root-Core commented 1 year ago

This is unlikely, as deactivating the local broadcast and the upload to NS in xDrip+ enabled us to upload all data in a couple of hours. That is pretty much the same situation as receiving no new values because the patient is out of BT range.

Also, is it intended behavior that the loop stops working while the NSClient is uploading? As soon as you pause it, the loop works as intended.

MilosKozak commented 1 year ago

It's not intended ... it's a result of a busy phone. Disabling the broadcast will speed up the process.

Root-Core commented 1 year ago

The message on manual runs was something like "no APS selected", which does not sound like a busy phone to me. The loop also works when the phone is in heavy use, like gaming or benchmarking. If the upload is causing the loop to stop, it would be worth suspending the upload until the loop has done its calculations, with a mutex or something similar.
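The idea of suspending the upload while the loop calculates could look roughly like the sketch below. It is not how AAPS is implemented; the names loop_lock, run_loop and upload_all are hypothetical, and it assumes the upload can be split into small batches so that the loop never waits longer than one batch.

```python
import threading

# Shared mutex: held by the loop while it calculates and by the uploader
# only for the duration of a single batch.
loop_lock = threading.Lock()


def run_loop(calculate):
    """Run one loop calculation while uploads are held back."""
    with loop_lock:
        return calculate()


def upload_all(entries, send, batch_size=100):
    """Upload entries in batches, releasing the lock between batches.

    Returns the number of batches sent.
    """
    batches = 0
    for start in range(0, len(entries), batch_size):
        with loop_lock:  # blocks here while a loop run is in progress
            for entry in entries[start:start + batch_size]:
                send(entry)
        batches += 1
    return batches
```

Because the lock is released after every batch, a pending loop run has a chance to start between batches. A plain lock does not guarantee that the loop wins that race, so a real implementation would probably need some form of priority for the loop.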

You clearly have more knowledge about AAPS and I'm just trying to help, as I fear this could be a serious bug.

After a restart it does okay, and as long as no new data is inserted it keeps working. Once data is inserted, multiple workers point to the same entry. This looks like a typical race condition. I have not looked into the source code yet, but is the queue thread-safe?
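As an illustration of the property in question, here is a minimal sketch. It is not AAPS code and says nothing about how the AAPS queue actually works; drain and the worker count are made up. It only shows what "thread-safe" would mean here: several workers take entries from one shared queue and every entry is handed to exactly one of them.

```python
import queue
import threading
from collections import Counter


def drain(entries, workers=4):
    """Process entries with several workers; return how often each was handled."""
    pending = queue.Queue()  # queue.Queue does its own locking
    for entry in entries:
        pending.put(entry)

    handled = Counter()
    handled_lock = threading.Lock()  # protects the shared counter

    def worker():
        while True:
            try:
                entry = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            with handled_lock:
                handled[entry] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


handled = drain([f"id{i}" for i in range(1000)])
```

With a thread-safe queue, no id should show up more than once in the result. An id appearing 13 times, as in the log above, would point to entries being handed out again before their acknowledgement is recorded.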

Btw, thank you for your hard work. It really helps a lot of people.

MilosKozak commented 1 year ago

1) Try dev, as the app processing inside is mostly new.
2) If you are able to replicate it with dev, I'm interested in logs after a manual run fails.
3) "No APS selected" could mean some timeout on reading from the db (with an internal crash afterwards).

Root-Core commented 1 year ago

I do not have access to the specific phone / setup right now, and the sync completed after we stopped all incoming data. I might test it on another phone with a virtual pump and do a full sync. I am not sure how well this can be replicated in such a scenario.

The "no APS selected" message appears instantly as long as the NSClient is uploading. Once it is paused, the loop works as intended.

I hope that I can reproduce it on 3.1.0.3 first with this test setup and retest it on the dev branch afterwards.

MilosKozak commented 1 year ago

You can reset sync from the three-dot menu.

Root-Core commented 1 year ago

I thought the full sync was two-way, but it seems to just upload local entries. So there are only a few elements in the queue on my test setup. I will try to copy the app data via some backup tool once I have access to the original device again.

birdfly commented 1 year ago

I have this issue, too. The queue was at 90k at the beginning, then decreased to 30k, but after a while it started to increase again, up to 50k.

MilosKozak commented 1 year ago

The count can increase because the total is not evaluated upfront.
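A toy model of why the displayed count can go up: assume (this is an assumption, not the actual AAPS implementation) that pending entries are discovered in chunks while earlier ones are already being uploaded, and that the displayed count is "discovered so far" minus "uploaded so far". All names and numbers below are illustrative.

```python
def displayed_counts(total, scan_chunk, upload_chunk):
    """Simulate the displayed queue size when the total is not known upfront.

    Each step discovers up to scan_chunk more pending entries and uploads
    up to upload_chunk of the entries discovered so far.
    """
    discovered = 0
    uploaded = 0
    counts = []
    while uploaded < total:
        discovered = min(total, discovered + scan_chunk)
        uploaded = min(discovered, uploaded + upload_chunk)
        counts.append(discovered - uploaded)
    return counts


# Discovery is faster than upload, so the count rises before it falls.
counts = displayed_counts(total=10, scan_chunk=4, upload_chunk=2)
```

In this model the count first grows while discovery outpaces the upload and only shrinks steadily once everything has been discovered, which would match a queue that drops, climbs again and then drops.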

MilosKozak commented 1 year ago

sync is new in 3.2

nightscout / AndroidAPS #2383