msupply-foundation / open-msupply

Open mSupply represents our most recent advancement in the Logistics Management Information System (LMIS), expanding on more than two decades of development inherited from the well-established legacy of the original mSupply.
https://msupply.foundation/open-msupply/
Other
23 stars 14 forks source link

Slow login/sync user refresh and occasional DB error #5032

Open Chris-Petty opened 3 weeks ago

Chris-Petty commented 3 weeks ago

What went wrong? 😲

Using a very large testing central server testing datafile, 100s of stores and users, it was frequently taking up to 60s to simply login in OMS on a decent android tablet. Occasionally, I'd get a error reporting a db locking issue which I am certain was the tablet's DB, not the OMS central. image

A similar effect was present for doing a manual sync - it'd say "Updating user" for about 60s and then blast through the rest of sync steps like everything was normal. Once I got a similar DB lock error. Screenshot_20241007_103407_Open mSupply

Expected behaviour 🤔

Login should take a couple seconds at most, ideally near instantaneous! No errors thx.

How to Reproduce 🔨

My chungus datafile: https://drive.google.com/file/d/1DtpWT48Lod9FxeYb4b1QsbF3cnYv4Ysp/view?usp=drive_link

  1. Yuuuuge central data? My site had roughly 70k central records and 70k normal records to initialise (which isn't that bad really)
  2. Initialise
  3. Login (takes ages. Restarting app or any other tricks does not help)
  4. Once logged in try manual sync. Hangs on updating user for ages.

Your environment 🌱

lache-melvin commented 3 weeks ago

If we can share this datafile and it is reproducible, think high severity - worth investigating what the DB is doing/why it's locked 👀

andreievg commented 3 weeks ago

Lets see if we can fix it now, timebox to 2 hours of investigation

jmbrunskill commented 2 weeks ago

On my m3 Mac (small demo data file)

2024-10-15 14:59:52.992854000 [INFO] <service/src/login.rs:369>:User update took: 788ms
2024-10-15 14:59:52.992911000 [INFO] <service/src/login.rs:147>:Remote lookup took: 809ms
2024-10-15 14:59:53.753996000 [INFO] <service/src/login.rs:178>:Verify password took: 761ms
2024-10-15 14:59:53.754960000 [INFO] <service/src/login.rs:190>:Find user active on this site took: 0ms
2024-10-15 14:59:53.757352000 [INFO] <service/src/login.rs:223>:Login Time: 1573ms

On big datafile (m3) mac

2024-10-15 16:29:32.067858000 [INFO] <service/src/login.rs:120>:Login attempt for user: mSupply Support
2024-10-15 16:29:32.642939000 [INFO] <service/src/login.rs:127>:User found in central: mSupply Support
2024-10-15 16:29:35.834151000 [INFO] <service/src/login.rs:369>:User update took: 3190ms
2024-10-15 16:29:35.835180000 [INFO] <service/src/login.rs:147>:Remote lookup took: 3767ms
2024-10-15 16:29:37.428482000 [INFO] <service/src/login.rs:178>:Verify password took: 1593ms
2024-10-15 16:29:37.434379000 [INFO] <service/src/login.rs:190>:Find user active on this site took: 5ms
2024-10-15 16:29:37.439631000 [INFO] <service/src/login.rs:223>:Login Time: 5371ms

On tablet

2024-10-15 17:03:09.029034400 [INFO] <service/src/login.rs:120>:Login attempt for user: mSupply Support
2024-10-15 17:03:09.616243092 [INFO] <service/src/login.rs:127>:User found in central: mSupply Support
2024-10-15 17:04:09.043846511 [INFO] <service/src/login.rs:369>:User update took: 59427ms
2024-10-15 17:04:09.044162511 [INFO] <service/src/login.rs:147>:Remote lookup took: 60014ms
2024-10-15 17:04:09.410540626 [INFO] <service/src/login.rs:178>:Verify password took: 366ms
2024-10-15 17:04:09.411847857 [INFO] <service/src/login.rs:190>:Find user active on this site took: 1ms
2024-10-15 17:04:09.426986895 [INFO] <service/src/login.rs:223>:Login Time: 60397ms

On investigation, it seems like the problem is primarily due to deleting and re-entering permissions for the user on every login, which for this user and datafile is ALOT

2024-10-15 17:07:29.089251000 [INFO] <service/src/user_account.rs:75>:Deleting 22871 permissions 2024-10-15 17:07:29.195232000 [INFO] <service/src/user_account.rs:82>:Inserting 648 stores

The speed difference I think it down to CPU and Disk Performance differential between tablet and m3 mac

The user update that happens on every sync, can block sync too as sqlite only has one transaction open at a time. This probably explains the Database Locked issue (although I didn't re-create it yet)

jmbrunskill commented 2 weeks ago

This feels like it doesn't justify an urgent fix, and propose we move it to next refactor week?

It should affect big installs and users with lots of access permissions...

Solution Ideas for refinement...

A. Use helper methods in repo layer to reduce the number of transactions. e.g user delete_by_user_id instead of 22,000 individual deletes? Maybe that gets us a good enough speed up?

For some reason this isn't used though, and we're excluding permissions which have a context from this process. Is that a bug though?

B. Don't delete and recreate all permissions, do a comparison and then apply just the changes?

C. Only update/add permissions for stores on the sync site (648 stores results in a lot of permissions and delete/inserts for irrelevant sites?

D. Something else? Like do we really need to update the user on login? We update it a sync too, is that enough? Maybe we don't need both?

andreievg commented 1 week ago

Refinement

C. Only update/add permissions for stores on the sync site (648 stores results in a lot of permissions and delete/inserts for irrelevant sites?

But do this on central server, go go go @Chris-Petty