Open Chris-Petty opened 3 weeks ago
If we can share this datafile and it is reproducible, think high severity - worth investigating what the DB is doing/why it's locked 👀
Lets see if we can fix it now, timebox to 2 hours of investigation
On my m3 Mac (small demo data file)
2024-10-15 14:59:52.992854000 [INFO] <service/src/login.rs:369>:User update took: 788ms
2024-10-15 14:59:52.992911000 [INFO] <service/src/login.rs:147>:Remote lookup took: 809ms
2024-10-15 14:59:53.753996000 [INFO] <service/src/login.rs:178>:Verify password took: 761ms
2024-10-15 14:59:53.754960000 [INFO] <service/src/login.rs:190>:Find user active on this site took: 0ms
2024-10-15 14:59:53.757352000 [INFO] <service/src/login.rs:223>:Login Time: 1573ms
On big datafile (m3) mac
2024-10-15 16:29:32.067858000 [INFO] <service/src/login.rs:120>:Login attempt for user: mSupply Support
2024-10-15 16:29:32.642939000 [INFO] <service/src/login.rs:127>:User found in central: mSupply Support
2024-10-15 16:29:35.834151000 [INFO] <service/src/login.rs:369>:User update took: 3190ms
2024-10-15 16:29:35.835180000 [INFO] <service/src/login.rs:147>:Remote lookup took: 3767ms
2024-10-15 16:29:37.428482000 [INFO] <service/src/login.rs:178>:Verify password took: 1593ms
2024-10-15 16:29:37.434379000 [INFO] <service/src/login.rs:190>:Find user active on this site took: 5ms
2024-10-15 16:29:37.439631000 [INFO] <service/src/login.rs:223>:Login Time: 5371ms
On tablet
2024-10-15 17:03:09.029034400 [INFO] <service/src/login.rs:120>:Login attempt for user: mSupply Support
2024-10-15 17:03:09.616243092 [INFO] <service/src/login.rs:127>:User found in central: mSupply Support
2024-10-15 17:04:09.043846511 [INFO] <service/src/login.rs:369>:User update took: 59427ms
2024-10-15 17:04:09.044162511 [INFO] <service/src/login.rs:147>:Remote lookup took: 60014ms
2024-10-15 17:04:09.410540626 [INFO] <service/src/login.rs:178>:Verify password took: 366ms
2024-10-15 17:04:09.411847857 [INFO] <service/src/login.rs:190>:Find user active on this site took: 1ms
2024-10-15 17:04:09.426986895 [INFO] <service/src/login.rs:223>:Login Time: 60397ms
On investigation, it seems like the problem is primarily due to deleting and re-entering permissions for the user on every login, which for this user and datafile is ALOT
2024-10-15 17:07:29.089251000 [INFO] <service/src/user_account.rs:75>:Deleting 22871 permissions 2024-10-15 17:07:29.195232000 [INFO] <service/src/user_account.rs:82>:Inserting 648 stores
The speed difference I think it down to CPU and Disk Performance differential between tablet
and m3
mac
The user update that happens on every sync, can block sync too as sqlite only has one transaction open at a time. This probably explains the Database Locked issue (although I didn't re-create it yet)
This feels like it doesn't justify an urgent fix, and propose we move it to next refactor week?
It should affect big installs and users with lots of access permissions...
Solution Ideas for refinement...
A. Use helper methods in repo layer to reduce the number of transactions.
e.g user delete_by_user_id
instead of 22,000 individual deletes?
Maybe that gets us a good enough speed up?
For some reason this isn't used though, and we're excluding permissions which have a context from this process. Is that a bug though?
B. Don't delete and recreate all permissions, do a comparison and then apply just the changes?
C. Only update/add permissions for stores on the sync site (648 stores results in a lot of permissions and delete/inserts for irrelevant sites?
D. Something else? Like do we really need to update the user on login? We update it a sync too, is that enough? Maybe we don't need both?
Refinement
C. Only update/add permissions for stores on the sync site (648 stores results in a lot of permissions and delete/inserts for irrelevant sites?
But do this on central server, go go go @Chris-Petty
What went wrong? 😲
Using a very large testing central server testing datafile, 100s of stores and users, it was frequently taking up to 60s to simply login in OMS on a decent android tablet. Occasionally, I'd get a error reporting a db locking issue which I am certain was the tablet's DB, not the OMS central.
A similar effect was present for doing a manual sync - it'd say "Updating user" for about 60s and then blast through the rest of sync steps like everything was normal. Once I got a similar DB lock error.
Expected behaviour 🤔
Login should take a couple seconds at most, ideally near instantaneous! No errors thx.
How to Reproduce 🔨
My chungus datafile: https://drive.google.com/file/d/1DtpWT48Lod9FxeYb4b1QsbF3cnYv4Ysp/view?usp=drive_link
Your environment 🌱