timothygrant80 / cisTEM


Database locking is cruel to shared filesystems #483

Closed blackwer closed 7 months ago

blackwer commented 8 months ago

At our supercomputing center we noticed a suspicious number of directory creations and deletions on our Ceph filesystem coming from one user. About 10-20 million a day... This was creating a bit of churn/backlog in the way we keep the filesystem performant for all users. After some effort, we found the issue was due to a sqlite3 dotfile-based lock originating from this code. I haven't had the time to sort through how the database is accessed, but this kind of locking is problematic on shared filesystems in general. I haven't looked much at the code architecture, but is it possible to do something along the lines of one of the following?

  1. Ideally a single process/thread would just keep an exclusive lock for the duration of the run and then other processes would communicate via some kind of RPC to get/push db info. I get that this might be a small or big ask, depending on how things are already architected
  2. Make a transaction queue and flush it at slow intervals, though if multiple processes are doing writes, that effectively means you'd need option 1 I think
  3. Provide a means for the user to choose a lock that is friendlier to shared filesystems. The default or flock are much better here, though not perfect. (A sketch of what this could look like follows below.)
  4. Use a non-filesystem-based DB. Probably not a realistic option, but it is an option
  5. ???

It came to my attention that the "dotfile" method might be due to NFS support, but even with NFSv4 any of the locking methods can still fail.

The offending code is here: https://github.com/timothygrant80/cisTEM/blob/7977b8d9a7982026953600ed18e117c236ef3889/src/core/database.cpp#L497 https://github.com/timothygrant80/cisTEM/blob/7977b8d9a7982026953600ed18e117c236ef3889/src/core/database.cpp#L539
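
For option 3 above, my understanding is that SQLite picks the locking implementation via the VFS name passed to sqlite3_open_v2, which is presumably how the lines linked above end up with the dotfile behavior. A minimal sketch of a user-selectable lock, with an illustrative wrapper that is not cisTEM's actual API:

    // Sketch only (not cisTEM's actual open call): let the caller pick the
    // SQLite VFS, and therefore the locking strategy, at open time.
    // On Unix builds SQLite ships "unix" (POSIX advisory locks, the default),
    // "unix-dotfile", "unix-excl", and "unix-none" (no locking at all);
    // a flock-based VFS may also exist depending on compile options.
    #include <sqlite3.h>
    #include <cstdio>

    sqlite3* OpenProjectDatabase(const char* path, const char* vfs_name) {
        sqlite3* db = nullptr;
        int rc = sqlite3_open_v2(path, &db,
                                 SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                                 vfs_name); // pass nullptr to use the platform default VFS
        if ( rc != SQLITE_OK ) {
            std::fprintf(stderr, "sqlite3_open_v2 failed: %s\n", sqlite3_errmsg(db));
            sqlite3_close(db);
            return nullptr;
        }
        return db;
    }

    // e.g. OpenProjectDatabase("/path/to/project.db", "unix");      // POSIX locks
    //      OpenProjectDatabase("/path/to/project.db", "unix-none"); // no locking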

jojoelfe commented 8 months ago

Hi Robert,

As far as I know, the dotfile locking was used because it was the only one that worked on NFS/Lustre filesystems. And you are completely right that it can still fail, which is why there are other locking mechanisms on top of it to prevent database corruption. To your points 1 and 2: there should be only one thread accessing the database, and all workers pass messages to this thread. Until recently, database writes were also quite infrequent, but the new template matching code writes much more frequently.

For you it is probably easiest to change this yourselves and recompile (there are recent instructions here: https://cistem-org.github.io/developmental-docs/docs/dev/howto/Collaboration/vscode_dev_container.html ). Since we have no way of testing whether this improves things on Ceph filesystems, we probably won't make this change ourselves, but we are open to PRs that would make it an option.

Best,

Johannes

jojoelfe commented 8 months ago

I looked at this a bit more out of interest. My understanding was that the lock would be created when the database is opened and released when cisTEM exits. Instead it seems to be created and deleted upon every access. Moreover, a simple "MouseMove" event in the GUI seems to trigger a database access and therefore the lockfile deletion/creation.

Maybe we can just convince sqlite to keep the lock in place, since only one process is supposed to access the database anyway. @timothygrant80 Do you have any idea how to do that?
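
One candidate, sketched below under the assumption that only the single GUI connection touches the database (the helper name is made up, and whether this actually keeps the dotfile in place with the dotfile VFS would need testing), is SQLite's locking_mode pragma, which holds the lock for the lifetime of the connection instead of releasing it after every transaction:

    // Sketch: ask SQLite to hold its lock until the connection closes.
    // After PRAGMA locking_mode = EXCLUSIVE, the lock taken by the next
    // read or write is not released until sqlite3_close() (or the mode is reset).
    #include <sqlite3.h>

    bool HoldLockForConnectionLifetime(sqlite3* db) {
        char* errmsg = nullptr;
        if ( sqlite3_exec(db, "PRAGMA locking_mode = EXCLUSIVE;", nullptr, nullptr, &errmsg) != SQLITE_OK ) {
            sqlite3_free(errmsg);
            return false;
        }
        // The lock is only acquired on the next access; a no-op transaction
        // forces it immediately so it is taken once and then kept.
        if ( sqlite3_exec(db, "BEGIN EXCLUSIVE; COMMIT;", nullptr, nullptr, &errmsg) != SQLITE_OK ) {
            sqlite3_free(errmsg);
            return false;
        }
        return true;
    }

The obvious trade-off is that a second cisTEM instance could then never open the same project while the first one is running.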

blackwer commented 8 months ago

Johannes, Thanks for the prompt updates!

My understanding was that the lock would be created when the database is opened and released when cisTEM exits.

My understanding is that the sqlite locks exist so that new connections can't be opened while a write is in progress. If what you said about only one main process having a connection is true, then is the lock this code requires mostly there to prevent multiple instances of your software from accessing the database simultaneously?

Basically: if the user promises to be careful, it should be safe for me to disable the lock entirely as I understand it. If that's the case, should it also be true for a multi-node job?

jojoelfe commented 8 months ago

Basically: if the user promises to be careful, it should be safe for me to disable the lock entirely as I understand it. If that's the case, should it also be true for a multi-node job?

Yes, that SHOULD be the case. We haven't tested it in a while, though, so there might be a problem somewhere...

bHimes commented 8 months ago

I looked at this a bit more out of interest. My understanding was that the lock would be created when the database is opened and released when cisTEM exits. Instead it seems to be created and deleted upon every access. Moreover, a simple "MouseMove" event in the GUI seems to trigger a database access and therefore the lockfile deletion/creation.

I vaguely remember noticing this behavior previously and thinking the same thing then: why should a mouse event trigger a database access? IIRC this was making the GUI practically unusable when projects had O(10^6) particles.

Can you link the code you are looking at? This might be worth creating an issue for.

Maybe we can just convince sqlite to keep the lock in place, since only one process is supposed to access the database anyway. @timothygrant80 Do you have any idea how to do that?

bHimes commented 8 months ago

FWIW, if the DB update frequency can be managed intentionally, then I would think it shouldn't be touched more often than an "expensive" calculation. I would put that in the tens of minutes range.

DB aside, that is just a lot of disk access and atomic synchronization, which we should avoid from a performance standpoint on any type of filesystem.

I assume it wouldn't be too hard to have a queue that periodically flushes to the database (if one doesn't already exist).
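
Something along these lines is what I have in mind; the sketch below uses made-up names (nothing here exists in cisTEM) and just illustrates batching statements so the database, and therefore its lock, is only touched once per flush interval:

    // Sketch of a periodically flushed write queue. Statements accumulate in
    // memory and are written in a single transaction when the interval elapses.
    #include <chrono>
    #include <mutex>
    #include <string>
    #include <vector>

    class DatabaseWriteQueue {
      public:
        explicit DatabaseWriteQueue(std::chrono::seconds flush_interval)
            : flush_interval_(flush_interval), last_flush_(std::chrono::steady_clock::now()) { }

        void Enqueue(std::string sql) {
            std::lock_guard<std::mutex> guard(mutex_);
            pending_.push_back(std::move(sql));
        }

        // Called from the single database-owning thread, e.g. on a timer.
        // ExecFn is any callable taking a const std::string& and running it as SQL.
        template <typename ExecFn>
        void FlushIfDue(ExecFn execute_statement) {
            std::lock_guard<std::mutex> guard(mutex_);
            const auto now = std::chrono::steady_clock::now();
            if ( pending_.empty() || now - last_flush_ < flush_interval_ )
                return;
            execute_statement("BEGIN TRANSACTION;"); // one lock acquisition per batch
            for ( const auto& sql : pending_ )
                execute_statement(sql);
            execute_statement("COMMIT;");
            pending_.clear();
            last_flush_ = now;
        }

      private:
        std::chrono::seconds                  flush_interval_;
        std::chrono::steady_clock::time_point last_flush_;
        std::mutex                            mutex_;
        std::vector<std::string>              pending_;
    };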

jojoelfe commented 8 months ago

Can you link the code you are looking at? This might be worth creating an issue for.

That is just an observation from running inotify in the project folder and noticing tons of events when moving the mouse.

From a glance around that might be from https://github.com/timothygrant80/cisTEM/blob/c74afd2fc9e65c45e4959d79d6d8084845bffe79/src/gui/PickingBitmapPanel.cpp#L537

where https://github.com/timothygrant80/cisTEM/blob/c74afd2fc9e65c45e4959d79d6d8084845bffe79/src/gui/PickingBitmapPanel.cpp#L555-L556

is called, which might get to the database.

Probably the thing to do is to set a breakpoint in the database class to see some stack traces when the database gets accessed...

timothygrant80 commented 8 months ago

The database does have to be locked, I think, as a layer of safety. It makes no sense that it is accessed this much; perhaps @twagner9 can look into this?

twagner9 commented 8 months ago

After using a gdb rbreak on database.cpp (definitely not the most efficient method), I was able to find this function at line 363 of MatchTemplatePanel.cpp:

void MatchTemplatePanel::OnUpdateUI(wxUpdateUIEvent& event)

Which contained this code at line 378:

    else {
        if ( main_frame->current_project.database.ReturnNumberOfTemplateMatchingJobs( ) == 0 ) {
            ResumeRunCheckBox->Enable(false);
        }

This results in a database access every time OnUpdateUI is called, which is essentially every time the mouse is moved on this panel, and the handler fires across all sub-panels of the Actions Panel as well.

Perhaps I'm wrong, but 10-20 million calls in a day from this particular issue seems reasonable given that it's triggered by any mouse movement across the whole panel?

From further testing, functions such as Database::Prepare and Database::Finalize (and even a few others) are also called from within this OnUpdateUI function, so there are multiple database accesses occurring specifically from MatchTemplate. This would easily result in millions of db accesses in a day, I should think.

jojoelfe commented 8 months ago

Ouch, this looks like it's on me. I vaguely remember being a bit confused about what triggers OnUpdateUI. I also remember thinking that it would always be better to check the database instead of the internal memory structures, since that is the source of truth, but clearly there are caveats...

I'll take a look and try to clean this up. The resume functionality is super useful, but it should not need to check the database millions of times.
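
Roughly what I have in mind, as a sketch only (the cached member and the staleness flag are illustrative, not existing cisTEM code):

    // Sketch: cache the expensive database check so OnUpdateUI, which fires on
    // essentially every mouse move, does not hit the database each time.
    // cached_has_template_matching_jobs and resume_state_is_stale are
    // hypothetical members; the real fix may end up looking different.
    void MatchTemplatePanel::OnUpdateUI(wxUpdateUIEvent& event) {
        // ... existing enable/disable logic ...
        if ( resume_state_is_stale ) { // set when a job finishes or the project changes
            cached_has_template_matching_jobs =
                main_frame->current_project.database.ReturnNumberOfTemplateMatchingJobs( ) > 0;
            resume_state_is_stale = false;
        }
        if ( ! cached_has_template_matching_jobs )
            ResumeRunCheckBox->Enable(false);
    }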

Miro-Astore commented 8 months ago

So in the meantime: do all the workers need access to this database file? I'm guessing yes. I'm just wondering how I can work around this and still use multiple nodes without touching the networked filesystem.

jojoelfe commented 8 months ago

The workers don't access the database at all. As long as you do not move the mouse over the GUI, there should be only a reasonable amount of dotfile locking.



Miro-Astore commented 8 months ago

Hmm, ok, so I can run cisTEM from a "head node" and the workers will be happy. Cool, thanks.

Miro-Astore commented 8 months ago

Hi, my workarounds don't seem to be working, since the first step of the template matching job is to cd to the shared directory.

Also, this locking issue appears to be present even when I am not interacting with the computer at all.

Miro-Astore commented 7 months ago

Hi all, just thought I'd give some further information (not sure if the issue is fixed yet; thank you for looking into it).

I have been running a single particle analysis pipeline and this also seems to produce a locking issue.

I thought you might like to know.

bHimes commented 7 months ago

Fixed in #484