[QUESTION] What is the best practice to use LiteDatabase.Checkpoint() with a single instance shared in the process?

mbdavid / LiteDB

LiteDB - A .NET NoSQL Document Store in a single data file

http://www.litedb.org

MIT License


[QUESTION] What is the best practice to use LiteDatabase.Checkpoint() with a single instance shared in the process? #1775

Open zandam-xrn opened 4 years ago

zandam-xrn commented 4 years ago

Hi, I'm using a single LiteDatabase instance shared in the entire process. I noticed that without using the LiteDatabase.Checkpoint() function the *-log.db file size keeps on increasing. What I understood is that this function commits the changes to the actual file (is this correct?).

How can I call LiteDatabase.Checkpoint() without causing locks or data loss, and without interfering with the database lifecycle? At the moment I have implemented this inside a "scheduled" thread that repeats the checkpoint at a fixed interval (for example every 15 minutes), but I fear this is not the right way to deal with the Checkpoint function.

lbnascimento commented 4 years ago

@zandam-xrn The log file continually growing can happen for two reasons:

1. The Checkpoint pragma is set to 0, which disables auto-checkpoint (there used to be a bug that set it to 0 when upgrading LiteDB v4 datafiles to v5). This can be easily fixed by checking the db.CheckpointSize property and, if necessary, changing it to another value (the default is 1000);
2. The auto-checkpoint is trying to run after every operation, but never gets the exclusive lock to the entire datafile because there are always other threads using it. This is much harder to address and, unless you're willing to restructure your entire code, probably your best option is just to call Checkpoint() periodically.
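
The check described in the first case could look roughly like the following. This is a minimal sketch, assuming LiteDB v5; the class and method names (`CheckpointConfig`, `EnsureAutoCheckpoint`) are hypothetical helpers, and 1000 is the default value mentioned above.

```csharp
using System;
using LiteDB;

public static class CheckpointConfig
{
    // Re-enables auto-checkpoint if the pragma was left at 0.
    // Returns the effective CheckpointSize after the check.
    // (Hypothetical helper, not part of LiteDB.)
    public static int EnsureAutoCheckpoint(LiteDatabase db, int fallbackSize = 1000)
    {
        if (db.CheckpointSize == 0)
        {
            // 0 disables auto-checkpoint, so restore a non-zero page count.
            db.CheckpointSize = fallbackSize;
        }
        return db.CheckpointSize;
    }
}
```

It would be called once, right after opening the shared instance, for example `CheckpointConfig.EnsureAutoCheckpoint(db);`.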

zandam-xrn commented 4 years ago

Hi @lbnascimento, I have to check that value! So if the CheckpointSize value is set to 0, the database won't do any automatic checkpoint. If I want to call it periodically, how often should I call it? And will there still be a chance that it doesn't get the exclusive lock?

Edit: I checked the CheckpointSize value and it's set to 1000 (default). I didn't know this, and until now I have been calling Checkpoint() periodically. Could this be bad behaviour? I mean, is it possible that the automatic and the manual checkpoint both tried to run at the same time, causing various read/write timeouts and poor performance?

lbnascimento commented 4 years ago

@zandam-xrn The automatic checkpoint tries to run after every operation if the number of pages in the log file is greater than the CheckpointSize. But it only ends up actually running if there are no open transactions. Manual checkpoint enqueues itself and waits until it is able to get an exclusive lock over the entire datafile, or until it times out (default timeout is 1 minute, but it can be changed via db.Timeout).
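
A manual checkpoint that tolerates the timeout described above might be sketched as follows. This assumes LiteDB v5 and that a lock timeout surfaces as a `LiteException`; the helper `ManualCheckpoint.TryCheckpoint` is hypothetical, and it treats any `LiteException` as a failed attempt rather than inspecting the error code.

```csharp
using System;
using LiteDB;

public static class ManualCheckpoint
{
    // Attempts a manual checkpoint on the shared instance.
    // Returns false instead of throwing when LiteDB reports an error,
    // e.g. when the exclusive lock is not obtained within db.Timeout.
    // (Hypothetical helper, not part of LiteDB.)
    public static bool TryCheckpoint(LiteDatabase db, TimeSpan? timeout = null)
    {
        if (timeout.HasValue)
        {
            // db.Timeout applies to the whole database, not only to checkpoints.
            db.Timeout = timeout.Value;
        }

        try
        {
            db.Checkpoint();
            return true;
        }
        catch (LiteException ex)
        {
            Console.Error.WriteLine($"Checkpoint skipped: {ex.Message}");
            return false;
        }
    }
}
```

Returning `false` lets a scheduled caller simply retry at the next interval instead of crashing the background thread.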

As a rule of thumb, you should first try to run your application without manually calling Checkpoint(), only using it if you run into issues regarding the size of the log file.
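
One way to follow this rule of thumb is to checkpoint manually only when the log file has actually grown past a threshold. The sketch below assumes the log file sits next to the data file and follows the `*-log.db` naming mentioned in the question (for example `app.db` and `app-log.db`); the class `LogSizeGuard` and the threshold parameter are hypothetical.

```csharp
using System.IO;
using LiteDB;

public static class LogSizeGuard
{
    // Derives the log file path from the data file path,
    // assuming the "<name>-log<ext>" pattern (app.db -> app-log.db).
    public static string LogPathFor(string dataFilePath)
    {
        var dir = Path.GetDirectoryName(dataFilePath) ?? "";
        var name = Path.GetFileNameWithoutExtension(dataFilePath);
        var ext = Path.GetExtension(dataFilePath);
        return Path.Combine(dir, name + "-log" + ext);
    }

    // Calls Checkpoint() only if the log file exists and is larger
    // than maxLogBytes. Returns true if a checkpoint was requested.
    public static bool CheckpointIfLogExceeds(LiteDatabase db, string dataFilePath, long maxLogBytes)
    {
        var log = new FileInfo(LogPathFor(dataFilePath));
        if (!log.Exists || log.Length <= maxLogBytes)
        {
            return false;
        }

        db.Checkpoint();
        return true;
    }
}
```

With this shape, auto-checkpoint stays in charge during normal operation, and the manual call only acts as a safety net for the growing-log case.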

zandam-xrn commented 4 years ago

I kept CheckpointSize at its default value and now call the Checkpoint() function manually, so the checkpoint is no longer a looped action. Maybe some of the software locks and Checkpoint timeouts I encountered were caused by two concurrent checkpoints.