pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License
4.76k stars 297 forks source link

Warn on use of SQLite without Write-Ahead-Locking #239

Closed leezu closed 3 years ago

leezu commented 6 years ago

http://www.sqlite.org/wal.html can prevent sqlite3 database lock errors in many circumstances, specifically if:

As dataset aims to make databases easy and as sqlite is arguably quite common I argue we should warn users when the open a sqlite database that is not in WAL mode, as it may save them from trouble with locking errors.

WAL mode is persistent and needs to be enabled only once for a database. Either on command line by issuing PRAGMA journal_mode=WAL or via SQL https://stackoverflow.com/questions/9671490/how-to-set-sqlite-pragma-statements-with-sqlalchemy . The later could be done automatically by default, giving an option to disable the behaviour in the dataset.connect method.

leezu commented 6 years ago

PS: WAL is included in SQLite since version 3.7.0, 2010-07-22. There are some disadvantages, but I believe they don't affect the target user group of this library:

But there are also disadvantages:

WAL normally requires that the VFS support shared-memory primitives. (Exception: WAL without shared memory) The built-in unix and windows VFSes support this but third-party extension VFSes for custom operating systems might not.
All processes using a database must be on the same host computer; WAL does not work over a network filesystem.
Transactions that involve changes against multiple ATTACHed databases are atomic for each individual database, but are not atomic across all databases as a set.
It is not possible to change the page_size after entering WAL mode, either on an empty database or by using VACUUM or by restoring from a backup using the backup API. You must be in a rollback journal mode to change the page size.
It is not possible to open read-only WAL databases. The opening process must have write privileges for "-shm" wal-index shared memory file associated with the database, if that file exists, or else write access on the directory containing the database file if the "-shm" file does not exist.
WAL might be very slightly slower (perhaps 1% or 2% slower) than the traditional rollback-journal approach in applications that do mostly reads and seldom write.
There is an additional quasi-persistent "-wal" file and "-shm" shared memory file associated with each database, which can make SQLite less appealing for use as an application file-format.
There is the extra operation of checkpointing which, though automatic by default, is still something that application developers need to be mindful of. 
kapily commented 3 years ago

@leezu - I created a helper library: https://pypi.org/project/dataset-utils/ to help with different connection modes (read only, WAL), in addition to helper functions to make it easier to get or create a table, and buffered writes.

pudo commented 3 years ago

Looks like WAL is in there now, and active by default. That should resolve this, given people would actively have to disable WAL :)