markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Database error 13 while creating database index: database or disk is full #246

Closed ericzinnikas closed 3 years ago

ericzinnikas commented 3 years ago

I'm running duperemove on a large XFS filesystem with many files. Currently there is ~250 GB of free space, but duperemove fails with create_indexes()/3926829: Database error 13 while creating database index: database or disk is full. Is there any way to estimate how much free space is required for the sqlite DB (based on the number of files / extent hashes)? Looks like ~20k files and ~1,250,000 extent hashes.

I'm running with the following flags: -hdr --dedupe-options=noblock --skip-zeroes --lookup-extents=yes

ericzinnikas commented 3 years ago

Actually, I'm also realizing that sqlite may be trying to create a temp file on disk or in RAM. /tmp is 16 GB in size and I have 32 GB of RAM. Perhaps I can try changing the location it writes temp files to (if possible).

lorddoskias commented 3 years ago

Yes, so looking at the error code description:

(13) SQLITE_FULL
The SQLITE_FULL result code indicates that a write could not complete because the disk is full. Note that this error can occur when trying to write information into the main database file, or it can also occur when writing into temporary disk files.

Sometimes applications encounter this error even though there is an abundance of primary disk space because the error occurs when writing into temporary disk files on a system where temporary files are stored on a separate partition with much less space than the primary disk.
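One per-connection knob relevant to this failure mode is SQLite's `temp_store` pragma, which controls whether temporary tables and indices are kept in RAM or backed by temp files on disk. A minimal standalone sketch (not duperemove code, just the pragma itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# By default temp_store is DEFAULT (0): the compile-time setting decides,
# and large transient structures may spill to temporary files on disk.
(default_mode,) = conn.execute("PRAGMA temp_store").fetchone()
print(default_mode)

# Force temporary tables and indices into memory instead of temp files.
# (Very large builds then consume RAM rather than temp-partition space.)
conn.execute("PRAGMA temp_store = MEMORY")
(mode,) = conn.execute("PRAGMA temp_store").fetchone()
print(mode)  # 2 == MEMORY
```

Whether this helps here depends on how much RAM is available relative to the index being built; it trades temp-disk pressure for memory pressure.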

So one thing to note is that if you do not pass any of --hashfile/--read-hashes/--write-hashes (which you don't seem to be doing), duperemove falls back to an in-memory database with the following special URI filename: file::memory:?cache=shared. More information about this can be found on sqlite's web page, but of particular interest is the following sentence:

If the filename is ":memory:", then a private, temporary in-memory database is created for the connection. This in-memory database will vanish when the database connection is closed. Future versions of SQLite might make use of additional special filenames that begin with the ":" character. It is recommended that when a database filename actually does begin with a ":" character you should prefix the filename with a pathname such as "./" to avoid ambiguity.
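The behavior of that special URI filename can be seen in a few lines of Python (a standalone illustration, not how duperemove opens it): with `cache=shared`, multiple connections in the same process see one shared in-memory database, whereas a plain `":memory:"` name would give each connection its own private database.

```python
import sqlite3

# Two connections to the same shared-cache in-memory database.
uri = "file::memory:?cache=shared"
db1 = sqlite3.connect(uri, uri=True)
db2 = sqlite3.connect(uri, uri=True)

db1.execute("CREATE TABLE t (x INTEGER)")
db1.execute("INSERT INTO t VALUES (42)")
db1.commit()

# The second connection sees the first connection's data because the
# in-memory database (and its cache) is shared within the process.
rows = db2.execute("SELECT x FROM t").fetchall()
print(rows)  # [(42,)]
```

The database vanishes once the last connection to it is closed.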

ericzinnikas commented 3 years ago

Sorry, I neglected to mention I am using --hashfile=hashes.dat. Though it seems like --read-hashes/--write-hashes aren't what I'd want, right?

I can see hashes.dat is currently ~9 GB, and it is the create_indexes step that is failing. I've tried to adjust the sqlite tmpdir, but that has had no effect. I will see what else I can find.
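For reference, redirecting SQLite's temp files from outside the program is usually done with the SQLITE_TMPDIR environment variable, which SQLite consults on Unix when it creates temporary files. A hypothetical wrapper sketch (the scratch path and the stand-in child command are made up; in practice the argv would be the real duperemove invocation), though as the later comments show, this did not take effect in this particular setup:

```python
import os
import subprocess
import sys

# Hypothetical scratch directory on a partition with plenty of free space.
env = dict(os.environ, SQLITE_TMPDIR="/mnt/scratch/tmp")

# Stand-in child process so the sketch is runnable anywhere; it merely
# echoes the variable the real duperemove process would inherit.
argv = [sys.executable, "-c", "import os; print(os.environ['SQLITE_TMPDIR'])"]
out = subprocess.run(argv, env=env, capture_output=True, text=True)
print(out.stdout.strip())  # /mnt/scratch/tmp
```

The variable must be present in the environment of the process that opens the database; exporting it in a different shell after the fact has no effect.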

ericzinnikas commented 3 years ago

Okay, the issue seems to be that setting SQLITE_TMPDIR had no effect, so sqlite was still writing to /var/tmp and filling my root partition. I just manually ran the few queries from create_indexes, which seems to have fixed things.

Docs seem to indicate sqlite3_temp_directory is deprecated, so I'm not sure there is a good way for duperemove to fix this. Either way, it's probably an edge case, as most sqlite DBs will be smaller (and I should probably just free up space on my root partition).