Closes #64
Implements datastore garbage collection with direct iteration.
Implementation Notes:
datastore iterators require flushing the db to work reliably
a temporary db is used for storing the live object index
/data/compact api endpoint for manual compaction; gc triggers it if it deletes any objects, but it's also useful in its own right
/data/sync api endpoint for flushing the datastore: following merges, the WAL seems to hold on to some hidden space on Linux/ext4 (not visible in ls, but visible in du), although it appears to be bounded by the log size. Flushing the datastore through the endpoint immediately reclaims this space.
/data/keys api endpoint which dumps the keys for all objects in the datastore.
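For reviewers, the overall shape of the gc pass described in the notes above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `Datastore`, `Statement`, `markLive`, `sweep` and `gc` are hypothetical names, the datastore and the live object index are plain in-memory maps (the PR uses a temporary db for the index), and the flush before iteration is omitted.

```go
package main

import "fmt"

// Datastore is a hypothetical in-memory stand-in for the on-disk object store.
type Datastore map[string][]byte

// Statement is a hypothetical stand-in for a statement that references
// objects in the datastore by key.
type Statement struct {
	ID   string
	Refs []string
}

// markLive builds the live object index: the set of keys still referenced
// by at least one statement. (Assumption: an in-memory set; the PR stores
// this index in a temporary db.)
func markLive(stmts []Statement) map[string]struct{} {
	live := make(map[string]struct{})
	for _, s := range stmts {
		for _, k := range s.Refs {
			live[k] = struct{}{}
		}
	}
	return live
}

// sweep iterates directly over the datastore keys and deletes every key
// that is absent from the live index. It returns the number of deletions.
func sweep(ds Datastore, live map[string]struct{}) int {
	deleted := 0
	for k := range ds {
		if _, ok := live[k]; !ok {
			delete(ds, k) // deleting during range is safe for Go maps
			deleted++
		}
	}
	return deleted
}

// gc runs mark then sweep, and calls compact only if something was deleted.
func gc(ds Datastore, stmts []Statement, compact func()) int {
	n := sweep(ds, markLive(stmts))
	if n > 0 {
		compact()
	}
	return n
}

func main() {
	ds := Datastore{"a": nil, "b": nil, "c": nil}
	stmts := []Statement{{ID: "s1", Refs: []string{"a"}}}
	n := gc(ds, stmts, func() { fmt.Println("compacting") })
	fmt.Println("deleted", n, "remaining", len(ds))
}
```

In the sketch, compaction is passed in as a callback so that the "gc triggers it if it deletes any objects" behaviour is visible at the call site; the count returned by `gc` plays the same role as the number printed by /data/gc in the example below.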
Example:
$ du --si /mnt/ssd2/data/
193k /mnt/ssd2/data/
$ mcclient status online
status set to online
$ time mcclient merge QmeiY2eHMwK92Zt6X4kUUC3MsjMmVb2VnGZ17DhnhRPCEQ "SELECT * FROM images.dpla LIMIT 1000000"
merged 1000000 statements and 1000001 objects
real 6m48.171s
user 0m0.345s
sys 0m0.033s
$ du --si /mnt/ssd2/data/
867M /mnt/ssd2/data/
$ time mcclient delete "DELETE FROM images.dpla"
Deleted 1000000 statements
real 1m2.060s
user 0m0.332s
sys 0m0.041s
$ mcclient status offline
status set to offline
$ time curl http://127.0.0.1:9002/data/gc
1000001
real 0m57.923s
user 0m0.000s
sys 0m0.008s
$ du --si /mnt/ssd2/data/
193k /mnt/ssd2/data/