Currently, if `directsketch` fails for whatever reason during download+sketch, already-sketched files are unusable, because they're part of an unfinished zip file. However, we're not actually using `zip` for any compression here: sigs are gz-compressed themselves and then just stored in the zip.
Instead of writing directly to a zip file, we could write sigs to a temp directory (perhaps with a `--temp-dir` option to name it?), which would stay readable after any failure. We could optionally write manifests in chunks to make loading simpler. After sketching, we could move the files into a zip, combine the manifests, and finish the zip file. I'm not sure how much extra time this last step would take, but it's likely worth it to allow recovery.
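To make the finalize step concrete, here is a minimal Python sketch of it, not how directsketch actually does anything. The file layout is an assumption for illustration only: sketches named `*.sig.gz`, manifest chunks named `manifest.*.csv` that share one header row, and a combined manifest called `manifest.csv`. The function name `finalize_zip` is hypothetical too.

```python
import csv
import io
import zipfile
from pathlib import Path


def finalize_zip(temp_dir, zip_path, manifest_name="manifest.csv"):
    """Pack sketches from temp_dir into zip_path and merge manifest chunks.

    Hypothetical layout: '*.sig.gz' sketches and 'manifest.*.csv' chunks.
    Returns the number of sketch files written to the zip.
    """
    temp_dir = Path(temp_dir)
    zip_path = Path(zip_path)
    sig_files = sorted(temp_dir.glob("*.sig.gz"))
    chunk_files = sorted(temp_dir.glob("manifest.*.csv"))

    # Merge the manifest chunks, keeping the header row only once.
    merged = io.StringIO()
    writer = csv.writer(merged, lineterminator="\n")
    header_written = False
    for chunk in chunk_files:
        with open(chunk, newline="") as fh:
            rows = list(csv.reader(fh))
        if not rows:
            continue
        if not header_written:
            writer.writerow(rows[0])
            header_written = True
        writer.writerows(rows[1:])

    # Build the zip under a '.partial' name and rename it at the end, so a
    # crash during this step never leaves a truncated file at zip_path.
    partial = zip_path.with_name(zip_path.name + ".partial")
    with zipfile.ZipFile(partial, "w", compression=zipfile.ZIP_STORED) as zf:
        for sig in sig_files:
            # ZIP_STORED: the sigs are already gz-compressed, so just store them.
            zf.write(sig, arcname=sig.name)
        zf.writestr(manifest_name, merged.getvalue())
    partial.replace(zip_path)
    return len(sig_files)
```

Since the entries are stored rather than compressed, this step is mostly a file copy, so the extra time should scale with the total size of the sketches rather than with any compression work.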
For recovery after failure (or any reuse of temp sketches), we would first look in the `--temp-dir` for any preexisting sketches and skip re-calculating those.
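The recovery check could look roughly like the Python sketch below. It assumes, purely for illustration, that each sketch is written as `<accession>.sig.gz` in the temp directory. The helper names are hypothetical. A file that was cut off mid-write is treated as missing, so it gets re-sketched instead of trusted.

```python
import gzip
import zlib
from pathlib import Path


def is_complete_gzip(path):
    """Return True if path is a non-empty gzip file that reads cleanly to the end."""
    path = Path(path)
    if path.stat().st_size == 0:
        return False
    try:
        with gzip.open(path, "rb") as fh:
            # Read in 1 MiB blocks; a truncated stream raises before EOF.
            while fh.read(1 << 20):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def pending_accessions(accessions, temp_dir, suffix=".sig.gz"):
    """Return the accessions that still need to be downloaded and sketched.

    Hypothetical naming scheme: one '<accession>.sig.gz' file per sketch.
    """
    done = set()
    for sig in Path(temp_dir).glob("*" + suffix):
        if is_complete_gzip(sig):
            done.add(sig.name[: -len(suffix)])
    # Preserve the caller's original ordering.
    return [acc for acc in accessions if acc not in done]
```

Validating each file costs one full decompression pass over the existing sketches. If that turns out to be too slow, writing each sketch under a temporary name and renaming it once complete would make the existence check alone sufficient.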