Closed by ccbaumler 3 months ago
Ah-hah! I am virtually positive that the error is from zip itself, so sig summarize should trigger it, as should a straight up unzip -v. You might look for a zero-size zip file.
It may also be that sig summarize is handling the error properly while sig cat is not.
I'll have to think about ways to track this down and/or better handle this kind of error. Thanks for reporting!
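One way to look for a zero-size or otherwise unreadable zip file, as suggested above, is sketched below using only the Python standard library. The function name find_bad_zips and the reason strings are made up for illustration; this is not a sourmash feature. It assumes the signature files end in .zip and relies on zipfile.is_zipfile, which only checks that a zip end record can be found, so it will not catch every kind of corruption.

```python
import os
import zipfile


def find_bad_zips(root):
    """Return sorted (path, reason) pairs for .zip files under root that look unreadable.

    Hypothetical helper for illustration only.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".zip"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                # An empty file can never contain a zip directory.
                bad.append((path, "zero-size file"))
            elif not zipfile.is_zipfile(path):
                # is_zipfile() looks for the zip end record near the end of the file.
                bad.append((path, "no zip end record found"))
    return sorted(bad)
```

Calling find_bad_zips on the top-level signature directory and printing the result would list candidate culprits without opening any signatures.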
The final command I listed worked like a charm. Took some time to run through all 700 files, but I was easily able to find the culprit by searching the log file created.
While each of the commands @ctb listed returns a similar error, unzip -v did so the fastest.
sig summarize
sourmash sig summarize /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip
0.70user 1.22system 0:04.13elapsed 46%CPU (0avgtext+0avgdata 565248maxresident)k
967504inputs+8outputs (7118major+33529minor)pagefaults 0swaps
sig cat
sourmash sig cat /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip -o delet-me
0.79user 1.53system 0:04.34elapsed 53%CPU (0avgtext+0avgdata 563200maxresident)k
968704inputs+8outputs (7151major+33477minor)pagefaults 0swaps
unzip -v
unzip -v /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip -d delete-me/
caution: not extracting; -d ignored
Archive: /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip or
/group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip.zip, and cannot find /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip.ZIP, period.
0.00user 0.00system 0:00.00elapsed 37%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+354minor)pagefaults 0swaps
OK, so this error is triggered by faulty zip files. Maybe we should be returning a better error when the zip file is faulty 🤔
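As a rough illustration of what a better error could look like, here is a minimal sketch that wraps Python's zipfile.BadZipFile in a message naming the offending file. The names UnreadableZipError and open_zip_checked are hypothetical; this is not how sourmash opens zip files, only a sketch under the assumption that the underlying failure surfaces as BadZipFile.

```python
import zipfile


class UnreadableZipError(ValueError):
    """Hypothetical error type for illustration; not part of sourmash."""


def open_zip_checked(path):
    """Open a zip file for reading, or raise an error that names the file."""
    try:
        return zipfile.ZipFile(path, "r")
    except zipfile.BadZipFile as exc:
        # Keep the original exception as the cause, but say which file failed.
        raise UnreadableZipError(
            f"'{path}' is not a readable zip file ({exc}); "
            "it may be empty or truncated"
        ) from exc
```

The design choice here is simply to attach the path to the error, so that a run over hundreds of files points directly at the faulty one.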
punting to #3213
The command
While building the AllTheBacteria sourmash DB, I am using:
The error
The error produced:
The investigation
I saw two possible errors immediately:
Due to the random order when using the find command, I do not know which file the error occurred on. Therefore, I have run two separate attempts to find a signature that replicates the error above:
I am currently running this command to investigate further:
I am also attempting to sig cat only one k size at a time instead of all three, in case it is a working memory error.
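Since the difficulty described above is that find returns files in no particular order, one possible approach is to check the files one at a time in sorted order and log a line per file, so any failure is tied to a specific path. The sketch below is an illustration, not the command used in this issue; check_in_order and its log format are invented names, and the check argument is any callable that raises on failure.

```python
def check_in_order(paths, check, log_path):
    """Run check(path) on each path in sorted order, logging one line per file.

    Hypothetical helper for illustration. Returns the paths whose check raised.
    """
    failures = []
    with open(log_path, "w") as log:
        for path in sorted(paths):
            try:
                check(path)
            except Exception as exc:
                failures.append(path)
                log.write(f"FAIL\t{path}\t{exc}\n")
            else:
                log.write(f"OK\t{path}\n")
            # Flush after each file so the log is useful even if the run is killed.
            log.flush()
    return failures
```

The check callable could, for example, open the file with zipfile.ZipFile, or call sourmash sig summarize on the path through subprocess.run with check=True; either choice is an assumption of this sketch.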