Open ctb opened 4 months ago
So, this seems a little weird - in utils.rs,
pub fn collection_from_zipfile(sigpath: &Path, report_type: &ReportType) -> Result<Collection> {
    match Collection::from_zipfile(sigpath) {
        Ok(collection) => Ok(collection),
        Err(_) => bail!("failed to load {} zipfile: '{}'", report_type, sigpath),
    }
}
and then in sourmash collection.rs,
pub fn from_zipfile<P: AsRef<Path>>(zipfile: P) -> Result<Self> {
    let storage = ZipStorage::from_file(zipfile)?;
    // Load manifest from standard location in zipstorage
    let manifest = Manifest::from_reader(storage.load("SOURMASH-MANIFEST.csv")?.as_slice())?;
    Ok(Self {
        manifest,
        storage: InnerStorage::new(storage),
    })
}
and it would seem to me that storage.load should fail if SOURMASH-MANIFEST.csv doesn't exist in the zip file, and the error should be propagated.
Figured it out - load_collection is trapping and ignoring the error from collection_from_zipfile.
My guess is that if we encounter an unloadable zipfile we should error-exit - there's no other collection type that should end in .zip - but will think on't.
Agreed.
this comment is perhaps better here, since I suggested the same solution: https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/280#issuecomment-2010076172
So the reason errors are not automatically propagated up is the way I changed sequential loading to allow manifests (in load_collection). Previously, if the extension was 'zip', we tried loading as a zipfile. If not, we tried as sig first and then fell back to pathlists. Since manifests and pathlists are so similar, I was having trouble integrating manifests into the same strategy.
Now, we try each loading function in order (zip > manifest > signature > pathlist). If the file can be loaded, we load it, even if the resulting collection is empty. If we encounter an error, we track it and report the final error. I thought this was working well for reporting the right errors, but apparently not when the zip fails?
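To make the failure mode concrete, here is a minimal sketch of a "try each loader, report the final error" chain. It is not the plugin's actual code: `Loader`, `load_first`, and both toy loaders are hypothetical names, and plain strings stand in for collections and errors. It only illustrates how keeping the last error can hide a zip-specific failure.

```rust
// Hypothetical stand-ins: a "collection" is just a String here, and so is an error.
type Loader = fn(&str) -> Result<String, String>;

/// Try each loader in order; return the first success, otherwise the LAST error seen.
fn load_first(path: &str, loaders: &[(&str, Loader)]) -> Result<String, String> {
    let mut last_err = String::from("no loaders were tried");
    for (name, loader) in loaders {
        match loader(path) {
            Ok(collection) => return Ok(collection),
            // Each failure overwrites the previous one, so earlier causes are lost.
            Err(e) => last_err = format!("{name}: {e}"),
        }
    }
    Err(last_err)
}

// Toy loader that always fails the way a manifest-less zip would (assumed message).
fn zip_loader(_path: &str) -> Result<String, String> {
    Err("zipfile has no manifest".to_string())
}

// Toy loader that always fails as "not a pathlist" (assumed message).
fn pathlist_loader(_path: &str) -> Result<String, String> {
    Err("not a valid pathlist".to_string())
}
```

Calling `load_first` with the zip loader first and the pathlist loader second returns only the pathlist loader's error, so the missing-manifest message never reaches the user.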
I'm definitely open to better sequential loading / error propagation strategies.
A simple zipfile fix would be to only use the zip loading for '.zip' and report errors directly. For the rest, I'm not sure how to better manage loading functions without enforcing file extensions or otherwise specifying which type of input we have (and therefore which loading methods we should try).
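A rough sketch of that zip-only fix, under stated assumptions: `load_zip`, `load_by_fallback`, and `load_collection_sketch` are hypothetical names with placeholder bodies, strings stand in for collections and errors, and matching the extension case-insensitively is my own choice rather than anything decided in this thread.

```rust
use std::path::Path;

// Hypothetical zip loader; here it always fails as if the manifest were missing.
fn load_zip(path: &Path) -> Result<String, String> {
    Err(format!(
        "failed to load zipfile '{}': no manifest found",
        path.display()
    ))
}

// Hypothetical stand-in for the manifest > signature > pathlist fallback chain.
fn load_by_fallback(path: &Path) -> Result<String, String> {
    Ok(format!("collection loaded from '{}'", path.display()))
}

fn load_collection_sketch(path: &Path) -> Result<String, String> {
    let is_zip = path
        .extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("zip"));
    if is_zip {
        // Nothing else should end in .zip, so a failure here is reported directly
        // instead of falling through to the other loaders.
        return load_zip(path);
    }
    load_by_fallback(path)
}
```

With this shape a bad `.zip` surfaces the zip loader's own error, while every other path still goes through the fallback chain.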
Right, we run into the same problem in sourmash of course - load_save.py. There I think we catch only ValueError as "this is likely NOT the right database, so try next" and/or detect None being returned as the same thing, while other exceptions are raised and reported.
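One way to carry the same distinction over to Rust is an error enum with a "wrong format, try next" variant. This is only a sketch of the idea, not sourmash's or the plugin's API: `LoadError`, `TryLoader`, `load_any`, `try_zip`, and `try_pathlist` are all hypothetical, and the toy loaders decide by file name purely for illustration.

```rust
#[derive(Debug, PartialEq)]
enum LoadError {
    /// "Probably not this format, try the next loader" (the role ValueError plays above).
    WrongFormat,
    /// A real failure that should stop loading and be reported as-is.
    Fatal(String),
}

type TryLoader = fn(&str) -> Result<String, LoadError>;

/// Skip loaders that say "wrong format"; stop at the first success or fatal error.
fn load_any(path: &str, loaders: &[TryLoader]) -> Result<String, LoadError> {
    for loader in loaders {
        match loader(path) {
            Ok(collection) => return Ok(collection),
            Err(LoadError::WrongFormat) => continue,
            Err(fatal) => return Err(fatal),
        }
    }
    Err(LoadError::Fatal(format!("no loader recognized '{path}'")))
}

// Toy zip loader: declines non-zip names, fails fatally on zip names.
fn try_zip(path: &str) -> Result<String, LoadError> {
    if !path.ends_with(".zip") {
        return Err(LoadError::WrongFormat);
    }
    Err(LoadError::Fatal("zip file has no manifest".to_string()))
}

// Toy pathlist loader: accepts anything it is offered.
fn try_pathlist(path: &str) -> Result<String, LoadError> {
    Ok(format!("pathlist collection from '{path}'"))
}
```

Here a broken zip stops the chain with its own message, and only a `WrongFormat` result lets the next loader have a go.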
When we run:
it works fine; but with
we get
It's fine that it doesn't work but it'd be nice to provide a better error message, like "this is a zip file without a manifest, you suxor."
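For the message itself, one option is to wrap the underlying cause rather than discard it with `Err(_)`. A minimal std-only sketch, assuming hypothetical names throughout (`ZipLoadError`, `open_manifest`, `load_zip_with_context`) and a boolean flag standing in for whether the archive contains a manifest:

```rust
use std::fmt;

// Hypothetical error type that keeps both the path and the underlying cause.
#[derive(Debug)]
struct ZipLoadError {
    path: String,
    cause: String,
}

impl fmt::Display for ZipLoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to load zipfile '{}': {}", self.path, self.cause)
    }
}

impl std::error::Error for ZipLoadError {}

// Stand-in for reading the manifest out of the archive (assumed error text).
fn open_manifest(has_manifest: bool) -> Result<String, String> {
    if has_manifest {
        Ok("manifest".to_string())
    } else {
        Err("SOURMASH-MANIFEST.csv not found in archive".to_string())
    }
}

// Attach the path to the cause instead of replacing the cause.
fn load_zip_with_context(path: &str, has_manifest: bool) -> Result<String, ZipLoadError> {
    open_manifest(has_manifest).map_err(|cause| ZipLoadError {
        path: path.to_string(),
        cause,
    })
}
```

The reported error then names both the file and the reason, which is most of what a "zip file without a manifest" message needs.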
Related issues: