sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

loading signatures from pathlist fails confusingly if pathlist contains bad paths #1845

Open ctb opened 2 years ago

ctb commented 2 years ago

in the example below, if pathlist.txt contains a path that cannot be loaded, sourmash reports that pathlist.txt itself cannot be loaded instead of complaining about the specific file.

% sourmash sig fileinfo pathlist.txt

== This is sourmash version 4.2.5.dev19+g3a6028fb.d20220217. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

** loading from 'pathlist.txt'
Cannot open 'pathlist.txt'.

ctb commented 2 years ago

this can probably be solved by reporting the actual exception text.

ctb commented 2 years ago

here is what happens when you run sourmash sig describe -d:

_load_databases: trying loader fn 3 'load from file list'
_load_databases: FAIL on fn 3 load from file list.
Traceback (most recent call last):
  File "/Users/t/dev/sourmash/src/sourmash/sourmash_args.py", line 450, in _load_database
    cache_size=cache_size)
  File "/Users/t/dev/sourmash/src/sourmash/sourmash_args.py", line 375, in _multiindex_load_from_pathlist
    db = MultiIndex.load_from_pathlist(filename)
  File "/Users/t/dev/sourmash/src/sourmash/index/__init__.py", line 1033, in load_from_pathlist
    file_list = load_pathlist_from_file(filename)
  File "/Users/t/dev/sourmash/src/sourmash/sourmash_args.py", line 560, in load_pathlist_from_file
    raise ValueError(f"file '{checkfile}' inside the pathlist does not exist") 
ValueError: file 'zzz.sig' inside the pathlist does not exist

so the problem is that the error raised from within the pathlist loading code is the same kind of error that means "this is not a valid database to load, move on and try another", so the calling code cannot tell the two apart.

so you get this top-level output:

%  sourmash sig describe pathlist.txt 

== This is sourmash version 4.3.1.dev6+g03f2faec.d20220325. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

ERROR: Error while reading signatures from 'pathlist.txt'.

which is not very nice.
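
one possible way to separate the two cases is sketched below. This is a minimal illustration, not sourmash's actual code: the names NotThisFormat, PathlistEntryError, load_pathlist, and load_database are all hypothetical, and the pathlist loader is simplified to assume the file really is a text file with one path per line. The idea is that only one exception type means "try the next loader", while a bad entry inside a pathlist raises a different type that carries the specific filename up to the user.

```python
import os


class NotThisFormat(ValueError):
    """Hypothetical: this loader does not recognize the file; try the next one."""


class PathlistEntryError(Exception):
    """Hypothetical: the file is a pathlist, but one of its entries is bad."""


def load_pathlist(filename):
    # Simplified assumption: the file is plain text with one path per line.
    with open(filename) as fp:
        paths = [line.strip() for line in fp if line.strip()]
    if not paths:
        raise NotThisFormat(f"'{filename}' contains no paths")
    for path in paths:
        if not os.path.exists(path):
            # Deliberately not a ValueError, so the dispatcher below cannot
            # mistake it for "wrong format, move on".
            raise PathlistEntryError(
                f"file '{path}' inside the pathlist '{filename}' does not exist"
            )
    return paths


def load_database(filename, loaders):
    # Try each loader in turn; only NotThisFormat means "try the next loader".
    # Any other exception (such as PathlistEntryError) propagates to the caller
    # with its original message intact.
    for loader in loaders:
        try:
            return loader(filename)
        except NotThisFormat:
            continue
    raise ValueError(f"Cannot open '{filename}'.")
```

with something like this, a missing zzz.sig would surface as an error naming zzz.sig, and the generic "Cannot open" message would only appear when every loader rejected the file.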

ctb commented 2 years ago

per https://github.com/sourmash-bio/sourmash/issues/1414#issuecomment-1203801586 maybe we should just remove pathlists altogether and standardize on --query-from-file and --from-file?