sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

sourmash sketch dna error when -o <filepath> does not exist #2688

Open. aboffin opened this issue 1 year ago.

aboffin commented 1 year ago

Hi,

I was wondering about the order in which sourmash sketch dna processes its arguments. When I run a command like this:

    sourmash sketch dna --merge ../dat/query/*fastq -o ../non-existing/path/output.sig

this error:

    sys.exit(main())
  File "/usr/aws/lib64/python3.8/site-packages/sourmash/__main__.py", line 13, in main
    return mainmethod(args)
  File "/usr/aws/lib64/python3.8/site-packages/sourmash/cli/sketch/dna.py", line 90, in main
    return sourmash.command_sketch.dna(args)
  File "/usr/aws/lib64/python3.8/site-packages/sourmash/command_sketch.py", line 239, in dna
    _execute_sketch(args, signatures_factory)
  File "/usr/aws/lib64/python3.8/site-packages/sourmash/command_sketch.py", line 218, in _execute_sketch
    _compute_merged(args, signatures_factory)
  File "/usr/aws/lib64/python3.8/site-packages/sourmash/command_compute.py", line 291, in _compute_merged
    save_siglist(sigs, args.output)
  File "/usr/aws/lib64/python3.8/site-packages/sourmash/command_compute.py", line 320, in save_siglist
    notify(f"saved {len(save_sig)} signature(s) to '{save_sig.location}'")
  File "/usr/aws/lib64/python3.8/site-packages/sourmash/sourmash_args.py", line 847, in __exit__
    self.close()
  File "/usr/aws/lib64/python3.8/site-packages/sourmash/sourmash_args.py", line 965, in close
    with open(self.location, mode, encoding=encoding) as fp:
FileNotFoundError: [Errno 2] No such file or directory: '../non-existing/path/output.sig'

is raised only after sourmash has spent quite a bit of time reading all the FASTQ files in ../dat/query.

If possible, it would be useful to raise the FileNotFoundError up front, before reading huge FASTQ files, rather than discovering only at the end that the output path does not exist.
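As an illustration of the fail-fast check suggested here, the following minimal sketch inspects only the parent directory of the -o path, so it could run before any FASTQ file is read without creating the output file. The function name output_path_problem is hypothetical (it is not part of sourmash), and treating "-" as standard output is an assumed CLI convention.

```python
import os
from typing import Optional


def output_path_problem(path: str) -> Optional[str]:
    """Return why `path` can't be written later, or None if it looks OK.

    Hypothetical helper (not part of sourmash): it inspects only the parent
    directory, so it never creates or truncates the output file itself.
    """
    if path == "-":  # assumed convention: "-" means write to stdout
        return None
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        return f"output directory does not exist: '{parent}'"
    if not os.access(parent, os.W_OK):
        return f"output directory is not writable: '{parent}'"
    return None
```

For the command above, calling output_path_problem on ../non-existing/path/output.sig would return the "does not exist" message immediately, so the CLI could exit with that error before sketching anything. The check is advisory only: the directory could still disappear or change permissions before the real write happens.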

Thanks!

ctb commented 1 year ago

thanks @aboffin!

right you are - when computing a merged sketch, _compute_merged at https://github.com/sourmash-bio/sourmash/blob/latest/src/sourmash/command_compute.py#L291 only tries to open the output file at the very end, after all of the input files have been read and sketched.

thanks for reporting this! I'm not sure how to fix it cleanly because we don't want to leave an empty output file behind if no sketch is produced or an error occurs, but we'll figure something out!
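One common pattern that reconciles both goals (fail early, but never leave an empty output file behind) is to create a temporary file in the destination directory before the work starts and rename it into place only after the signatures have been written. The sketch below is not sourmash's actual fix; atomic_output is a hypothetical helper built on tempfile.mkstemp and os.replace from the Python standard library.

```python
import contextlib
import os
import tempfile
from typing import Iterator, TextIO


@contextlib.contextmanager
def atomic_output(path: str) -> Iterator[TextIO]:
    """Yield a temp file in path's directory; move it to `path` only on success.

    Hypothetical helper (not part of sourmash). Creating the temp file fails
    immediately with FileNotFoundError if the directory does not exist, but
    no file named `path` appears unless the `with` body completes cleanly.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Raises FileNotFoundError right away if `directory` is missing.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".sig")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fp:
            yield fp
        # Atomic on POSIX when source and destination share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file; `path` itself is never created.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_path)
        raise
```

Because mkstemp fails as soon as the directory is missing, the error would surface before any FASTQ file is read if the helper is entered at the start of the command, while output.sig itself only appears once writing succeeds. One trade-off: mkstemp creates the file with owner-only permissions (0600), so a real implementation might want to adjust the mode before the rename.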