merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Snakemake SRA_download workflow writes files in lieu of written fastq #2104

Closed mschecht closed 1 year ago

mschecht commented 1 year ago

@FlorianTrigodet realized that the sra_download workflow finishes even if a fastq file has not finished downloading. I looked at some log files and found that even when the rule fasterq_dump logs a "Disk quota exceeded" error, fasterq-dump does not exit with a non-zero status; the error only gets written to the log. This allows the workflow to continue with an incompletely written fastq. I believe fasterq-dump does not return a non-zero exit status so that the user does not have to restart the download from scratch. That makes sense, but we need to make sure Snakemake knows that the rule failed.

$ cat 00_LOGS/SRR13712839_fasterq_dump.log
Preference setting is: Prefer SRA Normalized Format files with full base quality scores if available.
01_NCBI_SRA/SRR13712839 is an SRA Normalized Format file with full base quality scores.
join   :|-------------------------------------------------- 100%
concat :|-------------------------------------- 75.58%2023-07-07T17:02:21 fasterq-dump.2.11.2 err: unknown while writing file within file system module - unknown system error errno='Disk quota exceeded(122)'
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: unknown while writing file within file system module - unknown system error errno='Disk quota exceeded(122)'
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: unknown while writing file within file system module - unknown system error errno='Disk quota exceeded(122)'
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: copy_machine.c push2q().KQueuePush() -> RC(rcCont,rcQueue,rcInserting,rcQueue,rcReadonly)
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: you have exhausted your space
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: unknown while writing file within file system module - unknown system error errno='Disk quota exceeded(122)'
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: copy_machine.c push2q().KQueuePush() -> RC(rcCont,rcQueue,rcInserting,rcQueue,rcReadonly)
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: you have exhausted your space
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: unknown while writing file within file system module - unknown system error errno='Disk quota exceeded(122)'
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: concatenator.c execute_concat_un_compressed() KFileRelease( '02_FASTA/SRR13712839_1.fastq' ).2 -> RC(rcFS,rcFile,rcWriting,rcNoObj,rcUnknown)
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: unknown while writing file within file system module - unknown system error errno='Disk quota exceeded(122)'
2023-07-07T17:02:21 fasterq-dump.2.11.2 err: concatenator.c execute_concat_un_compressed() KFileRelease( '02_FASTA/SRR13712839_2.fastq' ).2 -> RC(rcFS,rcFile,rcWriting,rcNoObj,rcUnknown)
75.60%
spots read      : 54,325,706
reads read      : 108,651,412
reads written   : 108,651,412
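The error lines in the log above share one shape: a timestamp, `fasterq-dump.<version> err:`, then a message. Below is a minimal sketch of a check that matches that shape instead of one specific message such as "Disk quota exceeded". The regular expression is inferred only from this log (fasterq-dump 2.11.2) and may not cover other versions. The function names are hypothetical, not part of anvi'o. If something like this ran after fasterq-dump inside the rule (for example, in a `run:` block), the raised exception should make Snakemake mark the job as failed.

```python
import re
from pathlib import Path

# Assumed shape of a fasterq-dump error line, inferred only from the log above:
# "<ISO timestamp> fasterq-dump.<version> err: <message>".
# Not anchored to the line start, because a progress bar can be printed on the
# same line as the first error (see the "concat" line in the log).
ERROR_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) fasterq-dump\.[\d.]+ err: (.*)"
)


def find_fasterq_dump_errors(log_text: str) -> list[str]:
    """Return the message part of every fasterq-dump error line in log_text."""
    # '.' does not match newlines, so each message stops at the end of its line.
    return [match.group(2) for match in ERROR_LINE.finditer(log_text)]


def check_fasterq_dump_log(log_path: str) -> None:
    """Raise if the log reports any error, even when the exit status was 0."""
    errors = find_fasterq_dump_errors(Path(log_path).read_text())
    if errors:
        # Report only the first message; later ones are often repeats.
        raise RuntimeError(
            f"fasterq-dump reported {len(errors)} error(s) in {log_path}; "
            f"first: {errors[0]}"
        )
```

Matching on the `err:` marker rather than on the quota message means other write failures would also stop the workflow, at the cost of depending on fasterq-dump keeping this log format.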

In this PR, I added an exception that is raised when the log contains a disk quota error. However, I don't think this approach is very robust, and I am looking for input.
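Matching log text depends on fasterq-dump's wording, which can change between versions. A complementary check is to inspect the outputs themselves. In the log above, the summary still reports `reads written   : 108,651,412` even though concatenation stopped at 75.60%, so comparing that number with the records actually present in the FASTQ files should expose the truncation. This is a sketch under stated assumptions: uncompressed FASTQ with four lines per record, all output files of one accession (such as `_1.fastq`, `_2.fastq`, and any unpaired file) passed in together, and `reads written` being the total number of records expected across those files. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the summary line looks like "reads written   : 108,651,412".
READS_WRITTEN = re.compile(r"^reads written[ \t]*:[ \t]*([\d,]+)[ \t]*$", re.MULTILINE)


def reads_written_from_log(log_text: str) -> int:
    """Parse the 'reads written' count from a fasterq-dump log."""
    match = READS_WRITTEN.search(log_text)
    if match is None:
        raise ValueError("no 'reads written' line found in the log")
    return int(match.group(1).replace(",", ""))


def count_fastq_records(fastq_path: str) -> int:
    """Count records in an uncompressed FASTQ, assuming 4 lines per record."""
    with open(fastq_path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A partially written final record is a sign of truncation.
        raise RuntimeError(f"{fastq_path} has {n_lines} lines, not a multiple of 4")
    return n_lines // 4


def check_fastq_outputs(log_path: str, fastq_paths: list[str]) -> None:
    """Raise if the FASTQ files do not hold as many records as the log reports."""
    expected = reads_written_from_log(Path(log_path).read_text())
    observed = sum(count_fastq_records(path) for path in fastq_paths)
    if observed != expected:
        raise RuntimeError(
            f"expected {expected:,} FASTQ records but found {observed:,} "
            f"in {', '.join(fastq_paths)}"
        )
```

Counting lines reads every output file once, which adds I/O for large runs; the trade-off is that the check no longer depends on which error message fasterq-dump happens to print.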

meren commented 1 year ago

this is a good check, @mschecht :)