kfuku52 closed this issue 3 years ago.
So, `safely_removed` means that the sample has a quant output, but the original fastq file has been deleted?
Yes. The "original" part doesn't apply to private fastq files, though.
The desired behaviour would then be that, if the `safely_removed` flag is set, `sanity` shouldn't treat that sample as anomalous when it can't find `getfastq` output, correct?
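The rule proposed here can be sketched as a small classifier. This is a hypothetical helper, not amalgkit's actual API: it assumes the final getfastq files are named `ID.amalgkit.fastq.gz` (single-end) or `ID_1`/`ID_2.amalgkit.fastq.gz` (paired-end), as in the logs in this thread, and it ignores other intermediate extensions. A sample only counts as missing when neither the fastq files nor their `.safely_removed` markers are present.

```python
import os
import tempfile


def classify_getfastq_sample(sra_dir, sra_id, layout):
    """Classify a sample's getfastq folder (hypothetical helper, not amalgkit's API).

    Assumes final getfastq files are named like the ones in this thread:
    ID.amalgkit.fastq.gz (single) or ID_1/ID_2.amalgkit.fastq.gz (paired).
    """
    if layout == 'single':
        names = [f'{sra_id}.amalgkit.fastq.gz']
    else:
        names = [f'{sra_id}_1.amalgkit.fastq.gz', f'{sra_id}_2.amalgkit.fastq.gz']
    paths = [os.path.join(sra_dir, name) for name in names]
    if all(os.path.exists(p) for p in paths):
        return 'fastq_present'
    # quant deleted the fastq files and left a marker for each: not an anomaly.
    if all(os.path.exists(p + '.safely_removed') for p in paths):
        return 'safely_removed'
    # Neither fastq files nor markers: this is what sanity should report.
    return 'missing'


# Usage: a folder containing only the marker file counts as completed.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'SRR3581852.amalgkit.fastq.gz.safely_removed'), 'w').close()
    print(classify_getfastq_sample(d, 'SRR3581852', 'single'))  # safely_removed
```

Requiring a marker for every expected file means a paired-end sample with only one `.safely_removed` marker is still flagged.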
So this is desired:
Looking for SRR3581852
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./amalgkit_out/getfastq/SRR3581852
./amalgkit_out/getfastq/SRR3581852/SRR3581852.amalgkit.fastq.gz.safely_removed
But this isn't:
getfastq output could not be found in: ./amalgkit_out/getfastq/SRR3581852, layout = single
Exactly!
Okay, I'm on it!
This is the new output:
amalgkit sanity: start
reading metadata from: /Users/s229181/Desktop/metadata/metadata/safely_remove_test.tsv
Checking essential entries from metadata file.
1 species detected:
['Populus trichocarpa']
4 SRA runs detected:
['ERR4131602' 'ERR4131603' 'ERR4131604' 'ERR4131605']
checking for getfastq outputs:
amalgkit getfastq output folder detected. Checking presence of output files.
Looking for ERR4131605
found: ['./getfastq/ERR4131605/ERR4131605.amalgkit.fastq.gz.safely_removed']
fastq files for ERR4131605 were safely removed by amalgkit quant.
checking for updated metadata in: ./metadata/updated_metadata/metadata_ERR4131605.tsv
found updated metadata!
Looking for ERR4131604
found: ['./getfastq/ERR4131604/ERR4131604.amalgkit.fastq.gz.safely_removed']
fastq files for ERR4131604 were safely removed by amalgkit quant.
checking for updated metadata in: ./metadata/updated_metadata/metadata_ERR4131604.tsv
found updated metadata!
Looking for ERR4131603
found: ['./getfastq/ERR4131603/ERR4131603.amalgkit.fastq.gz.safely_removed']
fastq files for ERR4131603 were safely removed by amalgkit quant.
checking for updated metadata in: ./metadata/updated_metadata/metadata_ERR4131603.tsv
found updated metadata!
Looking for ERR4131602
found: ['./getfastq/ERR4131602/ERR4131602.amalgkit.fastq.gz.safely_removed']
fastq files for ERR4131602 were safely removed by amalgkit quant.
checking for updated metadata in: ./metadata/updated_metadata/metadata_ERR4131602.tsv
found updated metadata!
Sequences found for all SRA IDs in /Users/s229181/Desktop/metadata/metadata/safely_remove_test.tsv !
Looking for Index file ./Index/Populus_trichocarpa* for species: Populus trichocarpa
Found ['./Index/Populus_trichocarpa.idx'] !
Index found for all species in /Users/s229181/Desktop/metadata/metadata/safely_remove_test.tsv !
checking for quant outputs:
amalgkit quant output folder detected. Checking presence of output files.
Looking for ERR4131605
Found output folder ./quant/ERR4131605 for ERR4131605
Checking for output files.
./quant/ERR4131605/ERR4131605_abundance.h5 is missing! Please check if quant ran correctly
Looking for ERR4131604
Found output folder ./quant/ERR4131604 for ERR4131604
Checking for output files.
./quant/ERR4131604/ERR4131604_abundance.h5 is missing! Please check if quant ran correctly
Looking for ERR4131603
Found output folder ./quant/ERR4131603 for ERR4131603
Checking for output files.
./quant/ERR4131603/ERR4131603_abundance.h5 is missing! Please check if quant ran correctly
Looking for ERR4131602
Found output folder ./quant/ERR4131602 for ERR4131602
Checking for output files.
./quant/ERR4131602/ERR4131602_abundance.h5 is missing! Please check if quant ran correctly
writing SRA IDs without quant output to: ./sanity/SRA_IDs_without_quant.txt
Time elapsed: 0 sec
amalgkit sanity: end
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/ERR4131605
./getfastq/ERR4131605/ERR4131605.amalgkit.fastq.gz.safely_removed
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/ERR4131604
./getfastq/ERR4131604/ERR4131604.amalgkit.fastq.gz.safely_removed
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/ERR4131603
./getfastq/ERR4131603/ERR4131603.amalgkit.fastq.gz.safely_removed
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/ERR4131602
./getfastq/ERR4131602/ERR4131602.amalgkit.fastq.gz.safely_removed
Process finished with exit code 0
Because `get_newest_intermediate_file_extension` writes the `safely_removed` message to stderr, those messages show up at the very end of the output.
Do we need to write messages like this one to stderr?
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/ERR4131605
./getfastq/ERR4131605/ERR4131605.amalgkit.fastq.gz.safely_removed
It's not really an error.
Also, unrelated: newer versions of `kallisto` don't produce `.h5` output files anymore. Should I make `sanity` `kallisto`-version specific, or just stop looking for `.h5` files altogether?
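One version-independent option is to require only the plain-text outputs and treat the `.h5` file as optional. The sketch below is hypothetical: `kallisto quant` writes `abundance.tsv` and `run_info.json` regardless of HDF5 support, but the `ID_` prefix on those names is an assumption that mirrors the `ID_abundance.h5` naming in the sanity log above.

```python
import os
import tempfile


def missing_quant_outputs(quant_dir, sra_id):
    """List required quant outputs that are absent (hypothetical sanity helper).

    abundance.tsv and run_info.json are written by kallisto quant in every
    build; abundance.h5 only exists when kallisto has HDF5 support, so it is
    deliberately not required. The ID_ prefix is an assumed naming scheme.
    """
    required = [f'{sra_id}_abundance.tsv', f'{sra_id}_run_info.json']
    return [name for name in required
            if not os.path.exists(os.path.join(quant_dir, name))]


# Usage: only the TSV is present, so the run_info file is reported.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'ERR4131602_abundance.tsv'), 'w').close()
    print(missing_quant_outputs(d, 'ERR4131602'))  # ['ERR4131602_run_info.json']
```

Returning the list of missing names, rather than a boolean, lets the caller print exactly which file is absent.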
We don't need that stderr output. You should hack `get_newest_intermediate_file_extension()` to be compatible with `sanity`, or create a new function.
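A new function along these lines could return the extension together with a `safely_removed` flag and leave all logging to the caller, so an informational message never lands on stderr. Everything here is hypothetical: `find_newest_extension` and `CANDIDATE_EXTS` are illustrative names, and the extension list is a placeholder, not the one defined in amalgkit's `util.py`.

```python
import os
import tempfile

# Hypothetical ordering of getfastq intermediates, newest stage first;
# the real list is defined in amalgkit's util.py and may differ.
CANDIDATE_EXTS = ['.amalgkit.fastq.gz', '.rename.fastq.gz', '.fastp.fastq.gz', '.fastq.gz']


def find_newest_extension(sra_dir, sra_id, exts=CANDIDATE_EXTS):
    """Return (extension, safely_removed) and leave all logging to the caller."""
    files = set(os.listdir(sra_dir)) if os.path.isdir(sra_dir) else set()
    for ext in exts:
        # Single-end files are ID + ext; paired-end files are ID_1 + ext.
        stems = [f'{sra_id}{ext}', f'{sra_id}_1{ext}']
        if any(s in files for s in stems):
            return ext, False
        if any(s + '.safely_removed' in files for s in stems):
            return ext, True
    # Nothing matched: return a value instead of raising or printing.
    return None, False


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'SRR14322310_1.amalgkit.fastq.gz.safely_removed'), 'w').close()
    ext, removed = find_newest_extension(d, 'SRR14322310')
    if removed:
        # An informational message, so it goes to stdout rather than stderr.
        print(f'getfastq safely_removed flag was detected for SRR14322310 ({ext})')
```

Because the function never prints, `sanity` and `getfastq` can each phrase (or suppress) the message as they need.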
Which version of kallisto are you using? The `.h5` file is produced by 0.46.2, and I thought that was the latest.
OK, it seems like you didn't have HDF5. Did you compile kallisto manually? https://github.com/pachterlab/kallisto/releases/tag/v0.46.2
Yeah, I was wondering about this as well. I thought this was a conda installation, but I just double-checked and it's a manual installation. I probably missed the HDF5 option.
But in any case, it sounds like HDF5 will be phased out eventually.
Alright. This should be fixed in https://github.com/kfuku52/amalgkit/commit/69c46e3020f8ff73d74b61c6fdc37e854e1e7ea2
Tested with both single- and paired-end libraries:
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/ERR4131602
./getfastq/ERR4131602/ERR4131602.amalgkit.fastq.gz.safely_removed
checking for updated metadata in: ./metadata/updated_metadata/metadata_ERR4131602.tsv
found updated metadata!
Looking for SRR14322310
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/SRR14322310
./getfastq/SRR14322310/SRR14322310_2.amalgkit.fastq.gz.safely_removed
./getfastq/SRR14322310/SRR14322310_1.amalgkit.fastq.gz.safely_removed
checking for updated metadata in: ./metadata/updated_metadata/metadata_SRR14322310.tsv
found updated metadata!
Sequences found for all SRA IDs in /Users/s229181/Desktop/metadata/metadata/safely_remove_test.tsv !
Another error occurred with the latest version:
Looking for SRR3581852
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: /lustre7/home/lustre4/kfuku/my_project/nepenthes_gracilis/20211013_RNAseq/amalgkit_out/getfastq/SRR3581852
/lustre7/home/lustre4/kfuku/my_project/nepenthes_gracilis/20211013_RNAseq/amalgkit_out/getfastq/SRR3581852/SRR3581852.amalgkit.fastq.gz.safely_removed
getfastq output could not be found in: /lustre7/home/lustre4/kfuku/my_project/nepenthes_gracilis/20211013_RNAseq/amalgkit_out/getfastq/SRR3581852, layout = single
Traceback (most recent call last):
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/bin/amalgkit", line 378, in <module>
    args.handler(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/bin/amalgkit", line 81, in command_sanity
    sanity_main(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/sanity.py", line 234, in sanity_main
    check_getfastq_outputs(args, sra_ids, metadata, output_dir)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/sanity.py", line 58, in check_getfastq_outputs
    ext = get_newest_intermediate_file_extension(sra_stat, sra_path)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/util.py", line 67, in get_newest_intermediate_file_extension
    return ext_out
UnboundLocalError: local variable 'ext_out' referenced before assignment
`return ext_out` shouldn't be on line 67 anymore.
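For context, this class of `UnboundLocalError` appears whenever a variable is assigned only inside a branch that never runs. The sketch below is a generic illustration, not amalgkit's code: the function names and extension list are hypothetical. A folder holding only a `.safely_removed` marker matches none of the extensions, so the buggy version reaches `return ext_out` with the name unbound.

```python
EXTS = ['.amalgkit.fastq.gz', '.fastq.gz']  # illustrative values only


def newest_ext_buggy(files):
    for ext in EXTS:
        if any(f.endswith(ext) for f in files):
            ext_out = ext
            break
    return ext_out  # raises UnboundLocalError when no file matched


def newest_ext_fixed(files):
    ext_out = None  # bind a default so every code path can return it
    for ext in EXTS:
        if any(f.endswith(ext) for f in files):
            ext_out = ext
            break
    return ext_out


marker_only = ['X.amalgkit.fastq.gz.safely_removed']
try:
    newest_ext_buggy(marker_only)
except UnboundLocalError as exc:
    print(type(exc).__name__)  # UnboundLocalError
print(newest_ext_fixed(marker_only))  # None
```

Binding a default (or raising a descriptive exception) before the loop makes the "nothing found" case explicit for the caller.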
Ah, I forgot to bump the version number. Maybe it didn't update properly on your end because it didn't find a new version.
I committed the new `__init__.py`.
`sanity` worked, thank you!
@Hego-CCTB Could you fix it? Also, `sanity` shouldn't exit every time it detects an anomaly. Otherwise, you can only recognize and fix the problems one at a time.
https://github.com/kfuku52/amalgkit/blob/896f72526fe66add194aed30a28e04cc5781e512/amalgkit/util.py#L66-L67
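A minimal sketch of the collect-then-report pattern requested here, assuming each check can be wrapped as a zero-argument callable returning `(ok, message)`. The `run_all_checks` name and the check tuples are hypothetical; the example messages are made up for illustration.

```python
def run_all_checks(checks):
    """Run every (name, check) pair and collect anomalies instead of exiting early.

    Each check is a hypothetical zero-argument callable returning (ok, message).
    """
    anomalies = []
    for name, check in checks:
        try:
            ok, message = check()
        except Exception as exc:  # a crashing check is reported, not fatal
            ok, message = False, f'{type(exc).__name__}: {exc}'
        if not ok:
            anomalies.append(f'{name}: {message}')
    return anomalies


checks = [
    ('getfastq ERR4131602', lambda: (True, '')),
    ('quant ERR4131602', lambda: (False, 'ERR4131602_abundance.tsv is missing')),
    ('index example_species', lambda: (False, 'no index file found')),
]
anomalies = run_all_checks(checks)
for line in anomalies:
    print(line)
# Decide the exit status once, after everything has been checked.
exit_code = 1 if anomalies else 0
```

In `sanity`, the collected list could also be written to a file (like `SRA_IDs_without_quant.txt` in the log above) before passing `exit_code` to `sys.exit()` at the very end.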