kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

getfastq OSError: [Errno 5] Input/output error: #134

Closed: docxology closed this issue 1 year ago

docxology commented 1 year ago

$ amalgkit getfastq --id SRR11196121

AMALGKIT version: 0.9.18
AMALGKIT command: /home/tet/miniconda3/bin/amalgkit getfastq --id SRR11196121
AMALGKIT bug report: https://github.com/kfuku52/amalgkit/issues
amalgkit getfastq: start
pigz found. It will be used for compression/decompression in read name formatting.
--id is specified. Downloading SRA metadata from Entrez.
Entrez search term: SRR11196121
Number of SRA records: 1
processing SRA records: 0 - 1
2023-06-23 14:53:12: Converting 0th sample from XML to DataFrame
2023-06-23 14:53:12: Finished converting 1 samples
Filtering SRA entry with --layout: auto
Individual SRA size of SRR11196121: 5,215,532,212.0 bp
Number of SRAs to be processed: 1
Total target size (--max_bp): 999,999,999,999,999 bp
The sum of SRA sizes: 5,215,532,212.0 bp
Target size per SRA: 999,999,999,999,999 bp

Processing SRA ID: SRR11196121
spot_length cannot be obtained directly from metadata. Using total_bases/total_spots instead: 293
Traceback (most recent call last):
  File "/home/tet/miniconda3/bin/amalgkit", line 376, in <module>
    args.handler(args)
  File "/home/tet/miniconda3/bin/amalgkit", line 34, in command_getfastq
    getfastq_main(args)
  File "/home/tet/miniconda3/lib/python3.9/site-packages/amalgkit/getfastq.py", line 790, in getfastq_main
    ext = get_newest_intermediate_file_extension(sra_stat, sra_stat['getfastq_sra_dir'])
  File "/home/tet/miniconda3/lib/python3.9/site-packages/amalgkit/util.py", line 437, in get_newest_intermediate_file_extension
    files = os.listdir(work_dir)
OSError: [Errno 5] Input/output error: '/media/tet/56D80A6E7A225267/Transcriptome/getfastq/SRR11196121'
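The traceback ends in os.listdir(work_dir) inside amalgkit's get_newest_intermediate_file_extension. Errno 5 is EIO, which the operating system raises when a low-level read from the storage device fails, so it most likely points to the disk, filesystem, or mount rather than to amalgkit itself; the path under /media/ suggests an automounted secondary or external drive. A minimal way to confirm this outside amalgkit, assuming Python 3, is to list the same directory directly. check_listable below is a hypothetical helper, not part of amalgkit.

```python
import errno
import os
import tempfile


def check_listable(path):
    """Return (ok, message) after trying to list `path`.

    Hypothetical diagnostic helper; it is not part of amalgkit.
    """
    try:
        entries = os.listdir(path)
    except FileNotFoundError:
        return False, f"{path} does not exist"
    except OSError as exc:
        if exc.errno == errno.EIO:
            # EIO is reported by the OS/device layer, not by the calling program.
            return False, f"I/O error while reading {path}; check the disk and its mount"
        return False, f"{path}: {exc.strerror}"
    return True, f"{path} is readable ({len(entries)} entries)"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(check_listable(tmp))                           # healthy, empty directory
        print(check_listable(os.path.join(tmp, "missing")))  # nonexistent path
```

Calling check_listable on '/media/tet/56D80A6E7A225267/Transcriptome/getfastq/SRR11196121' and getting the EIO message would indicate that rerunning amalgkit against the same directory is likely to fail the same way, and that the drive itself is the more likely thing to investigate.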

kfuku52 commented 1 year ago
AMALGKIT version: 0.9.18
AMALGKIT command: /Users/kf/Dropbox/repos/amalgkit/amalgkit/amalgkit getfastq --id SRR11196121
AMALGKIT bug report: https://github.com/kfuku52/amalgkit/issues
amalgkit getfastq: start
pigz found. It will be used for compression/decompression in read name formatting.
--id is specified. Downloading SRA metadata from Entrez.
Entrez search term: SRR11196121
Number of SRA records: 1
processing SRA records: 0 - 1
2023-06-25 11:40:00: Converting 0th sample from XML to DataFrame
2023-06-25 11:40:00: Finished converting 1 samples
Filtering SRA entry with --layout: auto
Individual SRA size of SRR11196121: 5,215,532,212.0 bp
Number of SRAs to be processed: 1
Total target size (--max_bp): 999,999,999,999,999 bp
The sum of SRA sizes: 5,215,532,212.0 bp
Target size per SRA: 999,999,999,999,999 bp

Processing SRA ID: SRR11196121
spot_length cannot be obtained directly from metadata. Using total_bases/total_spots instead: 293
Library layout: paired
Number of reads: 17,772,385
Single/Paired read length: 293 bp
Total bases: 5,215,532,212 bp
Processing SRR11196121 as publicly available data from SRA.
Previously-downloaded sra file was not detected. New sra file will be downloaded.
Trying to fetch SRR11196121 from AWS: https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR11196121/SRR11196121
SRA file was downloaded with urllib.request from AWS
Total sampled bases: 5,207,308,805 bp
Command: parallel-fastq-dump -t 1 --minReadLen 25 --qual-filter-1 --skip-technical --split-3 --clip --gzip --outdir /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121 --tmpdir /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121 --minSpotId 1 --maxSpotId 17772385 -s /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121.sra
parallel-fastq-dump stdout:
Rejected 76708 READS because of Quality-Filtering
Read 17772385 spots for /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121.sra
Written 17770362 spots for /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121.sra
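The spot_length fallback in this run can be reproduced by hand: 5,215,532,212 total bases divided by 17,772,385 spots is about 293.46, reported as 293. Multiplying 293 back by the spot count gives exactly 5,207,308,805 bp, the "Total sampled bases" value, which suggests the sampled total is computed from the integer spot length. The sketch assumes truncation; whether amalgkit truncates or rounds is not shown in the log, and both give 293 here.

```python
# Values copied from the log above.
total_bases = 5_215_532_212
total_spots = 17_772_385

# Fallback when spot_length is missing from metadata: bases per spot.
# Assumption: the fractional part is truncated (rounding also gives 293 here).
spot_length = total_bases // total_spots
exact = total_bases / total_spots

# Bases implied by the integer spot length over all spots.
sampled_bases = spot_length * total_spots

print(spot_length)           # 293
print(f"{exact:.2f}")        # 293.46
print(f"{sampled_bases:,}")  # 5,207,308,805
```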

parallel-fastq-dump stderr:
2023-06-25 11:44:34,121 - SRR ids: ['/Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121.sra']
2023-06-25 11:44:34,121 - extra args: ['--minReadLen', '25', '--qual-filter-1', '--skip-technical', '--split-3', '--clip', '--gzip']
2023-06-25 11:44:34,121 - tempdir: /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/pfd_27jsypzt
2023-06-25 11:44:34,121 - CMD: sra-stat --meta --quick /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121.sra
2023-06-25 11:44:34,217 - /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121.sra spots: 17772385
2023-06-25 11:44:34,217 - blocks: [[1, 17772385]]
2023-06-25 11:44:34,217 - CMD: fastq-dump -N 1 -X 17772385 -O /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/pfd_27jsypzt/0 --minReadLen 25 --qual-filter-1 --skip-technical --split-3 --clip --gzip /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121.sra
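The stderr above shows parallel-fastq-dump reading the spot count with sra-stat, forming blocks ([[1, 17772385]] with -t 1), and passing each block to fastq-dump as -N (first spot) and -X (last spot). The sketch below illustrates that idea of splitting an inclusive spot range into one contiguous block per thread; it is not parallel-fastq-dump's own code, and split_spot_range is a hypothetical name. The real tool may distribute leftover spots differently.

```python
def split_spot_range(first, last, n_threads):
    """Split the inclusive spot range [first, last] into n_threads contiguous blocks.

    Illustrative only; parallel-fastq-dump's own splitting may differ in how
    it distributes the remainder.
    """
    total = last - first + 1
    base, extra = divmod(total, n_threads)
    blocks = []
    start = first
    for i in range(n_threads):
        # The first `extra` blocks get one additional spot.
        size = base + (1 if i < extra else 0)
        blocks.append([start, start + size - 1])
        start += size
    return blocks


print(split_spot_range(1, 17_772_385, 1))  # [[1, 17772385]], as in the log
for lo, hi in split_spot_range(1, 17_772_385, 4):
    # Each block would become one fastq-dump call: -N lo -X hi
    print(f"fastq-dump -N {lo} -X {hi} ...")
```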

Command: fastp --thread 1 --length_required 25 -j /dev/null -h /dev/null --in1 /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_1.fastq.gz --out1 /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_1.fastp.fastq.gz --in2 /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_2.fastq.gz --out2 /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_2.fastp.fastq.gz
fastp stdout:

fastp stderr:
Read1 before filtering:
total reads: 17697700
total bases: 2597423743
Q20 bases: 2538530036(97.7326%)
Q30 bases: 2430250810(93.5639%)

Read2 before filtering:
total reads: 17697700
total bases: 2596086435
Q20 bases: 2498195140(96.2293%)
Q30 bases: 2354954822(90.7117%)

Read1 after filtering:
total reads: 17442117
total bases: 2558654506
Q20 bases: 2503495780(97.8442%)
Q30 bases: 2397870837(93.7161%)

Read2 after filtering:
total reads: 17442117
total bases: 2556635364
Q20 bases: 2479140555(96.9689%)
Q30 bases: 2341592156(91.5888%)

Filtering result:
reads passed filter: 34884234
reads failed due to low quality: 507806
reads failed due to too many N: 3360
reads failed due to too short: 0
reads with adapter trimmed: 75960
bases trimmed due to adapters: 2502806

Duplication rate: 11.8111%

Insert size peak (evaluated by paired-end reads): 147

JSON report: /dev/null
HTML report: /dev/null

fastp --thread 1 --length_required 25 -j /dev/null -h /dev/null --in1 /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_1.fastq.gz --out1 /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_1.fastp.fastq.gz --in2 /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_2.fastq.gz --out2 /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_2.fastp.fastq.gz 
fastp v0.23.2, time used: 269 seconds
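The fastp filtering counts above are internally consistent: 17,697,700 read pairs entering fastp are 35,395,400 reads, the 34,884,234 reads that passed are exactly 2 × 17,442,117, and passed reads plus those failing for low quality, too many N, or length add back up to the input. Adapter trimming is reported separately and is not one of the failure categories. The check below only redoes this arithmetic with values copied from the report.

```python
# Values copied from the fastp report above.
reads_per_mate_before = 17_697_700
reads_per_mate_after = 17_442_117
passed = 34_884_234
failed = {
    "low quality": 507_806,
    "too many N": 3_360,
    "too short": 0,
}

total_in = 2 * reads_per_mate_before  # Read1 + Read2
assert passed == 2 * reads_per_mate_after
assert passed + sum(failed.values()) == total_in

print(f"input reads: {total_in:,}")  # 35,395,400
print(f"pass rate: {passed / total_in:.2%}")
```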

Deleting intermediate file: /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_1.fastq.gz
Deleting intermediate file: /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121_2.fastq.gz
Time elapsed for 1st-round sequence extraction: SRR11196121, 1,859.0 sec

--- getfastq 1st-round sequence generation report ---
Individual target size: 999,999,999,999,999 bp
Sum of fastq_dump dumped reads: 5,207,308,805 bp
Sum of fastq_dump rejected reads: 22,475,444 bp
Sum of fastq_dump written reads: 5,206,716,066 bp
Sum of fastp input reads: 5,193,510,178 bp
Sum of fastp output reads: 5,115,289,870 bp
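The fastp rows of this report can be traced back to the fastp stderr earlier in the log: 5,193,510,178 bp is the sum of the Read1 and Read2 total bases before filtering, and 5,115,289,870 bp is the same sum after filtering. A quick check using only numbers from this log:

```python
# Per-mate base counts from the fastp report above.
before = {"read1": 2_597_423_743, "read2": 2_596_086_435}
after = {"read1": 2_558_654_506, "read2": 2_556_635_364}

fastp_input = sum(before.values())
fastp_output = sum(after.values())

print(f"{fastp_input:,}")   # 5,193,510,178 (Sum of fastp input reads)
print(f"{fastp_output:,}")  # 5,115,289,870 (Sum of fastp output reads)
print(f"retained: {fastp_output / fastp_input:.2%}")
```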

0.00% of reads were obtained in the 1st-round sequence generation: 5,115,289,870 bp out of the individual target amount of 999,999,999,999,999 bp
Time elapsed for 1st-round sequence extraction: SRR11196121, 1,586.1 sec

Enough data were obtained in the 1st-round sequence extraction. Proceeding without the 2nd round.
2nd round read extraction improved % bp from 0.00% to 0.00%
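The "0.00%" figures here appear to follow from the target size rather than from missing data: no --max_bp was given in this command, and 5,115,289,870 bp out of the reported 999,999,999,999,999 bp is roughly 0.0005%, which rounds to 0.00% at two decimal places. The arithmetic:

```python
# Values from the 1st-round report above.
obtained_bp = 5_115_289_870
target_bp = 999_999_999_999_999  # --max_bp value reported in this log

fraction = obtained_bp / target_bp
print(f"{fraction:.2%}")  # 0.00%  (two decimals, as in the log)
print(f"{fraction:.6%}")  # the unrounded value, roughly 0.0005%
```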

Deleting: /Users/kf/Dropbox/repos/amalgkit/data/debug/getfastq/SRR11196121/SRR11196121.sra

--- getfastq final report ---
Target size (--max_bp): 999,999,999,999,999 bp
Sum of fastq_dump dumped reads: 5,207,308,805 bp
Sum of fastq_dump rejected reads: 22,475,444 bp
Sum of fastq_dump written reads: 5,206,716,066 bp
Sum of fastp input reads: 5,193,510,178 bp
Sum of fastp output reads: 5,115,289,870 bp
Individual SRA IDs: SRR11196121
Individual fastq_dump dumped reads (bp): 5,207,308,805
Individual fastq_dump rejected reads (bp): 22,475,444
Individual fastq_dump written reads (bp): 5,206,716,066
Individual fastp input reads (bp): 5,193,510,178
Individual fastp output reads (bp): 5,115,289,870

Time elapsed: 1,863 sec
amalgkit getfastq: end