Closed DvValk closed 1 year ago
Hi,
I've downloaded the .sra file of run SRR9123299 using prefetch. Based on the metadata, this file should be Illumina paired-end data. However, when I try to split the file using 'fasterq-dump', I only get one output file, named 'SRR9123299.fastq'. I've tried both --split-3 and --split-files. Could it be that the authors only uploaded one FASTQ file when they were supposed to upload two?
Downloaded the file using this command:
prefetch -f all SRR9123299 --output-directory my_dir/
Tried to split the file using this command:
fasterq-dump --split-3 my_dir/SRR9123299.sra -e 10
and this command:
fasterq-dump --split-files my_dir/SRR9123299.sra -e 10
Any help or explanation would be much appreciated!
Hi, have you found out why you only got one FASTQ file? I have the same problem.
You can see what is inside the accession with a command like this: 'vdb-dump SRR9123299 -R1'. It will display all columns of the first row. There are 2 reads per spot. The first read is biological and 100 bases long. The second read is technical and zero bases long. That means either the submitter made a mistake labeling this as paired-end data, or something went wrong while processing it. In any case, there is no way to get 2 reads out of this accession right now, because only one read is stored inside it.
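For anyone landing here with the same question, the check described above can be sketched as a small shell function. This is only a sketch under assumptions: it assumes vdb-dump prints the per-read lengths of a spot on a single line of the form `READ_LEN: 100, 0` (comma-separated, like the other array columns in the default output), and `count_nonempty_reads` is a hypothetical helper name, not part of sra-tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sra-tools): reads `vdb-dump` text from
# stdin and prints how many reads in the first reported spot have a
# non-zero length. Assumes a line such as "READ_LEN: 100, 0".
count_nonempty_reads() {
    awk -F': *' '
        # Only the READ_LEN column is of interest (not TRIM_LEN, SPOT_LEN, ...).
        $1 ~ /^[[:space:]]*READ_LEN$/ {
            n = split($2, lens, /,[[:space:]]*/)
            for (i = 1; i <= n; i++) {
                if (lens[i] + 0 > 0) count++   # skip zero-length reads
            }
            print count + 0
            found = 1
            exit
        }
        # No READ_LEN line at all: report zero usable reads.
        END { if (!found) print 0 }
    '
}
```

It could be used as `vdb-dump SRR9123299 -R1 | count_nonempty_reads`. If it prints 1, only one read per spot actually holds bases, so a single FASTQ file is the most you should expect from fasterq-dump, whatever the metadata says.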
Actually, my .sra file is from SRR11832836. It is paired according to the website https://www.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&page_size=10&acc=SRR11832836&display=metadata
But I also got only one FASTQ file. Here is what happens when I run vdb-dump -R1 SRR11832836 in bash.
The output:
$ vdb-dump -R1 SRR11832836
ALIGNMENT_COUNT: 4
BASE_COUNT: 25958334766
BIO_BASE_COUNT: 25958334766
CMP_BASE_COUNT: 2505215944
CMP_LINKAGE_GROUP:
CMP_READ:
COLOR_MATRIX: 0, 1, 2, 3, 4, 1, 0, 3, 2, 4, 2, 3, 0, 1, 4, 3, 2, 1, 0, 4, 4, 4, 4, 4, 4
CSREAD: 10030021312113000303330123120000333210301101122230231022222300213121013002321233221232013110021031
CS_KEY: T
CS_NATIVE: false
FIXED_SPOT_LEN: 0
LINKAGE_GROUP: CB:CAGCATAGTAAATGTG-1|UB:CTTAAGGGGC
MAX_SPOT_ID: 264880967
MIN_SPOT_ID: 1
NAME: 1
PLATFORM: SRA_PLATFORM_ILLUMINA
PRIMARY_ALIGNMENT_ID: 1
QUALITY: 32, 32, 32, 32, 32, 36, 36, 36, 36, 14, 14, 36, 36, 36, 36, 36, 36, 36, 36, 36, 21, 36, 36, 36, 36, 36, 32, 14, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 14, 36, 36, 14, 36, 36, 36, 14, 32, 14, 32, 36, 36, 36, 32, 32, 36, 14, 32, 36, 36, 36, 32, 36, 32, 36, 36, 14, 32, 36, 36, 36, 36, 27, 36, 36, 36, 36, 36, 36, 32, 27, 14, 36, 32, 14, 14, 36, 36, 14, 36, 32, 32, 14, 27, 27, 14, 14
RD_FILTER: SRA_READ_FILTER_PASS
READ: GGGCCCTGCAGTGCCCCGGCGCCAGCAGGGGGCGCTGGCCACCACTCTAAGCAAGAGAGCCCTGCAGTTGCCCTAGTCGCTCAGCTTGCACCCTGGCA
READ_FILTER: SRA_READ_FILTER_PASS
READ_LEN: 98
READ_SEG: [0, 98]
READ_START: 0
READ_TYPE: SRA_READ_TYPE_BIOLOGICAL|SRA_READ_TYPE_REVERSE
SIGNAL_LEN: 0
SPOT_COUNT: 264880967
SPOT_GROUP: TACAGACT
SPOT_ID: 1
SPOT_LEN: 98
TRIM_LEN: 98
TRIM_START: 0
Since there is only READ_LEN: 98, is that why I get only the SRR11832836_1.fastq file? This is the command I used to generate SRR11832836_1.fastq:
time fasterq-dump --threads 6 \
    --split-files --include-technical ./SRR11832836/SRR11832836.sra \
    --progress -O ./
All the files in my directory look like this:
So why do I get only one FASTQ file from the .sra file? Please enlighten me.
Thanks!
Yes, that is the reason.
So there is a problem with the .sra file generated by NCBI?
If so, I will let them know.
Thanks.
Yes, you should contact NCBI about the accessions in question.