wjyzidane closed this issue 5 years ago.
There was an error when the file-name prefixes of two paired-end FASTQ files were not the same (the part before _001 in that case). This has been resolved in commit 86b1e77d24caa5c59de4bee59f6625cdc0975f81. Please pull the latest version of SalmonTE.
Sorry for the inconvenience, and many thanks for reporting this!
Hyun-Hwan Jeong
Got it. Thanks!
Hi, I found that it still outputs nothing for the paired-end files:
I am using SalmonTE 0.3
Also, could you find a way to support file names like *_r1.fq rather than only *_R1.fq? Thanks!
@wjyzidane,
I tried to replicate the issue using a paired-end FASTQ file with the same name, and it works in my case. I am wondering whether you are using the latest version of SalmonTE. Could you let me know what you see when you execute the following command?
md5 SalmonTE.py
Here is my output; yours should be identical: MD5 (SalmonTE.py) = a8d89b2822199b0cd4c599309631e1d6
If you do not see an identical MD5 hash, please pull the latest version of this git repository to your local machine.
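A side note on the check above: md5 is the macOS/BSD command, while on most Linux systems GNU coreutils provides md5sum, which prints the same hex digest. For a platform-independent check, here is a minimal Python sketch using the standard hashlib module. The helper names file_md5 and is_expected_version are hypothetical, and the expected hash is simply the one quoted above, so it only matches that particular revision of SalmonTE.py.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hash quoted in this thread for that specific revision of SalmonTE.py
EXPECTED = "a8d89b2822199b0cd4c599309631e1d6"


def is_expected_version(path="SalmonTE.py", expected=EXPECTED):
    """True if the local file hashes to the expected MD5 digest."""
    return file_md5(path) == expected
```

Newer commits will naturally produce a different hash, so a mismatch today does not by itself mean the local copy is broken.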
Furthermore, the *_r1.fq naming case was fixed in my last patch and should be supported. SalmonTE.py is supposed to automatically detect the end type of each FASTQ file.
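For readers wondering how end-type detection from file names can work, here is a minimal sketch. It is not SalmonTE's code. It assumes, hypothetically, that mates are named <prefix>_R1 / <prefix>_R2 (either case), optionally followed by an Illumina-style _001 chunk, with a .fq or .fastq extension and an optional .gz. The pattern MATE_PATTERN and the function detect_pairs are illustrative names only.

```python
import os
import re
from collections import defaultdict

# Hypothetical naming rule, NOT SalmonTE's actual implementation:
# <prefix>_R1/_r1 (or 2), optional _001, then .fq or .fastq, optionally .gz
MATE_PATTERN = re.compile(
    r"^(?P<prefix>.+)_[Rr](?P<mate>[12])(?:_\d{3})?\.(?:fq|fastq)(?:\.gz)?$"
)


def detect_pairs(paths):
    """Return ("paired" or "single", {prefix: (R1, R2)}) using file names only."""
    mates = defaultdict(dict)
    unmatched = []
    for path in paths:
        match = MATE_PATTERN.match(os.path.basename(path))
        if match is None:
            unmatched.append(path)
            continue
        # Key on directory + prefix so same-named samples in different folders stay apart
        key = os.path.join(os.path.dirname(path), match.group("prefix"))
        mates[key][match.group("mate")] = path

    pairs = {key: (d["1"], d["2"]) for key, d in mates.items() if set(d) == {"1", "2"}}
    orphans = unmatched + [
        path for key, d in mates.items() if key not in pairs for path in d.values()
    ]
    # In this sketch, a single file without a mate makes the whole dataset single-end
    end_type = "paired" if pairs and not orphans else "single"
    return end_type, pairs
```

For example, detect_pairs(["/d/SRR7290434_R1.fq.gz", "/d/SRR7290434_R2.fq.gz"]) groups both files under the prefix /d/SRR7290434 and reports "paired". Treating any orphan file as a reason to fall back to single-end is a deliberately strict choice of this sketch, not a claim about SalmonTE's behavior.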
Best Regards,
Hyun-Hwan
It works after I pulled the newest version from the git repository! Thanks!
It seems that paired-end files may not be recognized when they are compressed?
SalmonTE.py --version
SalmonTE 0.4
SalmonTE.py quant --reference=mm --outpath=SalmonTE_output1 --num_threads=30 /home/UTHSCSA/cutlerr/Data/Kalamakis_2019_RNA-Seq/SRA/Bulk_RNA-Seq_Data/Trimmed_reads/Paired/temp1/SRR7290434_R1.fq.gz /home/UTHSCSA/cutlerr/Data/Kalamakis_2019_RNA-Seq/SRA/Bulk_RNA-Seq_Data/Trimmed_reads/Paired/temp1/SRR7290434_R2.fq.gz
2019-03-29 00:25:38,672 Starting quantification mode
2019-03-29 00:25:38,672 Collecting FASTQ files...
2019-03-29 00:25:38,673 The input dataset is considered as a single-end dataset.
2019-03-29 00:25:38,673 Collected 2 FASTQ files.
2019-03-29 00:25:38,674 Quantification has been finished.
2019-03-29 00:25:38,674 Running Salmon using Snakemake
Job counts:
count jobs
1 all
1 collect_abundance
1 collect_mappability
2 run_salmon_gz
5
2019-03-29 00:25:38,771 Job counts:
count jobs
1 all
1 collect_abundance
1 collect_mappability
2 run_salmon_gz
5
@rrcutler Can you provide the first few lines of each FASTQ file here?
Thank you,
Hyun-Hwan Jeong
head SRR7290434_R1.fq
@SRR7290434.1.1 HWI-ST1149:214:C4VMKACXX:6:1101:1355:2115 length=101
AAGCAGTGGTATCAACTCAGAGTACATGCGGAGACTTAGGACTTAGTCTCCCTTTCTCCCTAGGTGTAGAGGGTTCAGCCGTGTGCACCCCCCCCCTTCNN
+SRR7290434.1.1 HWI-ST1149:214:C4VMKACXX:6:1101:1355:2115 length=101
@?@?DF?DFCFHBBF@FFGGIIFHHCHB?FHGGDFHGHII?BGGEHBGIJIIJJGEHGGFHICH@EEGHHHFD?;2?@C98,9=BDCC?8=8<59><AA##
@SRR7290434.2.1 HWI-ST1149:214:C4VMKACXX:6:1101:1637:2151 length=101
AAGCAGTGGTATCAACGCATAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGTGTTTTTTTTTTTTATATTAAAATATAAAAAAAAATTTT
+SRR7290434.2.1 HWI-ST1149:214:C4VMKACXX:6:1101:1637:2151 length=101
CCCFFFFFGHHHHJJJJIHHIIHIJJJJJJJJJJJJHFDDDDDDDDDDDDDDDDDD@5&))&+(+8398&))5>5&((((((+(((+((((((&&&&++4+
@SRR7290434.3.1 HWI-ST1149:214:C4VMKACXX:6:1101:1563:2199 length=101
TACATATTGGCTTCTCCAGAAAATACACGTTTAAACAAGCCATGCACCCATCTCATTTCATTTAATTTTCTGGTCTCTCAGTCTCATCACCTTGACTAGG
head SRR7290434_R2.fq
@SRR7290434.1.2 HWI-ST1149:214:C4VMKACXX:6:1101:1355:2115 length=101
ATAGCAAAGTTAAAATAAATACTAATAACCTCTGTAACAACAGGGAAATCTAGTTCAGTAGCAGCACCTGAAAGGCAGACAGGCAGTCTCGTCAACACANN
+SRR7290434.1.2 HWI-ST1149:214:C4VMKACXX:6:1101:1355:2115 length=101
CCCFFFFFHHHDBGBEHGGGHIIFHGCGEGHEHHFHGHIJJGGIIIGG@HGHHHGIIHIIEGIHBHIIJIJJJJIIBCEHFDF=AA@ACCC;?B:>><(##
@SRR7290434.2.2 HWI-ST1149:214:C4VMKACXX:6:1101:1637:2151 length=101
TGACATTGTAACTATGAATTCATGTTTTAGAATTGTGTGTGCTCCCATGTAAGGAAACCACTTGTTAGTAAAGAAATCCATGGATTATATGTAAAAGAATT
+SRR7290434.2.2 HWI-ST1149:214:C4VMKACXX:6:1101:1637:2151 length=101
@C@FFFFFHHHHGIIJHBJHHIJIIIJJFIJIJIJEHGGFHIJJJIJJJIFIHIJIIJJJIJJJJJJJHIIJJJJJJJJJHHHHEDFFFFFFEADEEDDD>
@SRR7290434.3.2 HWI-ST1149:214:C4VMKACXX:6:1101:1563:2199 length=101
CCACCACCAAAAAAAAAAAAAAAAAATTGATAGGGGATTTTAGGATTTTGAGCCATAGCTAGCCAATATGTTACACATTGTTTTATACAATTTCCTGCTGC
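In the head output above, each read ID carries the mate number as a final suffix (SRR7290434.1.1 in R1 versus SRR7290434.1.2 in R2). Independently of how SalmonTE decides the end type, which in this thread appears to be driven by file names, a quick way to confirm that two files really are mates is to compare their first read IDs after stripping that suffix. This is a minimal sketch; open_fastq, read_ids, and mates_agree are hypothetical helper names, and the suffix rule (".1"/".2" or "/1"/"/2") is an assumption based on the IDs shown here.

```python
import gzip
import re
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text, based on the .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_ids(path, n=1000):
    """Return the read IDs (first whitespace-separated header token) of the first n records."""
    ids = []
    with open_fastq(path) as handle:
        # Assumes strict 4-line records, so headers sit on lines 0, 4, 8, ...
        for i, line in enumerate(islice(handle, 4 * n)):
            if i % 4 == 0:
                ids.append(line[1:].split()[0])
    return ids


# Trailing mate marker such as ".1"/".2" (SRA style) or "/1"/"/2"
MATE_SUFFIX = re.compile(r"[./][12]$")


def mates_agree(r1_path, r2_path, n=1000):
    """True if the first n read IDs of both files match once mate suffixes are removed."""
    ids1 = [MATE_SUFFIX.sub("", x) for x in read_ids(r1_path, n)]
    ids2 = [MATE_SUFFIX.sub("", x) for x in read_ids(r2_path, n)]
    return ids1 == ids2
```

This is only a rough consistency check on the first records. It does not validate the whole file, and it says nothing about how a given tool classifies the files.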
Furthermore, I get the same problem when running with the unzipped files:
SalmonTE.py quant --reference=mm --outpath=SalmonTE_output1 --num_threads=30 /home/UTHSCSA/cutlerr/Data/Kalamakis_2019_RNA-Seq/SRA/Bulk_RNA-Seq_Data/Trimmed_reads/Paired/temp1/SRR7290434_R1.fq /home/UTHSCSA/cutlerr/Data/Kalamakis_2019_RNA-Seq/SRA/Bulk_RNA-Seq_Data/Trimmed_reads/Paired/temp1/SRR7290434_R2.fq
2019-03-29 00:34:25,629 Starting quantification mode
2019-03-29 00:34:25,629 Collecting FASTQ files...
2019-03-29 00:34:25,629 The input dataset is considered as a single-end dataset.
2019-03-29 00:34:25,630 Collected 2 FASTQ files.
2019-03-29 00:34:25,630 Quantification has been finished.
2019-03-29 00:34:25,630 Running Salmon using Snakemake
Job counts:
count jobs
1 all
1 collect_abundance
1 collect_mappability
2 run_salmon_fq
5
2019-03-29 00:34:25,726 Job counts:
count jobs
1 all
1 collect_abundance
1 collect_mappability
2 run_salmon_fq
5
@rrcutler I have fixed the problem and created a branch for testing your case. Can you please clone the branch and check whether my fix works for you?
git clone -b paired-end https://github.com/LiuzLab/SalmonTE/
Thank you,
Hyun-Hwan Jeong
Things are working great now. Thanks!
Hi!
I had a problem with my paired-end data files: only half of them would load. I saw in another issue that the NCBI FASTQ format was sometimes a problem, so I modified my FASTQ files to fit the original format, and now all 16 files (8 paired-end samples) load. However, SalmonTE reports that they are loaded as single-end files...
Here are the first few lines of one sample's fastq files:
head /u/gironnea/polyA/scratch/fastq/colon/SRR6410603/salmonTE/SRR6410603_R1.fastq
@SRR6410603.62.1 NS500482:96:HT5M5BGXX:1:11101:20908:1061
ATTCTNCCCCAGCCCAGGCTGGGGTACCCAGAGACCTGGGAAATNNNGNNGNGTCA
+SRR6410603.62.1 NS500482:96:HT5M5BGXX:1:11101:20908:1061.1 length=64
AAAAA#EEEEEEEEEAEEEEEEEEEEEEEEE6E6EE/EEEE6EE###E##E#EEEE
@SRR6410603.63.1 NS500482:96:HT5M5BGXX:1:11101:20617:1062
CTTGTNTTTAGCAGCATTCACCCGTGTCTGTTCACTGACCAAAGNNNANNATTTGTNNNGNNNNNNNNNNNNNC
+SRR6410603.63.1 NS500482:96:HT5M5BGXX:1:11101:20617:1062.1 length=74
AAAAA#EEEEEEEEAEEAEEEEAAEEEEEEEEEEEAEAEEEEEE###A##EE<6A<###E#############A
@SRR6410603.64.1 NS500482:96:HT5M5BGXX:1:11101:15920:1062
CCAGGTTGGAACTTGCAATAACCATCCTTGCCCTGGTAGGGGTANNNGNNTTCACC
head /u/gironnea/polyA/scratch/fastq/colon/SRR6410603/salmonTE/SRR6410603_R2.fastq
@SRR6410603.62.2 NS500482:96:HT5M5BGXX:1:11101:20908:1061
GGTACATACTCATGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCNC
+SRR6410603.62.2 NS500482:96:HT5M5BGXX:1:11101:20908:1061.2 length=76
AAA6A6E/EEE/AE####################################E#E
@SRR6410603.63.2 NS500482:96:HT5M5BGXX:1:11101:20617:1062
CAAATACCACCCAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTTNA
+SRR6410603.63.2 NS500482:96:HT5M5BGXX:1:11101:20617:1062.2 length=53
AAAAAEEE/EAEEEE#################################EEE#E
@SRR6410603.64.2 NS500482:96:HT5M5BGXX:1:11101:15920:1062
GGCAGTTGCTGGACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCCNTCG
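One detail visible in the head output above is that the "+" separator lines carry text that differs from the "@" header (for example "+SRR6410603.62.1 ...:1061.1 length=64" against "@SRR6410603.62.1 ...:1061"). In the common FASTQ convention, text after "+" is optional and, if present, should repeat the title from the "@" line. This thread does not establish whether that affects SalmonTE's end-type detection, but a structural check is cheap. Below is a minimal sketch, assuming strict four-line records with no line wrapping. check_records and strip_plus_text are hypothetical helper names.

```python
from itertools import islice


def check_records(lines):
    """Yield (record_index, problem) for malformed 4-line FASTQ records."""
    it = iter(lines)
    index = 0
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4:
            yield index, "truncated record"
            return
        header, seq, plus, qual = record
        if not header.startswith("@"):
            yield index, "header does not start with '@'"
        if not plus.startswith("+"):
            yield index, "separator does not start with '+'"
        elif len(plus) > 1 and plus[1:] != header[1:]:
            # Optional text after '+' is expected to repeat the '@' title exactly
            yield index, "'+' line text differs from '@' header"
        if len(seq) != len(qual):
            yield index, "sequence and quality lengths differ"
        index += 1


def strip_plus_text(lines):
    """Rewrite every separator line (every 4th line, offset 2) to a bare '+'."""
    for i, line in enumerate(lines):
        yield "+\n" if i % 4 == 2 else line
```

Writing a bare "+" is the simplest way to make the separator lines consistent, because it sidesteps any mismatch between the two titles. Whether that change alone would make SalmonTE treat these files as paired-end is untested here.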
Here is my command:
SalmonTE.py quant --reference=hs_grch38 --outpath=$output --num_threads=2 --exprtype=count $path/*/salmonTE/*_R*.fastq
I also tried specifying the R1 and R2 files separately:
SalmonTE.py quant --reference=hs_grch38 --outpath=$output --num_threads=2 --exprtype=count $path/*/salmonTE/*_R1.fastq $path/*/salmonTE/*_R2.fastq
but I get the same result:
2024-02-22 12:15:54,346 Starting quantification mode
2024-02-22 12:15:54,346 Collecting FASTQ files...
2024-02-22 12:15:54,361 The input dataset is considered as a single-end dataset.
2024-02-22 12:15:54,361 Collected 16 FASTQ files.
2024-02-22 12:15:54,362 Quantification has been finished.
2024-02-22 12:15:54,362 Running Salmon using Snakemake
2024-02-22 12:15:55,106 Note: NumExpr detected 24 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-22 12:15:55,106 NumExpr defaulting to 8 threads.
Building DAG of jobs...
2024-02-22 12:15:55,425 Building DAG of jobs...
I saw that for some people it worked when specifying the files for one sample at a time, but I still get the same result: "The input dataset is considered as a single-end dataset."
Do you have any idea how I could solve this? Otherwise, could I just sum the quantification for each sample?
Thank you so much!
I found that SalmonTE works well when I give it either the R1 FASTQ or the R2 FASTQ, but it gives the error below when I provide both:
I am using SalmonTE 0.3.