@ckennedy-nmdp Could you confirm? I don't think using FASTQ for the -g
argument works either, so the bash script should convert FASTQ to FASTA whenever there are unpaired reads.
-g is only applicable with -p. For unpaired data the pipeline runs SSAKE with -f
Sorry, I still don't follow.
https://github.com/nmdp-bioinformatics/pipeline/blob/master/process_fastq.bash#L139 uses only -f in both cases.
Just to confirm:
SSAKE doesn't accept fastq files and does not take base quality as input. It reads in fasta or, in the case of paired-end mode, a custom format akin to fasta.
A. When running the assembler in paired-end mode (-p 1):
  -f file.paired
    >NameOfPair:FRAGMENTSIZE
    read1:read2
  -g file.fa (fasta file)
    >NameOfUnpairedRead
    readx
B. When running SSAKE in unpaired mode:
  -f (fasta file)
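The two modes above can be sketched as a small shell helper that picks the input options. This is only an illustration: `ssake_input_args` is a hypothetical function, not part of SSAKE or the pipeline, and it assumes the paired-format file and the unpaired FASTA file have already been produced.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SSAKE or the pipeline): print the SSAKE
# input options for the two modes described above.
#   $1 = paired-format file (may be missing or empty)
#   $2 = FASTA file of unpaired reads (may be missing or empty)
ssake_input_args() {
    local paired="$1" unpaired="$2"
    if [ -s "$paired" ]; then
        # Mode A: paired-end. Unpaired FASTA reads go to -g, only if any exist.
        if [ -s "$unpaired" ]; then
            printf '%s\n' "-f $paired -g $unpaired -p 1"
        else
            printf '%s\n' "-f $paired -p 1"
        fi
    else
        # Mode B: unpaired. The FASTA file goes to -f and -g is not used.
        printf '%s\n' "-f $unpaired"
    fi
}
```

It could be used as `../SSAKE $(ssake_input_args paired.fa unpaired.fa) -m 80 -w 100`. The unquoted substitution relies on word splitting, so this sketch assumes file names without spaces.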
I forgot to mention that there are scripts in the SSAKE "tools" folder for quality trimming reads and formatting into the custom paired input required by the assembler in paired-end mode.
Below is example code that would trim the reads and convert them along the way:
fastq --> fasta --> custom input
../tools/TQSfastq.py -f Ecoli_S1_L001_R1_001.fastq -t 20 -c 30 -e 33
../tools/TQSfastq.py -f Ecoli_S1_L001_R2_001.fastq -t 20 -c 30 -e 33
cat Ecoli_S1_L001_R1_001.fastq.1_T20C30E33.trim.fa | perl -ne 'if(/^(>\@\S+)/){print "$1b\n";}else{print;}' > trimFIX1.fa
cat Ecoli_S1_L001_R2_001.fastq.1_T20C30E33.trim.fa | perl -ne 'if(/^(>\@\S+)/){print "$1a\n";}else{print;}' > trimFIX2.fa
echo -----------------------------------------------------------------------------------
echo done. Formatting fasta input for SSAKE...
echo -----------------------------------------------------------------------------------
../tools/makePairedOutput2UNEQUALfiles.pl trimFIX1.fa trimFIX2.fa 550
echo -----------------------------------------------------------------------------------
echo done. Initiating SSAKE assembly ETA 10-20min depending on system...
echo -----------------------------------------------------------------------------------
time ../SSAKE -f paired.fa -g unpaired.fa -p 1 -m 80 -w 100
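To make the custom paired layout concrete, here is a minimal sketch that joins two FASTA files into `name:size` headers and `read1:read2` sequence lines. It is not a replacement for makePairedOutput2UNEQUALfiles.pl: `make_paired_sketch` is a hypothetical name, the leading `>` on header lines is assumed from the "akin to fasta" description, and the sketch assumes single-line sequences with mates in the same order in both files, so it does not handle files of unequal length or orphaned reads.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: interleave two FASTA files into the paired layout.
# Assumptions: single-line sequences, same read order in both files,
# header layout ">name:fragment_size", mates joined by ":".
#   $1 = FASTA of first mates, $2 = FASTA of second mates, $3 = fragment size
make_paired_sketch() {
    # paste puts line N of both files side by side, separated by a tab
    paste "$1" "$2" | awk -F '\t' -v size="$3" '
        NR % 2 == 1 { print ">" substr($1, 2) ":" size }   # header taken from file 1
        NR % 2 == 0 { print $1 ":" $2 }                    # read1:read2
    '
}
```

For example, `make_paired_sketch r1.fa r2.fa 550 > paired.fa` mirrors the fragment size of 550 used in the commands above; the input file names here are placeholders.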
Thank you, @warrenlr.
@ckennedy-nmdp, this line
https://github.com/nmdp-bioinformatics/pipeline/blob/master/process_fastq.bash#L136
is passing unpaired FASTQ-formatted sequences through to SSAKE using the -f
option, which will not work. I'll create a new issue.
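One possible shape for the conversion step, as a rough sketch rather than the pipeline's actual fix: strip the quality lines and rewrite the headers before handing unpaired reads to SSAKE. `fastq_to_fasta` is a hypothetical helper, and it assumes every FASTQ record is exactly four lines with no wrapped sequences.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert 4-line FASTQ records on stdin to FASTA on stdout.
# Assumes each record is exactly four lines (header, sequence, "+", qualities).
# The "+" and quality lines are dropped, since SSAKE does not take base quality.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # "@name" becomes ">name"
         NR % 4 == 2 { print }                     # sequence line is kept as-is
    '
}
```

Usage would look like `fastq_to_fasta < unpaired.fastq > unpaired.fa`, with the resulting FASTA file then passed to -f (or to -g in paired-end mode). Unlike TQSfastq.py, this sketch does no quality trimming.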
From what I see, it still is