Open drpatelh opened 2 years ago
Hi @drpatelh,
Several of the samples were difficult to sequence with very high Ct values, so we ended up trying a few different methods (ARTIC V4, 1200bp, FG) on Illumina as well as the 1200bp Freed et al (2020) method on GridION. Initial analysis suggested that the ARTIC V4 amplicon and Fusion Genomics (FG) probe-capture data were complementary with the FG data filling in a lot of gaps in coverage of the ARTIC and vice versa. So naturally, we wanted to merge the reads together while only trimming primers from the ARTIC amplicon reads.
Originally, I wanted to merge the BAM files from separate viralrecon analyses on the ARTIC V4 reads with primer trimming and FG probe-capture reads without primer trimming, but unfortunately, I wouldn't have been able to use the merged BAM files as input for viralrecon, and extracting the BAM file reads to FASTQ with samtools fastq
did not remove soft-clipped regions in reads. So hard-clipping was required. Fortunately, like Nils Homer mentioned in his tweet, fgbio had the very convenient ClipBam
command to "upgrade" clipping from soft to hard, so we could ensure that the primer trimmed regions would be trimmed from the FASTQ reads.
The Bash script below is basically what was done to each ARTIC V4 iVar trimmed BAM file from viralrecon:
fgbio ClipBam -i 4662.ivar_trim.sorted.bam -o 4662.ivar_trim.sorted.fgbio-hardclip.bam -H -r MN908947.3.fa
samtools fastq -1 4662_1.fq -2 4662_2.fq -0 /dev/null -s 4662.single.fq 4662.sorted.bam
cat 4662_1.fq 4662_2.fq 4662.single.fq | gzip -c > 4662.fq.gz
The fq.gz files were provided as input for viralrecon and viralrecon handled merging reads for the same samples.
Out of curiosity, I tried running viralrecon on the FG reads with ARTIC primer trimming and found that 10% of the FG reads would have primers trimmed. So I couldn't just naively run all the sequence data through viralrecon with primer trimming because a significant amount of data would be trimmed away.
We could have used CutAdapt to trim the primer sequences from the reads, but we haven't done enough testing with CutAdapt primer trimming and iVar has worked well for our primer trimming needs.
Let me know if you have any other questions!
Thanks for putting together such a great pipeline!
Peter
Hi @drpatelh,
Thanks a lot for viralrecon
!
I recently found out I could have the use of such option to hard-clip primers and my case is a bit similar to @peterk87's except :
.ivar_trim.sorted.bam
files produces by viralrecon
and tried both samtools fastq
and bamtools bamtofastq
, but as mentionned they include soft-clipped portions, without any option to change this behaviour (even if this was asked a while ago)The little I investigated, ivar trim
does not have an option to hard-clip primers instead
=> So IMHO it makes implementation of this option a bit more complex, because :
viralrecon
to convert soft-clipped portions to hard-clipped ones (to do so I personally went for ngsutilsj
and its subcommand bam-removeclipping
, even if marked as experimental in corresponding --help
)ivar trim
step by samtools amplicon
, but as I saw it should produce slighty different results and will require requalifying downstream steps like variant calling and so
Hope this helps ! Kind regards, Felix.
Hi @tetedange13 ! Thank you for using the pipeline and for your comments!
The Bash script below is basically what was done to each ARTIC V4 iVar trimmed BAM file from viralrecon:
fgbio ClipBam -i 4662.ivar_trim.sorted.bam -o 4662.ivar_trim.sorted.fgbio-hardclip.bam -H -r MN908947.3.fa samtools fastq -1 4662_1.fq -2 4662_2.fq -0 /dev/null -s 4662.single.fq 4662.sorted.bam cat 4662_1.fq 4662_2.fq 4662.single.fq | gzip -c > 4662.fq.gz
The implementation suggested by Peter above would suit your needs right? It removes soft-clipped regions from the iVar trimmed BAM files and then converts back to FastQ.
Hi @drpatelh,
Yup, for what I saw produced FASTQ are identical with both fgbio ClipBam
and ngsutilsj
Just to precise Peter's implementation, I could not manage to make fgbio
v2.0.1 work with samtools sort -n
=> So here is the oneliner I used :
fgbio SortBam --input=sample.ivar_trim.sorted.bam --sort-order=QueryName --ouput=/dev/stdout |
fgbio ClipBam --input=/dev/stdin --upgrade-clipping --output=/dev/stdout --ref=/viralrecon/data/GCA_009858895.3_ASM985889v3_genomic.200409.fna |
samtools sort -n |
samtools fastq -c 9 -0 /dev/null -s /dev/null -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz
Regarding performances, ngsutilsj
seems a bit faster (I guess mostly because it does not require to have a query-sorted input BAM, contrary to fgbio ClipBam
)
Thanks again ! Have a nice day, Felix.
Description of feature
Context in this Tweet. Used by @peterk87 @fmaguire in this pre-print.
Be great if you can add a little description as to why hard-clipping was preferred over soft-clipping and anything else that you found on your travels that would be useful to know for the implementation including the command you used :)
Methods