iqbal-lab-org / cylon

Virus assembler from amplicon sequencing reads
MIT License

Different consensus result between viridian and lab-built ivar pipeline on ncov Artic V3 protocol #9

Open kusonahikari opened 3 years ago

kusonahikari commented 3 years ago

Comparing the consensus sequences, we found that the results from viridian and our lab-built ivar pipeline were not identical on the ARTIC V3 protocol. With viridian, the regions with low depth of coverage were strongly affected, resulting in ambiguities, especially in the spike region. Below is our depth of coverage plot after filtering out bad reads.

batch5_1

In addition, filtering human reads out of the raw reads would introduce artifact SNPs from low-frequency alleles. Our pipeline maps the raw reads (bwa-mem), then filters out bad-quality reads (samtools with the -bSq 20 flag). The filtered reads then have the amplicon primer set trimmed (ivar with the -e flag). The consensus sequences are called after that.
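For readers who want to reproduce the comparison, the three steps named above could be written out roughly as follows. This is only a sketch: the file names, the `pipeline_commands` helper and the use of `-o` for output files are assumptions for illustration, paired-end input is assumed, and the sorting/indexing and consensus-calling steps are left out because the comment does not say how they were run. The function only builds the command lines and does not execute anything.

```python
import shlex


def pipeline_commands(ref, reads_1, reads_2, primer_bed, prefix):
    """Build command lines for mapping, quality filtering and primer trimming.

    All paths are hypothetical placeholders. Nothing is executed here.
    """
    sam = f"{prefix}.sam"
    filtered = f"{prefix}.filtered.bam"
    return [
        # Map the raw reads to the reference with bwa-mem.
        ["bwa", "mem", "-o", sam, ref, reads_1, reads_2],
        # Keep reads with mapping quality >= 20 and write BAM (-bSq 20, as quoted above).
        ["samtools", "view", "-bSq", "20", "-o", filtered, sam],
        # Trim amplicon primers; -e also keeps reads in which no primer was found.
        # Note: a sort/index step before trimming is likely needed but is not
        # described in the comment, so it is omitted here.
        ["ivar", "trim", "-e", "-i", filtered, "-b", primer_bed, "-p", f"{prefix}.trimmed"],
    ]


for cmd in pipeline_commands("MN908947.3.fasta", "reads_1.fastq", "reads_2.fastq",
                             "artic_v3_primers.bed", "sample"):
    print(shlex.join(cmd))
```

Writing the steps down explicitly like this makes it easier to see where the two pipelines diverge, for example whether human-read removal happens before mapping.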

iqbal-lab commented 3 years ago

Could you be specific about what the differences between your pipeline and viridian are (ideally, provide your calls), and where exactly in the spike you want us to look? You give a plot showing coverage, but I am not sure what you want me to take away from it. What does it show, in your mind?

kusonahikari commented 3 years ago

I have provided the calls in the Dropbox folder. For the spike region, please look at the region of the 72nd primer pair, 21658-22038. Actually, I'm thinking of trying again with other samples as well, since last time I tried with only one sample (a bad one, I guess). As for the coverage, I found that the low-coverage regions were aligned with the ambiguous regions.
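To make the differences concrete, the two consensus sequences could be compared position by position within that region. The sketch below is a minimal illustration, not part of either pipeline: the function name `consensus_differences` is made up, and it assumes both sequences are already in the same coordinate system (equal length, no unaligned indels), which would normally require aligning them to the reference first.

```python
def consensus_differences(seq_a, seq_b, start, end):
    """Return (position, base_a, base_b) for every differing position.

    Positions are 1-based and inclusive. Both sequences are assumed to share
    the same coordinates; this is an assumption, not something checked beyond
    comparing lengths.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length (same coordinates)")
    diffs = []
    for pos in range(start, min(end, len(seq_a)) + 1):
        a = seq_a[pos - 1].upper()
        b = seq_b[pos - 1].upper()
        if a != b:
            diffs.append((pos, a, b))
    return diffs


# Toy example with two short made-up sequences.
toy_viridian = "ACGTNNGT"
toy_ivar = "ACGTACGT"
print(consensus_differences(toy_viridian, toy_ivar, 1, 8))
# → [(5, 'N', 'A'), (6, 'N', 'C')]
```

With real data the call would be something like `consensus_differences(viridian_seq, ivar_seq, 21658, 22038)` for the 72nd primer pair region.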

martinghunt commented 3 years ago

I could only see the two FASTQ files of reads in the Dropbox folder. I assembled them with viridian version 0.1.0. I don't see the same as what you are seeing.

The only dropped amplicon was the one at position 20173-20572, which does not overlap with spike. Inspecting the read mapping confirms that there are almost no reads there, so this looks correct.
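The overlap claim can be checked with a simple interval test. In the sketch below the spike coordinates (21563-25384 on MN908947.3) are taken from the GenBank annotation of the S gene rather than from this thread, so treat them as an assumption; the two amplicon ranges are the ones quoted in the comments above.

```python
# S gene on MN908947.3, 1-based inclusive (assumption: from the GenBank annotation).
SPIKE = (21563, 25384)


def overlaps(a, b):
    """Return True if two 1-based inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


dropped_amplicon = (20173, 20572)  # the amplicon dropped by viridian
amplicon_72 = (21658, 22038)       # region of the 72nd primer pair

print(overlaps(dropped_amplicon, SPIKE))  # → False
print(overlaps(amplicon_72, SPIKE))       # → True
```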

Screenshot is attached of the assembly made by viridian (top) compared to reference genome MN908947.3 (bottom). BLAST hits in red (both are 99% identity). The only Ns in the Viridian consensus are that dropped amplicon, visible as the gap between the two BLAST hits.

screenshot

OBannis commented 3 years ago

Just to comment, Tung is comparing a consensus sequence that went through viridian on GPAS (including human read removal steps in Catsup). I can provide the trimmed fastq's as they're on the OCI bucket if needed.

martinghunt commented 3 years ago

@OBannis thanks, could you share the trimmed fastqs with me please?

OBannis commented 3 years ago

> @OBannis thanks, could you share the trimmed fastqs with me please?

Sure, I have emailed you a link.