Sorry about the order of the images. The correct order is: middle (bams with artic), bottom (bams with poreCov, samples two and three with the deletion) and top (zoom for one of the samples where the deletion and the gaps should be seen).
Hi @natinreg, thanks for reporting!
The consensus sequences are also okay. So, we think it is something with some post-processing of the bam files in the poreCov pipeline.
So basically the consensus you get out of poreCov is correct, but when you load the poreCov-generated BAM into IGV you see this strange pattern? And you cannot see the deletion? Let me just make sure that this issue is not simply based on some confusion of the BAMs and mapping targets:
1) The ARTIC pipeline (and thus also poreCov) maps the reads to the Wuhan reference genome. After mapping, primers are clipped. The resulting BAMs (*primertrimmed.rg.sorted.bam) are used for variant calling. These BAMs are not copied into the final results folder but can be found in the work dir.
2) After the consensus is built, poreCov aligns the reads again, this time to the consensus, to calculate coverage parameters and visualize the coverage. These BAMs can be found in the per-sample result folder, where the consensus FASTA is also located. Of course, deletions called relative to the Wuhan reference genome can no longer be seen in an alignment against the reconstructed consensus.
However, your IGV screenshot "bottom (bams with poreCov, samples two and three with the deletion)" looks strange so something is going on here :)
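A quick way to tell the two kinds of BAM apart is to look at the reference name in the header's @SQ line. Below is a minimal Python sketch, assuming the header text has already been dumped (for example with `samtools view -H`). The function name `reference_names`, both example headers, the barcode name and the consensus length are made up for illustration; only the Wuhan accession MN908947.3 and its length of 29903 are real values.

```python
def reference_names(sam_header):
    """Return {reference name: length} for every @SQ line in a SAM header."""
    refs = {}
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        refs[fields["SN"]] = int(fields["LN"])
    return refs


# Hypothetical headers, e.g. as printed by `samtools view -H file.bam`.
# A BAM mapped to the Wuhan reference names that reference in @SQ.
wuhan_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:MN908947.3\tLN:29903\n"
# A BAM mapped to a per-sample consensus (name and length are invented).
consensus_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:barcode02\tLN:29894\n"

print(reference_names(wuhan_header))
print(reference_names(consensus_header))
```

If the name in @SQ is not the Wuhan accession, the BAM should be viewed against its own consensus FASTA rather than against the Wuhan genome.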
Dear @hoelzer, after reading your reply, now I can easily understand what is going on with the bams. I have always been interested in the primertrimmed.rg.sorted.bams. Never understood what was the difference with the final bam in the per-sample folder. It was clear that the IGV problem was something with the reference, but checked one of the files and it kept the Wuhan reference called (seq_ident_check.tsv). Now the issue is very clear (in the final bam the alignment is against the consensus!). Thank you very much for clarifying this and I hope @replikation hasn't put much time into this.
ps: I was wondering why you changed in each bam the reference sequence name to the barcode number!! thank you again
@natinreg ah great, then it was just some confusion of the BAMs! You're welcome. Now I also understand your "bottom (bams with poreCov, samples two and three with the deletion)" picture. The two bottom tracks are aligned against the consensus, but you somehow managed to visualize them against the Wuhan reference genome. Thus, from the deletion position on, all nucleotides are shifted by a few bases, and so all the substitutions pop up in the alignment view :) I am just wondering how you managed to load this into IGV ;)
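The shift effect described above can be reproduced with a toy example. This is only an illustrative sketch on a random 300-base sequence, not real SARS-CoV-2 data. The helper name `mismatches_at_same_coords` and all coordinates are invented; only the 9-base deletion length mirrors del11288-11296.

```python
import random


def mismatches_at_same_coords(reference, consensus, start, length):
    """Count mismatches when a read taken from `consensus` at `start`
    is displayed at the same coordinate on `reference`."""
    read = consensus[start:start + length]
    window = reference[start:start + length]
    return sum(1 for a, b in zip(read, window) if a != b)


random.seed(0)
# Toy "reference" genome (random bases, purely illustrative).
reference = "".join(random.choice("ACGT") for _ in range(300))

# Toy "consensus": the reference with a 9-base deletion at position 100.
del_start, del_len = 100, 9
consensus = reference[:del_start] + reference[del_start + del_len:]

# Reads upstream of the deletion share coordinates in both sequences.
upstream = mismatches_at_same_coords(reference, consensus, 20, 50)
# Reads downstream sit 9 bases off when shown on the reference.
downstream = mismatches_at_same_coords(reference, consensus, 150, 50)

print("mismatches upstream of deletion:", upstream)
print("mismatches downstream of deletion:", downstream)
```

Upstream of the deletion the read matches perfectly, while downstream most positions disagree. That matches the "run through" of apparent substitutions in the screenshot.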
And besides, good that you also look into the mappings, in particular, for important VOC samples - we discovered some strange amplicon-based edge cases w/ Illumina data and I'm sure similar things can also occur w/ Nanopore data.
However, I think this issue can be closed.
I am following up on the issue previously posted by @iferres. In the IGV images below we show a "normal" IGV alignment when samples are run with the Artic protocol (using Connor's lab implementation) and a problematic IGV alignment when poreCov is used (same result with the Medaka and new Nanopolish pipeline). The samples with the issue are P.1 variants, and the problem starts at del11288-11296, which is a feature defining P.1. While the IGV view is wrong (it can't capture the deletion and produces a run-through in the remaining part of the genome), the deletion is correctly called in the vcf files. The consensus sequences are also okay. So, we think it is something with some post-processing of the bam files in the poreCov pipeline. Besides, as has been mentioned by @hoelzer, the region involves a primer border. However, as can be seen in the third attached image, the problem involves reads coming from both sets of primers. As I mentioned, the consensus is okay, so the good thing is that this issue will not produce misclassification of variants. But it is annoying not being able to check the read alignments when needed.
Could we share the fastq/bam files with you so you can try them?