Open camelest opened 2 months ago
One way to avoid excluding these reads would be to modify the bam/cram file to split the reads containing long 'N' cigar operations into multiple reads. See for example:
https://gatk.broadinstitute.org/hc/en-us/articles/360036858811-SplitNCigarReads
@vibansal Thank you so much for your reply. Sorry for my late reply, I somehow did not receive the notice.
I tried with SplitNCigarReads. For some regions, even though the reads look pretty nice as a SNP, it got filtered out at the final step. (included in 4.0.final_genotypes.vcf but not in the final result) Does it have something to do with other settings?
The output in 4.0.final_genotypes.vcf is
chrX 103482802 . T G 1 dp DP=9850;AM=0;MC=0;MF=0.000;MB=0.000;AQ=25.62;GM=0;PH=0.00,37.78,37.78,37.78;SC=None; GT:GQ:DP:AD:PS:UG:UQ 0/0:34:9850:5355,4495:.:0/1:127.63
Sorry for the late reply. The variant is filtered since it has very high depth (9850). The "-max_cov" option can be used to increase the threshold.
Thank you so much for the reply. I tried with --max_cov 100000
and now the dp changed to PASS but I still don't find it in the final output.
Does the final quality filtering occur at this step?
(in 4.0.final_genotypes.vcf)
chrX 103482802 . T G 1 PASS DP=9850;AM=0;MC=0;MF=0.000;MB=0.000;AQ=25.62;GM=0;PH=0.00,37.78,37.78,37.78;SC=None; GT:GQ:DP:AD:PS:UG:UQ 0/0:34:9850:5355,4495:.:0/1:127.63
The genotype of the variant is 0/0 and the final output contains only variants with non-ref genotypes.
Hi, thank you so much for the great tool.
I'm trying to apply longshot on long-read RNA-seq data (Iso-seq data from PacBio HiFi reads). I ran the pipeline with default options and got ~30% reads assigned with HP tags (Is is true to assume HP:i:1 represents reads with REF SNPs and HP:i:2 with ALT SNPs?) Do you have any recommendations to recover more reads?
I was wondering whether I should loosen
since the RNA-seq reads CIGAR skips large introns.
Thank you so much for your kind help.