Try to use the reads rather than UMIs for counting SNPs in spatial transcriptome (10x Visium)

single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells

https://cellsnp-lite.readthedocs.io

Apache License 2.0


Issue #133

Status: Open. wJDKnight opened this issue 1 month ago.

wJDKnight commented 1 month ago

I am working on calling variants in spatial transcriptomics data (10x Visium). Because spatial transcriptomics data have lower sequencing depth than single-cell data, I want to count every read in the BAM independently. To do so, I set --UMItag None; that is, I changed the command from this (using the default UMI tag):

cellsnp-lite -s $OUT_BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p ${n_processes} --minMAF 0.05 --minCOUNT 20 --gzip --genotype

to this (using --UMItag None):

cellsnp-lite -s $OUT_BAM -b $BARCODE --UMItag None -O $OUT_DIR -R $REGION_VCF -p ${n_processes} --minMAF 0.05 --minCOUNT 20 --gzip --genotype

I expected a higher read depth (DP) in the output VCF, but instead the overall DP decreased.

Could this be caused by some filtering criteria? Also, when should I use --countORPHAN?

hxj5 commented 1 month ago

Hi, thanks for the detailed feedback. The --exclFLAG option is probably what matters here. It filters reads based on their BAM FLAG: a read is skipped if any of the masked bits is set. The default is UNMAP,SECONDARY,QCFAIL when UMIs are used, or UNMAP,SECONDARY,QCFAIL,DUP otherwise.

In other words, when you set --UMItag None, reads marked as duplicates in their FLAG are filtered out by default. To keep them, set --exclFLAG manually, e.g., --exclFLAG 772, which is UNMAP (4) + SECONDARY (256) + QCFAIL (512) without DUP (1024).
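The numeric value given to --exclFLAG is a bitwise OR of SAM FLAG bits. The short bash sketch below, assuming the standard SAM bit values, reproduces 772 and the corresponding value when DUP is also excluded. The variable names are illustrative only and are not cellsnp-lite options.

```shell
#!/usr/bin/env bash
# Standard SAM FLAG bits (illustrative variable names)
UNMAP=4        # 0x4   read unmapped
SECONDARY=256  # 0x100 secondary alignment
QCFAIL=512     # 0x200 not passing quality controls
DUP=1024       # 0x400 PCR or optical duplicate

# Mask matching the default described for --UMItag None
no_umi_default=$(( UNMAP | SECONDARY | QCFAIL | DUP ))
# Same mask with DUP dropped, so duplicate reads are kept
keep_dup=$(( UNMAP | SECONDARY | QCFAIL ))

echo "default without UMIs: $no_umi_default"  # 1796
echo "keep duplicates:      $keep_dup"        # 772
```

The second value is the one to pass as --exclFLAG together with --UMItag None when duplicate reads should be counted.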

Using --countORPHAN is not recommended for paired-end sequencing. You can find details on all the read filtering options in the manual.

wJDKnight commented 1 month ago

Thank you very much for such a quick response. I will look into --exclFLAG and report back later.

wJDKnight commented 1 month ago

With that option, the overall DP is now three times what it was before. It seems to be working well. Thanks a lot; cellsnp-lite is really a very nice tool.

wJDKnight commented 3 weeks ago

Although I got a larger DP by including DUP reads, I am wondering why excluding DUP reads decreases the DP. Here is an example of a locus with 4 reads in one UMI group.

In scenario A, the DP for that locus will be 1. In C, it will be 4. I think B should be 2; am I right? But in the real data, I found that the DP in B is smaller than in A at every locus detected in both. How can that happen?

hxj5 commented 3 weeks ago

Hi, the --exclFLAG option simply filters reads by checking whether the DUP bit is set in the SAM FLAG. In the example above, if all three "blue" reads are marked as DUP, they will all be filtered out and their count is 0 instead of 1. Following this rule, the DP for scenario B can be anywhere between 0 (if all 4 reads are marked DUP) and 4 (if none of them is), depending on their FLAGs.
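The filtering rule described above can be simulated in a few lines of bash. This is a hypothetical helper, not cellsnp-lite code: count_kept takes a mask and a list of made-up FLAG values for the reads at one locus and counts the reads that have none of the masked bits set. It ignores every other cellsnp-lite filter and the UMI grouping, and only shows why the DP can land anywhere from 0 to 4 for four reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cellsnp-lite): count reads that survive
# an --exclFLAG-style mask, i.e. reads with none of the masked bits set.
count_kept() {
  local mask=$1
  shift
  local n=0 flag
  for flag in "$@"; do
    if (( (flag & mask) == 0 )); then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
}

DUP_EXCLUDED=1796  # UNMAP,SECONDARY,QCFAIL,DUP
DUP_KEPT=772       # UNMAP,SECONDARY,QCFAIL

# Four mapped single-end reads at one locus (made-up FLAGs);
# 1024 marks a read as a duplicate.
count_kept "$DUP_EXCLUDED" 0 1024 1024 1024     # 1: only the unmarked read
count_kept "$DUP_EXCLUDED" 1024 1024 1024 1024  # 0: every read marked DUP
count_kept "$DUP_KEPT" 1024 1024 1024 1024      # 4: duplicates are kept
```

Which of these cases applies at a real locus depends entirely on how the upstream tool set the DUP bit on each read.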

cellsnp-lite relies entirely on the FLAGs set by the upstream alignment tool. You may want to check the FLAGs of the reads further to investigate the DP differences between the three scenarios.
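To inspect the FLAGs at a given locus, something like the following samtools-based sketch could be used. flag_summary is a hypothetical helper; it assumes samtools is installed and that the BAM is coordinate-sorted and indexed. The numbers only approximate what cellsnp-lite sees: samtools counts every alignment overlapping the region (including spliced reads that skip over it), and cellsnp-lite applies further filters such as the cell-barcode list and mapping-quality thresholds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize read FLAGs overlapping one locus.
# Requires samtools and a coordinate-sorted, indexed BAM.
flag_summary() {
  local bam=$1 region=$2
  echo "FLAG value counts:"
  samtools view "$bam" "$region" | cut -f 2 | sort -n | uniq -c
  echo "all overlapping:    $(samtools view -c "$bam" "$region")"
  echo "DUP bit set:        $(samtools view -c -f 1024 "$bam" "$region")"
  echo "kept, DUP excluded: $(samtools view -c -F 1796 "$bam" "$region")"
  echo "kept, DUP kept:     $(samtools view -c -F 772 "$bam" "$region")"
}

# Example usage (the position is a placeholder):
# flag_summary "$OUT_BAM" chr1:1000000-1000000
```

Comparing the last two lines at loci where scenario B falls below scenario A would show how many reads there carry the DUP bit.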