Open wJDKnight opened 1 month ago
Hi, thanks for the detailed feedback. The --exclFLAG option probably matters in this case. It filters reads based on BAM FLAGs: reads with any of the mask bits set are skipped. The default is UNMAP,SECONDARY,QCFAIL (when UMIs are used) or UNMAP,SECONDARY,QCFAIL,DUP (otherwise).
In other words, when you set --UMItag None, reads marked as duplicates in the FLAG are filtered by default. To keep these reads, you can set --exclFLAG manually, e.g., --exclFLAG 772.
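The value 772 is the sum of the SAM FLAG bits for UNMAP, SECONDARY, and QCFAIL, with DUP left out. A minimal Python sketch of that arithmetic follows, using the bit values from the SAM specification. The helper names `build_mask` and `is_filtered` are hypothetical and not part of cellsnp-lite; the sketch only illustrates the "skip reads with any mask bits set" rule described above.

```python
# SAM FLAG bit values, as defined in the SAM specification.
FLAG_BITS = {
    "UNMAP": 0x4,        # 4: read is unmapped
    "SECONDARY": 0x100,  # 256: secondary alignment
    "QCFAIL": 0x200,     # 512: read fails quality checks
    "DUP": 0x400,        # 1024: PCR or optical duplicate
}

def build_mask(names):
    """Combine named FLAG bits into one integer mask (hypothetical helper)."""
    mask = 0
    for name in names:
        mask |= FLAG_BITS[name]
    return mask

def is_filtered(read_flag, mask):
    """A read is skipped if any of the mask bits is set in its FLAG."""
    return (read_flag & mask) != 0

mask_without_dup = build_mask(["UNMAP", "SECONDARY", "QCFAIL"])
mask_with_dup = build_mask(["UNMAP", "SECONDARY", "QCFAIL", "DUP"])

print(mask_without_dup)  # 772
print(mask_with_dup)     # 1796

# A duplicate-marked read on the reverse strand has FLAG 16 + 1024 = 1040.
print(is_filtered(1040, mask_with_dup))     # True
print(is_filtered(1040, mask_without_dup))  # False
```

With the mask 772 the DUP bit is no longer checked, so duplicate-marked reads pass the filter.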
It is not recommended to use --countORPHAN with paired-end sequencing. You can find the details of all the read filtering options in the manual.
Thank you very much for such a quick response. I will check the usage of "--exclFLAG" and report back later.
By using that flag, the overall DP increased to three times what it was before. It seems to be working well. Thanks a lot. cellsnp-lite is really a very nice tool.
Though I got a larger DP by including DUP, I am wondering why excluding DUP decreases the DP. Here is an example of a locus with 4 reads in one UMI group.
In scenario A, the DP for that locus will be 1. In C, it will be 4. I think B should be 2, am I right? But in the real data, I found that the DP of B is smaller than that of A at every locus detected in both. How does that happen?
Hi, the --exclFLAG option simply filters reads by checking whether the DUP bit is set in the SAM FLAG. In the above example, if all three "blue" reads are marked as DUP, they will all be filtered and their count is 0 instead of 1. Following this rule, the DP for scenario B will be between 0 (if all 4 reads are marked DUP) and 4 (if none of the reads is marked DUP), depending on their FLAGs.
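The counting rule can be illustrated with a small Python sketch. It is a simplified model and not cellsnp-lite's actual code: each read is represented only by a made-up UMI string and a FLAG, and the three scenarios are assumed to be A = default UMI-based counting, B = --UMItag None with the default mask (DUP excluded), and C = --UMItag None with --exclFLAG 772.

```python
DUP = 0x400  # SAM FLAG bit for PCR or optical duplicates

def depth(reads, use_umi, excl_mask):
    """Count depth at one locus under a simplified model (hypothetical helper).

    reads: list of (umi, flag) tuples, all covering the locus.
    use_umi: if True, count distinct UMIs; otherwise count reads.
    excl_mask: reads with any of these FLAG bits set are skipped.
    """
    kept = [(umi, flag) for umi, flag in reads if not (flag & excl_mask)]
    if use_umi:
        return len({umi for umi, _ in kept})
    return len(kept)

# One UMI group of 4 reads; 3 of them are assumed to be marked DUP.
reads = [("AAAC", 0), ("AAAC", DUP), ("AAAC", DUP), ("AAAC", DUP)]

print(depth(reads, use_umi=True, excl_mask=772))    # scenario A: 1
print(depth(reads, use_umi=False, excl_mask=1796))  # scenario B: 1
print(depth(reads, use_umi=False, excl_mask=772))   # scenario C: 4

# If every read in the group is marked DUP, scenario B drops to 0.
all_dup = [("AAAC", DUP)] * 4
print(depth(all_dup, use_umi=False, excl_mask=1796))  # 0
```

Under these assumptions, scenario B can fall below scenario A whenever all reads of a UMI group carry the DUP bit.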
Cellsnp-lite totally relies on the FLAG set by the upstream alignment tool. You may further check the FLAG of the reads to investigate the DP difference between the three scenarios.
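One way to check the FLAGs is to tally how many alignments carry the DUP bit. The sketch below parses SAM-format text lines (for example, the output of samtools view) in plain Python. The function name `tally_dup` and the sample records are made up for illustration.

```python
from collections import Counter

DUP = 0x400  # SAM FLAG bit for PCR or optical duplicates

def tally_dup(sam_lines):
    """Count alignments with and without the DUP bit set.

    sam_lines: iterable of SAM-format text lines; header lines
    (starting with '@') and blank lines are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts["dup" if flag & DUP else "non_dup"] += 1
    return counts

# Made-up records, truncated to the first few mandatory columns.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t50M",
    "r2\t1024\tchr1\t100\t255\t50M",
    "r3\t1040\tchr1\t100\t255\t50M",
]
counts = tally_dup(example)
print(counts["dup"], counts["non_dup"])  # 2 1
```

For a quick count without any scripting, samtools view -c -f 1024 on the BAM file reports the number of alignments with the DUP bit set.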
I am working on calling variants in spatial transcriptomics data (10x Visium). Since the sequencing depth of spatial transcriptomics data is lower than that of single-cell data, I want to treat all reads in the BAM independently. Therefore, I used --UMItag None. That is, I changed the command from this (using the default UMI tag):
cellsnp-lite -s $OUT_BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p ${n_processes} --minMAF 0.05 --minCOUNT 20 --gzip --genotype
to this (using --UMItag None):
cellsnp-lite -s $OUT_BAM -b $BARCODE --UMItag None -O $OUT_DIR -R $REGION_VCF -p ${n_processes} --minMAF 0.05 --minCOUNT 20 --gzip --genotype
I expected a higher sequencing depth (DP) in the output VCF, but the overall DP decreased instead.
Could it be because of some filtering criteria? When should I use --countORPHAN?