single-cell-genetics / cellSNP

Pileup biallelic SNPs from single-cell and bulk RNA-seq data
Apache License 2.0

Change the flag filtering default to include PCR duplicates #13

Closed: GWW closed this issue 4 years ago

GWW commented 4 years ago

From my cursory understanding of cellSNP, you filter out all reads that Cell Ranger marks as PCR duplicates. However, this would also remove a large number of UMI duplicates, as noted in the vartrix documentation:

ignore alignments marked as duplicates? Take care when turning this on with scRNA-seq data, as duplicates are marked in that pipeline for every extra read sharing the same UMI/CB pair, which will result in most variant data being lost.

I wouldn't be surprised if this negatively affects the performance of vireo on some datasets.
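
To make the concern concrete, here is a minimal sketch of the marking behavior described in the quoted documentation, using made-up toy reads. It is a simplified model, not Cell Ranger's actual algorithm: the `mark_umi_duplicates` helper, the read dictionaries, and the barcode/UMI strings are all hypothetical. The only fixed fact is that bit 0x400 (1024) in the SAM FLAG field means "PCR or optical duplicate".

```python
DUP_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate" (1024)


def mark_umi_duplicates(reads):
    """Set DUP_FLAG on every read after the first one seen per (cb, umi) pair.

    Hypothetical helper: a simplified stand-in for the marking described
    in the vartrix documentation, not Cell Ranger's real implementation.
    """
    seen = set()
    marked = []
    for read in reads:
        key = (read["cb"], read["umi"])
        flag = read["flag"]
        if key in seen:
            flag |= DUP_FLAG  # extra read sharing the same UMI/CB pair
        seen.add(key)
        marked.append({**read, "flag": flag})
    return marked


# Toy data (invented for illustration): three reads share one UMI/CB pair.
toy_reads = [
    {"cb": "AAAC", "umi": "GGT", "flag": 0},
    {"cb": "AAAC", "umi": "GGT", "flag": 16},
    {"cb": "AAAC", "umi": "GGT", "flag": 0},
    {"cb": "AAAC", "umi": "TTA", "flag": 0},
    {"cb": "CCGT", "umi": "GGT", "flag": 16},
]

marked = mark_umi_duplicates(toy_reads)
kept = [r for r in marked if not r["flag"] & DUP_FLAG]
print(f"{len(kept)} of {len(marked)} reads survive duplicate filtering")
```

In this toy case only one read per UMI/CB pair survives, so any variant evidence carried by the other reads of that molecule is discarded.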

hxj5 commented 4 years ago

Hi, thanks for your feedback. If reads sharing the same UMI/CB pair are marked as PCR duplicates by CellRanger, cellSNP will filter them out when the maxFLAG parameter is set to a small value. Because we found some test datasets on which CellRanger did not perform this marking, we will run CellRanger on a few more datasets, especially with its default parameters.
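
The interaction between a FLAG ceiling and the duplicate bit can be sketched as follows. This assumes maxFLAG acts as an upper bound on the numeric SAM FLAG value, which the comment above implies but does not spell out; the function names and the two threshold values (255 and 4096) are illustrative assumptions, not cellSNP's confirmed code or defaults.

```python
DUP_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate" (1024)


def is_duplicate(flag):
    """True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUP_FLAG)


def passes_max_flag(flag, max_flag):
    """Hypothetical threshold filter: keep a read only if FLAG <= max_flag."""
    return flag <= max_flag


# A reverse-strand read (16) that has also been marked as a duplicate (1024).
dup_read_flag = 16 | DUP_FLAG  # 1040

# Any FLAG with the duplicate bit set is at least 1024, so a ceiling
# below 1024 removes every marked duplicate.
print(passes_max_flag(dup_read_flag, 255))   # False: filtered out
# A ceiling at or above the largest FLAG value in the data lets
# duplicates through.
print(passes_max_flag(dup_read_flag, 4096))  # True: retained
```

Under this assumption, a small ceiling filters duplicates only as a side effect of the numeric comparison, which is why datasets without duplicate marking are unaffected by it.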

GWW commented 4 years ago

Alright, if you think filtering them is the more reasonable choice. It may be worth stating explicitly in the manual that they are filtered by default and that this may increase the SNV false negative rate, as has been the case in my experience.

hxj5 commented 4 years ago

We have changed the default value of maxFLAG to include PCR duplicates for scRNA-seq data when UMItag is turned on, and we have stated this in the README file. Thanks for your advice!

GWW commented 4 years ago

No problem. I am glad you made the change. In our experience, Vireo has performed extremely well using all of the reads, including PCR duplicates.