statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0

Recommendation for cleaning of a VCF file for demuxlet #49

Open nbartonicek opened 5 years ago

nbartonicek commented 5 years ago

We are trying to use genotyping arrays (Axiom) to get VCFs for demuxlet.

Are there any recommended steps / tutorials for SNP cleaning and filtering to optimise the demuxlet yield?

We currently clean our VCF for (a) SNP missingness, (b) duplicate SNPs, and (c) SNP strand (flipping) before imputing on the Michigan Imputation Server. We then lift over to hg38 and keep only those variants that overlap exons of protein_coding genes and lncRNAs.
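
For readers following along, the pre-imputation cleaning described above can be sketched in plain Python. This is only an illustration, not a recommended pipeline: the function names and the 5% missingness cutoff are hypothetical, and true strand flipping needs a reference panel, so the sketch conservatively drops strand-ambiguous A/T and C/G sites instead of flipping them.

```python
from collections import Counter

BASES = set("ACGT")
# A/T and C/G SNPs look identical on either strand, so strand cannot be checked.
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def missing_rate(sample_fields):
    """Fraction of samples whose GT (first colon-separated subfield) is missing."""
    if not sample_fields:
        return 1.0
    missing = sum(1 for field in sample_fields if "." in field.split(":")[0])
    return missing / len(sample_fields)


def clean_vcf_lines(lines, max_missing=0.05):
    """Return header lines plus the records that pass all filters.

    max_missing is an assumed threshold, not a value from this thread.
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    headers = [line for line in lines if line.startswith("#")]
    records = [line.split("\t") for line in lines if not line.startswith("#")]

    # Count how often each (chromosome, position) pair occurs.
    site_counts = Counter((rec[0], rec[1]) for rec in records)

    kept = list(headers)
    for rec in records:
        chrom, pos, ref, alt = rec[0], rec[1], rec[3].upper(), rec[4].upper()
        if site_counts[(chrom, pos)] > 1:
            continue  # duplicated site: drop every copy
        if ref not in BASES or alt not in BASES:
            continue  # keep biallelic single-base SNPs only
        if frozenset((ref, alt)) in AMBIGUOUS:
            continue  # strand-ambiguous site
        if missing_rate(rec[9:]) > max_missing:
            continue  # too many missing genotype calls
        kept.append("\t".join(rec))
    return kept
```

Dropping all copies of a duplicated site is one possible choice; keeping the copy with the lowest missingness would be another.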

Any help greatly appreciated! :)

hyunminkang commented 5 years ago

Your suggested steps appear good to me. It might be a good idea to try both imputed and non-imputed genotypes. Genotypes without imputation may work well if UMI counts are large enough, and checking that the two sets of results are consistent would be useful.
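
One way to check that consistency is to compare the per-barcode sample assignments from the two runs. The sketch below assumes each run has already been loaded into a dictionary mapping cell barcode to assigned sample label; the function name and the example labels are hypothetical, and parsing the demuxlet output files is left out.

```python
def assignment_concordance(calls_a, calls_b):
    """Compare two barcode -> sample-label mappings.

    Returns (fraction of shared barcodes with identical labels,
             sorted list of barcodes whose labels differ).
    Barcodes present in only one mapping are ignored.
    """
    shared = sorted(set(calls_a) & set(calls_b))
    if not shared:
        return 0.0, []
    discordant = [bc for bc in shared if calls_a[bc] != calls_b[bc]]
    return 1 - len(discordant) / len(shared), discordant


# Toy example with made-up barcodes and donor labels.
imputed = {"AAAC": "donor1", "AAAG": "donor2", "AAAT": "donor1", "AACC": "donor3"}
array_only = {"AAAC": "donor1", "AAAG": "donor2", "AAAT": "donor2", "GGGG": "donor1"}

rate, differing = assignment_concordance(imputed, array_only)
print(f"concordance: {rate:.2f}, discordant barcodes: {differing}")
```

Barcodes that switch labels between the two runs are natural candidates for a closer look, for example at their UMI counts.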

Hyun.

nbartonicek commented 5 years ago

Thank you very much for your quick reply! I will try both methods.