statgen / popscle

A suite of population-scale analysis tools for single-cell genomics data, including implementations of the Demuxlet / Freemuxlet methods and auxiliary tools
https://github.com/statgen/popscle/wiki
Apache License 2.0
43 stars, 15 forks

Run demuxlet on Drosophila lines with one homologous chromosome and one WT chromosome? #62

Closed by jdrnevich 1 year ago

jdrnevich commented 1 year ago

Hello, I have previously used demuxlet successfully to demultiplex single cells and call doublets from 4 homozygous SNP lines in Drosophila using the known SNPs for each line. Now the same lab has a new experiment with some recombinant lines, where each will only have one chromosome of a pair from a known SNP line and the other from a wild-type background, so different SNPs from cell to cell. Can demuxlet be used to demultiplex single cells that come from one of 4 known SNP lines but only on one chromosome? If so, what options would I need to modify? --alpha and/or --doublet-prior? Thanks in advance!

hyunminkang commented 1 year ago

I think 4 SNPs is probably too few to distinguish the identity of droplets. There is no option to handle inbred organisms in the current implementation.

jdrnevich commented 1 year ago

@hyunminkang - sorry I didn't explain it clearly. There are 4 different inbred lines, each with thousands of known SNPs. On chromosome 2L only, there are 285023 positions in the vcf file. So what happens when only one chromosome of the pair contains the inbred line SNPs?

hyunminkang commented 1 year ago

Ah, I see. demuxlet does not account for linkage disequilibrium (LD) for now, so it models all SNPs as independent. If you pick SNPs that are spaced relatively evenly, that should work reasonably well. The model does not really care which chromosome a read came from. Of course, dropping the Hardy-Weinberg equilibrium (HWE) assumption and modeling inbreeding would work better, but that would require modifying the software.
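To make the "independent SNPs" point concrete, here is a toy simulation. It is only a sketch under stated assumptions and not demuxlet's actual likelihood code. It assumes each inbred line is homozygous at every SNP, that the wild-type chromosome carries a random allele at each site, that each SNP is covered by a single read, and that reads have a fixed error rate. The function name `simulate_assignment` and all of its parameters are hypothetical. Each read is scored against the homozygous genotype of every line, and the per-SNP log-likelihoods are summed.

```python
import math
import random


def simulate_assignment(n_snps=2000, n_lines=4, true_line=0, err=0.01, seed=0):
    """Toy model: score one cell against each inbred line, SNP by SNP.

    Assumptions (illustrative only, not demuxlet's implementation):
    - every line is homozygous (0 or 2 ALT alleles) at every SNP
    - the cell carries one chromosome from `true_line` and one wild-type
      chromosome whose allele is random at each site
    - one read per SNP, with sequencing error rate `err`
    """
    rng = random.Random(seed)
    geno = [[rng.choice((0, 2)) for _ in range(n_snps)] for _ in range(n_lines)]
    loglik = [0.0] * n_lines
    for s in range(n_snps):
        inbred_allele = geno[true_line][s] // 2      # 0 = REF, 1 = ALT
        wt_allele = rng.randint(0, 1)                # unknown wild-type allele
        allele = rng.choice((inbred_allele, wt_allele))  # read from either copy
        if rng.random() < err:                       # sequencing error flips it
            allele = 1 - allele
        for k in range(n_lines):
            # Probability of an ALT read if the cell were homozygous like line k
            p_alt = 1 - err if geno[k][s] == 2 else err
            loglik[k] += math.log(p_alt if allele == 1 else 1 - p_alt)
    return loglik


if __name__ == "__main__":
    ll = simulate_assignment()
    best = max(range(len(ll)), key=ll.__getitem__)
    print("log-likelihood per line:", [round(x, 1) for x in ll])
    print("best-scoring line:", best)
```

In this toy setting the true line is penalized only when the read happens to come from the wild-type chromosome and disagrees with the inbred allele, whereas every other line disagrees with roughly half of all reads, so the true line should still score best even though the homozygous model is misspecified.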

jdrnevich commented 1 year ago

But you think running as is would do an OK job of demultiplexing cells into which of the 4 inbred lines they came from when each cell has only 1 inbred chromosome of the pair?

hyunminkang commented 1 year ago

It won't be ideal, but I suspect that the results would still be useful. What I would suggest is to create a VCF that includes only the variants that are polymorphic among the 4 lines. The allele count (AC) will then be 2, 4, or 6 if the lines are always homozygous. You may want to focus on exonic SNPs and space them at least 1 kb apart from each other (if you want to avoid including too many variants within a specific region).
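The VCF filtering suggested above could be sketched roughly as follows. This is a minimal plain-Python illustration, not part of popscle. The helper names `alt_allele_count` and `thin_polymorphic_sites` are hypothetical, and the code assumes an uncompressed, position-sorted VCF with diploid genotypes and `GT` as the first FORMAT field. Restricting to exonic SNPs is not shown, since it would need a separate annotation file.

```python
def alt_allele_count(sample_fields):
    """Count ALT alleles across samples; return None if any genotype is missing."""
    count = 0
    for field in sample_fields:
        gt = field.split(":")[0].replace("|", "/")   # GT assumed to be first
        alleles = gt.split("/")
        if "." in alleles:
            return None
        count += sum(1 for a in alleles if a != "0")
    return count


def thin_polymorphic_sites(vcf_lines, min_gap=1000):
    """Keep header lines plus sites that are polymorphic among the samples
    and at least `min_gap` bp from the previously kept site on the same
    chromosome. Assumes records are sorted by position."""
    kept = []
    last_pos = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.split("\t")
        chrom, pos = cols[0], int(cols[1])
        samples = cols[9:]
        ac = alt_allele_count(samples)
        # Drop sites with missing calls or no variation among the lines
        # (AC of 0 or 2N); with 4 homozygous lines this leaves AC in {2, 4, 6}.
        if ac is None or ac == 0 or ac == 2 * len(samples):
            continue
        if chrom in last_pos and pos - last_pos[chrom] < min_gap:
            continue
        last_pos[chrom] = pos
        kept.append(line)
    return kept
```

A typical use would be to read the lines of the original VCF, pass them through `thin_polymorphic_sites`, and write the result to a new file to use as the genotype input. The 1 kb default for `min_gap` follows the spacing suggested above and can be tuned.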

jdrnevich commented 1 year ago

Thanks for your timely help! They are debating today on whether to proceed with the single cell experiment or not. It's ATAC-Seq, and we could find where we expect to see open chromatin regions based on the previous experiment. I will tell them that it would not be a complete waste to go ahead with the project, although we may have to do a bunch of tweaking during the demultiplexing.