google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

Does DeepVariant support scRNA-seq data, e.g., 10x Genomics data? #705

Closed. aizhimin closed this issue 1 year ago

aizhimin commented 1 year ago

Does DeepVariant support scRNA-seq data, e.g., 10x Genomics data?

pgrosu commented 1 year ago

Hi Aiken,

I am assuming this is from a germline diploid sample, which is what the variant caller is designed for. Could you give a little background on your experiment, just to be sure I'm not missing anything in my assumptions below?

Based on the paper, the training was performed on RNA-seq samples that were not single cell. In theory it should work, though the 10x data would be downsampled to 95 reads because of how the input to the model operates. The first 5 rows are then used to represent the reference sequence, bringing the pileup image to 100 rows. Given the above, try it with the RNA-seq model from the case study, though lowering the number of reads might help. I would be curious to see how it validates with your data.
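To make the row arithmetic above concrete, here is a minimal Python sketch. It is not DeepVariant's own code: the function names and the uniform random sampling strategy are assumptions for illustration, and only the numbers (95 reads, 5 reference rows, 100 rows total) come from the comment above.

```python
import random

# Numbers taken from the comment above; everything else is illustrative.
PILEUP_HEIGHT = 100                              # total rows in the pileup image
REFERENCE_ROWS = 5                               # rows representing the reference
MAX_READ_ROWS = PILEUP_HEIGHT - REFERENCE_ROWS   # 95 rows left for reads


def downsample_reads(reads, max_reads=MAX_READ_ROWS, seed=0):
    """Keep at most max_reads reads, preserving their original order.

    Hypothetical helper: uniform sampling without replacement is an
    assumption, not necessarily how DeepVariant selects reads.
    """
    if len(reads) <= max_reads:
        return list(reads)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(reads)), max_reads))
    return [reads[i] for i in keep]


def occupied_rows(n_reads):
    """Rows filled by the reference plus reads at one candidate site."""
    return REFERENCE_ROWS + min(n_reads, MAX_READ_ROWS)
```

For a deep site with, say, 500 overlapping reads, `downsample_reads` would keep 95 of them and `occupied_rows(500)` gives 100, which is why very deep coverage does not add information beyond that limit.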

Thanks, Paul

danielecook commented 1 year ago

@aizhimin I suspect performance will be poor, but if you have a method for validating we would be interested in seeing the results.

AndrewCarroll commented 1 year ago

@aizhimin

For 10x Genomics data, we've previously observed lower accuracy across many methods, DeepVariant included. I think we will do "OK" on 10x data, but it is likely not what I would recommend for 10x data.

For scRNA-seq, I have a similar reaction, but it may also be the case that the alternatives are even fewer in number. As @danielecook and @pgrosu mention, it might be worth doing if you have some way of assessing and validating the result.
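As a rough idea of what "assessing and validating" could look like, the sketch below compares a set of calls against a trusted set of variants for the same sample and reports precision, recall, and F1. It assumes such a truth set exists, and it matches variants by exact (chromosome, position, ref, alt) tuples only. The function name and data layout are hypothetical, and a dedicated benchmarking tool that handles genotype matching and variant representation differences would be more appropriate for real evaluation.

```python
def compare_calls(truth, calls):
    """Compare called variants to a truth set by exact site/allele match.

    Both inputs are iterables of (chrom, pos, ref, alt) tuples.
    Illustrative only: no normalization or genotype comparison is done.
    """
    truth_set = set(truth)
    call_set = set(calls)

    tp = len(truth_set & call_set)   # called and in the truth set
    fp = len(call_set - truth_set)   # called but not in the truth set
    fn = len(truth_set - call_set)   # in the truth set but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up toy variants, purely to show the call shape.
    truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
    calls = [("chr1", 100, "A", "G"), ("chr1", 300, "G", "A")]
    print(compare_calls(truth, calls))
```

For RNA-seq data it would make sense to restrict both sets to regions that are actually expressed and covered before comparing, since uncovered sites would otherwise count as missed calls.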