Closed aizhimin closed 1 year ago
Hi Aiken,
I am assuming this is from a germline diploid sample, which is what the variant caller is designed for. Could you give a little background on your experiment, just to be sure I'm not missing anything in my assumptions below?
Based on the paper, the training was performed on RNA-seq samples that were not single cell. In theory it should work, though the 10x data would be downsampled to 95 reads because of how the input to the model operates. The first 5 rows are then used to represent the reference sequence, bringing the pileup image to 100 rows. Given the above, try it with the RNA-seq model from the case study, though lowering the number of reads might help. I would be curious to see how it validates with your data.
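To make the row arithmetic above concrete, here is a minimal sketch of capping reads at 95 and adding the 5 reference rows. It is not DeepVariant's actual code: the names `downsample_reads` and `pileup_rows_used`, the random sampling strategy, and the fixed seed are all assumptions for illustration only.

```python
import random

PILEUP_HEIGHT = 100   # total rows in the pileup image, per the comment above
REFERENCE_ROWS = 5    # rows reserved for the reference sequence
MAX_READ_ROWS = PILEUP_HEIGHT - REFERENCE_ROWS  # 95 rows left for reads


def downsample_reads(reads, max_reads=MAX_READ_ROWS, seed=0):
    """Hypothetical helper: keep at most max_reads reads, preserving order."""
    reads = list(reads)
    if len(reads) <= max_reads:
        return reads
    rng = random.Random(seed)
    # Pick which indices survive, then restore the original ordering.
    keep = sorted(rng.sample(range(len(reads)), max_reads))
    return [reads[i] for i in keep]


def pileup_rows_used(reads):
    """Rows occupied by the reference plus the (possibly downsampled) reads."""
    return REFERENCE_ROWS + len(downsample_reads(reads))
```

With a deep site of, say, 1000 reads, `pileup_rows_used` returns 100; with only 10 reads it returns 15, so the cap only matters at high coverage.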
Thanks, Paul
@aizhimin I suspect performance will be poor, but if you have a method for validating we would be interested in seeing the results.
@aizhimin
For 10x Genomics data, we've previously observed lower accuracy across many methods, DeepVariant included. I think we will do "OK" on 10x data, but it is likely not what I would recommend for it.
For scRNA-seq, I have a similar reaction, but it may also be the case that the alternatives are even fewer in number. As @danielecook and @pgrosu mention, it might be worth doing if you have some way of assessing and validating the result.
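One simple way to assess the result, if a trusted call set exists for the same sample, is to compare the two sets of variants. The sketch below assumes each variant is already reduced to a `(chrom, pos, ref, alt)` tuple with matching representation. The function name `precision_recall` is hypothetical, and exact-tuple matching is a simplification that ignores genotype and differing representations of the same variant.

```python
def precision_recall(called, truth):
    """Compare called variants against a truth set by exact tuple match.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    Returns (precision, recall); an empty set yields 0.0 for that metric.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy example with made-up variants: one call matches, one does not.
called = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
truth = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")]
precision, recall = precision_recall(called, truth)
```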
Does DeepVariant support scRNA-seq data, e.g. 10x Genomics data?