Suggestion for using Spreading-Correction on whole genome resequencing?

justdx commented 3 years ago

Hello, Anton. We would like to apply this tool to our whole-genome resequencing (WGS) data generated on an Illumina NovaSeq 6000. Do you have any suggestions on how to calculate the read counts for each cell in WGS? Would read depth at given positions along the chromosomes work?

AntonJMLarsson commented 3 years ago

Hello! This is quite a tricky question. The method was originally developed for scRNA-seq data, and while it has been used in metagenomics, it has not been used for whole-genome resequencing (WGR).

I think if you're looking for SNPs, it would be feasible to use this tool by counting, for each sample, the number of reads that support the genotype. A genotype that shows up in a sample only through falsely assigned reads should have weak read support there.

However, the estimation algorithm for the spreading rate would probably fail since it relies on finding genes with high expression in only one cell. The analogous situation for you would be to have one (and only one) sample with high support for a genotype.
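The "one and only one sample with high support" condition can be checked directly on a table of per-sample supporting read counts. The following is a minimal sketch, not part of Spreading-Correction: the function name, the `high` threshold, the variant labels and the counts are all hypothetical and only illustrate the idea.

```python
def single_source_variants(counts, high=100):
    """Return {variant: source_sample} for variants where exactly one
    sample has at least `high` supporting reads.

    `counts` maps variant -> {sample: supporting read count}.
    The threshold `high` is an arbitrary illustrative value.
    """
    result = {}
    for variant, per_sample in counts.items():
        strong = [sample for sample, n in per_sample.items() if n >= high]
        if len(strong) == 1:
            result[variant] = strong[0]
    return result


# Hypothetical supporting-read counts for three variants in three samples.
counts = {
    "chr1:1000A>G": {"S1": 5000, "S2": 12, "S3": 9},   # one strong sample
    "chr2:2500C>T": {"S1": 800, "S2": 750, "S3": 3},   # two strong samples
    "chr3:400G>A": {"S1": 2, "S2": 1, "S3": 4},        # no strong sample
}

sources = single_source_variants(counts)
print(sources)
```

Only the first variant qualifies here. If very few variants in a real dataset pass such a filter, that is consistent with the concern that the automatic rate estimation may not have enough signal to work with.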

What you can do in this case is to visualize the read count distribution and look for patterns like the one in figure 1a. The "spread sample" read counts divided by the "source sample" read counts is your spreading rate. You should then be able to supply the spreading rate manually via the --rate option.
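The ratio described above can be computed by hand once a source sample has been identified for a few variants. The sketch below is an illustration under stated assumptions, not the tool's own estimator: the data are made up, the function name is hypothetical, and taking the median across ratios is a choice made here for robustness rather than something specified in this thread.

```python
from statistics import median


def estimate_spreading_rate(counts, sources):
    """Estimate a spreading rate as spread-sample reads / source-sample reads.

    `counts` maps variant -> {sample: supporting read count}.
    `sources` maps variant -> the sample assumed to be the true source.
    The median over all ratios is an assumption of this sketch.
    """
    ratios = []
    for variant, source in sources.items():
        source_reads = counts[variant][source]
        if source_reads == 0:
            continue  # cannot form a ratio without source reads
        for sample, reads in counts[variant].items():
            if sample != source:
                ratios.append(reads / source_reads)
    if not ratios:
        raise ValueError("no usable source/spread pairs")
    return median(ratios)


# Hypothetical counts: each variant is strongly supported in one sample only.
counts = {
    "chr1:1000A>G": {"S1": 5000, "S2": 10, "S3": 15},
    "chr4:7700T>C": {"S1": 4, "S2": 2000, "S3": 8},
}
sources = {"chr1:1000A>G": "S1", "chr4:7700T>C": "S2"}

rate = estimate_spreading_rate(counts, sources)
print(f"estimated spreading rate: {rate:.4f}")
```

A value obtained this way is what would be passed manually through the --rate option mentioned above. It is worth comparing it against the plotted read count distribution before relying on it.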

Hope this helps!

Best, Anton

justdx commented 3 years ago

Dear Anton, thanks for your reply. I will try your suggestions. Regards, Xiao

sandberg-lab / Spreading-Correction