wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

Freebayes limit-coverage vs skip-coverage #200

Closed abmarstrand closed 1 year ago

abmarstrand commented 1 year ago

Hi,

While testing souporcell on some of our internal data, I noticed an inconsistency between the pipeline script and the step-by-step guide.

When running freebayes, the pipeline script uses the --skip-coverage option, while the step-by-step guide uses the --max-coverage option. I'm guessing that this discrepancy came about when freebayes removed the --max-coverage option.

However, --skip-coverage is not the replacement for the --max-coverage option. Instead, --limit-coverage should be used (see https://github.com/freebayes/freebayes/releases/tag/v1.3.0).

I am not 100% sure of the implications of this discrepancy, but from my understanding using --skip-coverage completely filters out the regions, while --limit-coverage downsamples them.
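
To make the distinction concrete, here is a toy Python sketch of the two behaviours as described above. It is not freebayes code: the `pileup` dictionary, the function names `skip_coverage` and `limit_coverage`, and the threshold of 10 are all made up for illustration, and the sketch ignores details such as per-sample coverage.

```python
import random

def skip_coverage(reads_by_pos, max_cov):
    """Toy model of --skip-coverage: drop positions whose depth exceeds max_cov."""
    return {pos: reads for pos, reads in reads_by_pos.items()
            if len(reads) <= max_cov}

def limit_coverage(reads_by_pos, max_cov, seed=0):
    """Toy model of --limit-coverage: keep every position, downsample deep ones."""
    rng = random.Random(seed)
    return {
        pos: reads if len(reads) <= max_cov else rng.sample(reads, max_cov)
        for pos, reads in reads_by_pos.items()
    }

# Hypothetical pileup: position -> list of read names (invented data).
pileup = {
    100: [f"r{i}" for i in range(5)],   # depth 5, below the threshold
    200: [f"r{i}" for i in range(50)],  # depth 50, above the threshold
}

skipped = skip_coverage(pileup, max_cov=10)
limited = limit_coverage(pileup, max_cov=10)

print(sorted(skipped))                            # [100]
print({p: len(r) for p, r in limited.items()})    # {100: 5, 200: 10}
```

In this toy model the high-depth position disappears entirely under skipping but survives with reduced depth under downsampling, which is why the two options could in principle yield different candidate variant sets.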

Therefore, I was wondering which option is preferred for running souporcell, and whether any testing has been performed to investigate the effect of this change?

Otherwise, thanks for developing souporcell, it should make quite the difference for our projects :)

wheaton5 commented 1 year ago

Thanks for this correction. I have not tested the differences. My expectation is that they would be minimal, but a comparison is warranted. Still, for human samples, restricting to common variants fixes almost all of the issues. So if you are dealing with human samples, just use the common variants option.

abmarstrand commented 1 year ago

Hi again,

Thank you for the answer. I ran a quick test and saw almost identical results, so I figure it is probably not a big issue :)
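
For readers who want to repeat a comparison like this, one possible approach is to measure the overlap between the variant sites called under each option. The Python sketch below is only an assumption about how such a check might look, not the test that was actually run; the VCF records are invented, and in practice the lines would come from the two freebayes output files.

```python
def variant_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) tuples from VCF text lines, skipping headers."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Invented example records standing in for the two call sets.
header = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT"]
skip_vcf = header + ["1\t100\t.\tA\tG", "1\t200\t.\tC\tT"]
limit_vcf = header + ["1\t100\t.\tA\tG", "1\t200\t.\tC\tT", "1\t300\t.\tG\tA"]

skip_sites = variant_sites(skip_vcf)
limit_sites = variant_sites(limit_vcf)

print("shared sites:", len(skip_sites & limit_sites))
print("only with limit-coverage:", sorted(limit_sites - skip_sites))
print("Jaccard similarity:", round(jaccard(skip_sites, limit_sites), 3))
```

A similarity close to 1 would be consistent with the two options giving almost identical candidate variants; comparing the final cluster assignments would be a stricter check.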