liaoherui / StrainScan

High-resolution strain-level microbiome composition analysis tool based on reference genomes and k-mers
https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-023-01615-w
MIT License

How to compare sequencing depth between different samples #7

Closed flashton2003 closed 1 year ago

flashton2003 commented 1 year ago

Hello,

I'm curious about how to interpret the sequencing depth statistic in the output between different samples.

Is this dependent on the amount of sequencing data? Or is it normalised somehow?

Apologies, I'm a biologist and not able to follow the mathematics describing the calculation of it in the pre-print.

Thanks,

Phil

liaoherui commented 1 year ago

Hi, the sequencing depth estimate depends on the amount of sequencing data in that one sample. The depth is therefore predicted independently for each input sample.

Don’t worry about it. If you have any other problems with the method or suggestions for the tool, please feel free to ask. I will reply asap.

flashton2003 commented 1 year ago

Ok, thank you.

Do you have any thoughts about how to normalise this value to enable comparison between samples? The depth divided by how many million reads were mapped perhaps?

liaoherui commented 1 year ago

Hi, if you want to compare the strain composition of different samples, you can use the strain-level relative abundance, which reflects each strain's relative abundance within one sample.

However, if you want to check how the depth of the same strain changes across different samples, you can normalize the depths directly. Suppose strain A's predicted depth is x in sample 1 and y in sample 2; the relative shares are then x/(x+y) and y/(x+y). The larger value may indicate that strain A is more abundant in that sample. This assumes that the two samples were sequenced to a similar depth, in other words, that the difference is not caused by a batch effect.
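
A minimal sketch of the x/(x+y) calculation described above, in Python. The function name `relative_depth_shares` and the example depths are hypothetical and are not part of StrainScan; the values would have to be read from the tool's output for the same strain in two samples, and the same caveat about comparable sequencing depth applies.

```python
from typing import Tuple


def relative_depth_shares(x: float, y: float) -> Tuple[float, float]:
    """Return (x/(x+y), y/(x+y)) for one strain's predicted depth in two samples.

    x and y are the predicted depths of the same strain in sample 1 and
    sample 2. This is an illustrative helper, not a StrainScan function.
    """
    if x < 0 or y < 0:
        raise ValueError("predicted depths must be non-negative")
    total = x + y
    if total == 0:
        # The strain has zero depth in both samples, so the shares are undefined.
        raise ValueError("strain has zero predicted depth in both samples")
    return x / total, y / total


# Hypothetical depths for strain A: 12.0 in sample 1 and 4.0 in sample 2.
share_1, share_2 = relative_depth_shares(12.0, 4.0)
print(share_1, share_2)  # 0.75 0.25
```

The two shares always sum to 1, so they only say which of the two samples holds more of the strain's total predicted depth; they do not correct for differences in the amount of sequencing data between the samples.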