zdk123 / SpiecEasi

Sparse InversE Covariance estimation for Ecological Association and Statistical Inference
GNU General Public License v3.0

Does the clr transformation solve the problem of variable sampling effort between samples? #163

Closed paoluxe closed 2 years ago

paoluxe commented 3 years ago

Hello,

I am currently a master's trainee and I am interested in using spiec-easi to infer a microbial network. In other posts, I read that you advise passing non-rarefied data to the function. If my interpretation is correct, rarefaction is used to 1) scale all samples to the same level, i.e. all samples have the same total number of reads, and 2) ensure that the sampling effort is satisfactory and is the same for all samples, i.e. checking that the rarefaction curve reaches a plateau before choosing where to "cut off".

I can see how the clr transformation solves 1), but I cannot see how it solves 2).

Can you please tell me more?

Sincerely

Paola

zdk123 commented 3 years ago

It's not the clr transform that achieves sample scaling but total sum normalization, i.e. scaling each count by the total observed counts in the sample. Computing rarefaction curves (repeatedly rarefying at different depths to check sampling effort) is not the same procedure as rarefaction (rarefying once at the minimum observed sampling depth).
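
To make the distinction concrete, here is a minimal sketch of the three procedures mentioned above. It is plain Python, not SpiecEasi code; the function names (`total_sum_normalize`, `rarefy`, `rarefaction_curve`) and the toy count vectors are hypothetical and chosen only for illustration. Rarefaction is assumed here to mean subsampling reads without replacement.

```python
import random


def total_sum_normalize(counts):
    """Scale each count by the sample's total, giving relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def rarefy(counts, depth, rng):
    """Rarefy once: draw `depth` reads without replacement from the sample."""
    # Expand the count vector into one entry per read, labelled by taxon index.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    picked = rng.sample(pool, depth)
    out = [0] * len(counts)
    for taxon in picked:
        out[taxon] += 1
    return out


def rarefaction_curve(counts, depths, rng, n_rep=20):
    """Mean observed richness (taxa with count > 0) at each depth.

    This is the repeated procedure used to judge sampling effort,
    as opposed to the single subsampling done by `rarefy`.
    """
    curve = []
    for depth in depths:
        richness = [
            sum(1 for c in rarefy(counts, depth, rng) if c > 0)
            for _ in range(n_rep)
        ]
        curve.append(sum(richness) / n_rep)
    return curve


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Two toy samples with the same composition but a tenfold depth difference.
shallow = [10, 30, 60]
deep = [100, 300, 600]

# Total sum normalization puts both samples on the same scale.
tss_shallow = total_sum_normalize(shallow)
tss_deep = total_sum_normalize(deep)

# Rarefaction: subsample once to the minimum observed depth.
min_depth = min(sum(shallow), sum(deep))
rarefied_deep = rarefy(deep, min_depth, rng)

# Rarefaction curve: richness at several depths for one sample.
depths = [10, 100, 1000]
curve = rarefaction_curve(deep, depths, rng)
```

In this sketch `tss_shallow` and `tss_deep` hold the same proportions although no reads are discarded, `rarefied_deep` has exactly `min_depth` reads, and `curve` is what one would inspect for a plateau. Only the last of these says anything about whether sampling effort was sufficient.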