yiluheihei / microbiomeMarker

R package for microbiome biomarker discovery
https://yiluheihei.github.io/microbiomeMarker
GNU General Public License v3.0

LDA score and the normalization method #53

Open refinedfan opened 2 years ago

refinedfan commented 2 years ago

Hi

Thanks very much for sharing this package. I have been trying to analyse my data with it, but I have some questions and hope you can help.

I ran the analysis on the example dataset "ps" with both the TSS and the CPM normalization methods, but got completely different ranges of LDA scores. With CPM normalization and an LDA score cutoff of 4, a number of features were identified as significantly different between groups. With TSS normalization, the largest LDA score was 0.114, and most scores were very small values between 0 and 0.1. Is this expected, or did I make a mistake in the analysis? Also, which normalization method is more appropriate for 16S amplicon sequencing read count data?

Thank you very much. I am looking forward to your reply.

[Screenshots: TSS normalization, CPM normalization]

refinedfan commented 2 years ago

Is it because the analysis should only be applied to integer counts?
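
To make the scale difference between the two normalizations concrete, here is a minimal Python sketch (not the package's R code). It assumes the usual definitions: TSS divides each sample by its total so the values sum to 1, and CPM rescales the same proportions so they sum to 1,000,000. The read counts are made up, and the `log10(1 + x)` step is only a hypothetical stand-in for a log-scaled score; it is not taken from the microbiomeMarker source.

```python
import math

def tss(counts):
    """Total sum scaling: each value becomes a proportion of the sample total."""
    total = sum(counts)
    return [c / total for c in counts]

def cpm(counts):
    """Counts per million: proportions rescaled so the sample sums to 1e6."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]

# Made-up read counts for one sample (illustration only).
sample = [120, 30, 850]

tss_vals = tss(sample)
cpm_vals = cpm(sample)

# The two normalizations differ only by a constant factor of 1e6.
ratios = [c / t for c, t in zip(cpm_vals, tss_vals)]

# Hypothetical log-scaled summary (an assumption, not the package's formula):
# the same feature lands in a very different range on each scale.
log_tss = [math.log10(1 + v) for v in tss_vals]
log_cpm = [math.log10(1 + v) for v in cpm_vals]

print("TSS:", tss_vals)
print("CPM:", cpm_vals)
print("log10(1 + TSS):", log_tss)
print("log10(1 + CPM):", log_cpm)
```

If the score is reported on a log scale of the normalized abundances, a cutoff of 4 could be reachable with CPM values but not with TSS proportions, which would fit the ranges described above. Whether that is what happens inside the package is for the maintainers to confirm.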