waldronlab / lefser

R implementation of the LEfSe method
https://waldronlab.io/lefser/

Different results in lefser and LEfSe? #18

Closed CarolineWasen closed 3 weeks ago

CarolineWasen commented 2 years ago

Hi,

Our lab frequently runs microbiome analyses using the LEfSe software from the Huttenhower lab, and it would be really practical to run this analysis in R!

However, when I compare the results generated by lefser and the original LEfSe, they are not the same (in the analysis I compare two groups, no subgroups). My LDA table from lefser has 5 hits, whereas LEfSe reports 35 (the hits largely overlap). I use the default settings in both programs, and the defaults seem to be the same.

I use this R script to format a file from my SummarizedExperiment that I can upload to the LEfSe online software:

    Lefse_input <- print(assay(data, 1))
    Lefse_input <- cbind(rownames(Lefse_input), Lefse_input)
    Lefse_input <- rbind(c("Group", data$Treatment), colnames(Lefse_input), Lefse_input)
    write.table(Lefse_input, "Lefse_input.txt", sep = "\t", col.names = FALSE, row.names = FALSE)
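For readers who want to reproduce this export step without a SummarizedExperiment at hand, here is a minimal base-R sketch of the same idea. The helper name `format_lefse_input`, the `"Sample"` row label, the toy data, and the `quote = FALSE` choice are all assumptions made for illustration; none of them come from lefser or LEfSe. The sketch also wraps the group vector in `as.character()`, because `c("Group", x)` stores integer codes rather than labels when `x` is a factor.

```r
# Hypothetical helper (not part of lefser or LEfSe): build a LEfSe-style
# table from a feature-by-sample matrix and a per-sample group vector.
format_lefse_input <- function(counts, group) {
  stopifnot(
    is.matrix(counts),
    !is.null(rownames(counts)),
    !is.null(colnames(counts)),
    ncol(counts) == length(group)
  )
  # as.character() keeps factor labels; c("Group", <factor>) would
  # otherwise store the integer codes instead of the level names.
  header <- rbind(
    c("Group", as.character(group)),
    c("Sample", colnames(counts))
  )
  # First column holds the feature names, the rest hold the abundances.
  body <- cbind(rownames(counts), counts)
  unname(rbind(header, body))
}

# Toy data (assumed): 2 features x 3 samples.
counts <- matrix(
  c(10, 0, 5, 20, 3, 7),
  nrow = 2,
  dimnames = list(c("taxonA", "taxonB"), c("s1", "s2", "s3"))
)
group <- factor(c("ctrl", "treated", "ctrl"))

lefse_input <- format_lefse_input(counts, group)

# Write a tab-delimited file without quotes, headers, or row names.
out_file <- file.path(tempdir(), "Lefse_input.txt")
write.table(lefse_input, out_file, sep = "\t", quote = FALSE,
            col.names = FALSE, row.names = FALSE)
```

With a SummarizedExperiment, the equivalent call would presumably pass `assay(data, 1)` and `data$Treatment` as the two arguments.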

Apart from that the tables are identical. Does lefser do any filtering of the data based on expression level that I'm not aware of? I noticed that if I start out with a table of relative abundances, I don't get any results at all.

sdgamboa commented 2 years ago

I compared LEfSe results across different platforms (conda, Galaxy, and R). The results (features reported as significant) highly overlap, but a few more features were reported as significant with lefser. LDA scores are the same.

I used the zeller14 dataset included in the lefser package for all analyses, with study_condition as the class and age_category as the subclass. Total sum scaling (TSS) was applied to get relative abundances that sum to 1e6 per sample. Default thresholds were the same for kw, w, and LDA.
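The TSS step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name `tss`, its `scale` argument, and the toy matrix are hypothetical, and the sketch is not taken from lefser's or LEfSe's code. It assumes features in rows and samples in columns.

```r
# Hypothetical helper: total sum scaling (TSS) of a feature-by-sample
# matrix so that every sample (column) sums to `scale`.
tss <- function(counts, scale = 1e6) {
  totals <- colSums(counts)
  # A sample with a zero total cannot be scaled.
  stopifnot(all(totals > 0))
  # Divide each column by its total, then rescale.
  sweep(counts, 2, totals, "/") * scale
}

# Toy data (assumed): 2 features x 3 samples.
toy_counts <- matrix(
  c(10, 30, 5, 15, 0, 8),
  nrow = 2,
  dimnames = list(c("taxonA", "taxonB"), c("s1", "s2", "s3"))
)

rel_ab <- tss(toy_counts)
colSums(rel_ab)  # each sample total after scaling
```

Calling `tss(toy_counts, scale = 1)` would instead give proportions that sum to 1 per sample, which is the other common convention for relative abundance tables.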

The full report can be seen in this HTML and the files are in this repo.