waldronlab / lefser

R implementation of the LEfSe method
https://waldronlab.io/lefser/
38 stars 6 forks source link

variables are collinear #41

Open lwaldron opened 2 weeks ago

lwaldron commented 2 weeks ago

This is a new warning being generated by the tests, and I don't think there are actually should be collinear variables here so I'm not sure what's happening.

suppressPackageStartupMessages(library(lefser))
data("zeller14")
zellersub <- zeller14[1:150, zeller14$study_condition != "adenoma"]
## subsetting a DataFrame with NULL
zellersub <- relativeAb(zellersub)
results <- lefser(zellersub, groupCol = "study_condition", blockCol = NULL)
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#>  See `?factor` or `?relevel` to change the reference category.
#> Warning in lda.default(x, grouping, ...): variables are collinear

Created on 2024-06-15 with reprex v2.1.0

lwaldron commented 2 weeks ago

Likely cause - the relab_sub_t_dfin the call raw_lda_scores <- ldaFunction(relab_sub_t_df, lgroupf) within lefser() looks like this, when these should be relative abundances:

image
lwaldron commented 2 weeks ago

Actually these relative abundances are correct, I forgot that they are scaled to add to 1e6. I think this is real collinearity, caused by the presence of synthetic clades (e.g. "Bacteria") where a parent node has only one child, or two children but one is dominant. Two TODOs:

  1. modify tests and examples to do the analysis at species level only.
  2. make warning more helpful, to explain this likely cause of collinearity and recommend analyzing a single taxonomic level only
  3. modify documentation to recommend analyzing a single taxonomic level only
shbrief commented 2 weeks ago

It seems like you are solving this with the get_terminal_nodes function. Is this something you'd like to add to the lefser function?

shbrief commented 2 weeks ago

I also want to clarify points 1 & 3: if the terminal node of an input data is a mix of strain, species, and genus, for example, what will be the recommendation?

shbrief commented 1 week ago

9e1afa161f148ce32eb7a370bf46b8eaff9ac574