waldronlab / lefser

R implementation of the LEfSe method
https://waldronlab.io/lefser/
38 stars 6 forks source link

Clarify what `blockCol` is #48

Open lwaldron opened 3 days ago

lwaldron commented 3 days ago

@asyakhl what is the historical reason for the naming of the lefser() blockCol argument? This name, and the help for the argument:

character(1) Optional column name in colData(relab) indicating the blocks, usually a factor with two levels (e.g., c("adult", "senior"); default NULL).

implies that it is a blocking variable (ie https://en.wikipedia.org/wiki/Blocking_(statistics) ). However, in LEfSe, the two grouping variables define main groups and subgroups for pairwise comparisons, not a blocking variable. Also, why would subgoups usually have two levels?

I think this causes confusion by users trying to use it to define blocking variables (ie, see `blockCol = "patient" in #47). I'm hesitant to rename the variable and break existing code, but we should make it much clearer in documentation how this argument is used and that it doesn't refer to a blocking variable in the statistical sense.

lwaldron commented 3 days ago

By the way, I noted that the original Python LEfSe functions and manuscript refer to "class" and "subclass", not "group" and "block" (e.g. https://huttenhower.sph.harvard.edu/lefse/ for galaxy module and https://github.com/SegataLab/lefse/blob/master/lefse/lefse_run.py for Python program). This may be worth going through a deprecation cycle for to avoid confusion.