waldronlab / lefser

R implementation of the LEfSe method
https://waldronlab.io/lefser/
38 stars 6 forks source link

lefser producing empty results when given pathway data #33

Closed jmartin77777 closed 3 months ago

jmartin77777 commented 5 months ago

I am trying to use the lefser package to generate LDA effect size on a table of pathway relative abundances. I am able to use the tool to reproduce your example case with the zeller14 dataset. But when I try loading my own data it gives me an empty table. I am starting with an input tab-delimited text file that looks like this:

ID  COMPARISON  12DICHLORETHDEG-PWY 1CMET2-PWY
MB20-007-MB20-007_01    CTRL    0   366.878
MB19-001-MB19-001_01    CTRL    4.50847 318.903
MB20-010-MB20-010_01    PD  0   439.004
MB20-012-MB20-012_01    PD  151.322 433.688
MB20-014-MB20-014_01    PD  98.831  435.307

with each row being a sample, column 1 if the samplename, column 2 is the class, further columns are pathway CPM abundance values generated by Humann3. My R script looks like this:

data <- read.csv("Campbell_PWY_abundance.tsv", sep = "\t", header = TRUE, check.names = FALSE)
group <- data$COMPARISON # Capture the 'COMPARISON' column
abundance_data <- data[, -c(1,2)] # Exclude the 'ID' & 'COMPARISON' column
abundance_data_transposed <- t(abundance_data) # Transpose table
sample_metadata <- DataFrame(COMPARISON = group, row.names = data$ID)
se <- SummarizedExperiment(assays = list(counts = abundance_data_transposed),
                           colData = sample_metadata)
se_relative <- relativeAb(se)
lefse_results <- lefser(se_relative, groupCol = "COMPARISON")

It does generate the lefse_results object, but its just an empty table. I was initially concerned that maybe lefser was not happy with the features being pathways, and not the usual formatted taxonomy. But I then tried importing a taxonomy based table (of the same format as shown above) and am getting the same result. Can you help me troubleshoot what I am doing wrong?

lwaldron commented 5 months ago

@jmartin77777 can you reproduce this with a dataset from curatedMetagenomicData, so that we can reproduce it?

jmartin77777 commented 4 months ago

Sorry for the delayed response. I'll try and find a curated dataset to test. Can you recommend any small datasets that might be appropriate?

lwaldron commented 4 months ago

AsnicarF_2017 from curatedMetagenomicData has 24 samples and includes HUMAnN3 pathways https://waldronlab.io/curatedMetagenomicData/articles/available-studies.html

lwaldron commented 3 months ago

With issues #32 and #35 resolved in release and devel, things should be clearer now as you will see a message if there are no results that pass both the statistical significance threshold and the LDA threshold. You can test which of these is responsible for the lack of results, such as follows (such as setting a p-value threshold to 1, or a LDA threshold to 0).

suppressPackageStartupMessages({
  library(lefser)
  library(curatedMetagenomicData)
})
zeller <-
  curatedMetagenomicData("ZellerG_2014.pathway_abundance",
                         counts = TRUE,
                         dryrun = FALSE)[[1]]
#> Cannot connect to ExperimentHub server, using 'localHub=TRUE' instead
#> Using 'localHub=TRUE'
#>   If offline, please also see BiocManager vignette section on offline use
zeller <- zeller[, zeller$study_condition != "adenoma"]
zeller <- relativeAb(zeller)

lefse_results1 <-
  lefser(
    zeller,
    groupCol = "study_condition",
    kruskal.threshold = 0.05,
    lda.threshold = 2
  )
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#>  See `?factor` or `?relevel` to change the reference category.
head(lefse_results1)
#>                                                                 Names    scores
#> 1                 UNINTEGRATED|g__Ruminococcus.s__Ruminococcus_bromii -3.473240
#> 2 UNINTEGRATED|g__Lachnospiraceae_unclassified.s__Eubacterium_rectale -3.423555
#> 3                 UNINTEGRATED|g__Anaerostipes.s__Anaerostipes_hadrus -3.200505
#> 4                            UNINTEGRATED|g__Blautia.s__Blautia_obeum -3.199587
#> 5                         UNINTEGRATED|g__Blautia.s__Blautia_wexlerae -3.160973
#> 6           UNINTEGRATED|g__Bifidobacterium.s__Bifidobacterium_longum -3.037694

lefse_results2 <-
  lefser(
    zeller,
    groupCol = "study_condition",
    kruskal.threshold = 1e-6,
    lda.threshold = 2
  )
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#>  See `?factor` or `?relevel` to change the reference category.
#> No significant features found.
head(lefse_results2)
#> [1] Names  scores
#> <0 rows> (or 0-length row.names)

lefse_results3 <- 
  lefser(
    zeller, 
    groupCol = "study_condition", 
    kruskal.threshold = 0.05,
    lda.threshold = 2000
  )
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#>  See `?factor` or `?relevel` to change the reference category.
#> No significant features found.
head(lefse_results3)
#> [1] Names  scores
#> <0 rows> (or 0-length row.names)

Created on 2024-03-18 with reprex v2.1.0