mcalgaro93 / benchdamic

Benchmark of differential abundance methods on microbiome data

rarefy and basic.wilcox #2

Closed · parsifal9 closed this 1 year ago

parsifal9 commented 1 year ago

Hi mcalgaro93, I hope this is the correct forum to discuss this. Thanks for providing this package.

1) I have run a number of the DA methods. I have just realized that separate invocations of createMocks can easily be joined, so that the parallel package can distribute the workload (a rough sketch of this idea appears after this list). This is very handy, as I found it took 7 hours to do 20 mocks on one processor, and 1000 mocks is a lot of processing.

2) I am finding that basic.wilcox comes out as one of the best methods. Is it likely that no normalization of the data is the best strategy?

3) I am mostly asked to use phyloseq::rarefy_even_depth. Can I still compare the DA methods on the rarefied counts?
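As a rough illustration of point 1, here is a minimal sketch of the chunk-and-join idea: generate all the mock comparisons once, split the rows across cores with the parallel package, and concatenate the per-chunk result lists. The method list my_basic, the core count, and the chunking are my own illustrative choices, not benchdamic's built-in API:

library(benchdamic)
library(parallel)

# Generate the full set of mock comparisons once
# (each row of 'mocks' is one mock comparison)
mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 100)

# An illustrative method list (see the reply below for a fuller example)
my_basic <- set_basic(contrast = c("group", "grp1", "grp2"),
    test = "wilcox", paired = FALSE)

# Split the row indices of 'mocks' into one chunk per core
n_cores <- 4  # forking via mclapply requires a Unix-alike OS
chunks <- split(seq_len(nrow(mocks)),
    cut(seq_len(nrow(mocks)), n_cores, labels = FALSE))

# Run each chunk on its own core, then join the per-chunk result lists
results_list <- mclapply(chunks, function(idx) {
    runMocks(mocks = mocks[idx, , drop = FALSE],
        method_list = my_basic, object = ps_stool_16S)
}, mc.cores = n_cores)
results <- do.call(c, results_list)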

Bye R

mcalgaro93 commented 1 year ago

Hi @parsifal9, thanks for using benchdamic.

  1. You are right about parallel computing. That's why parallelization has been enabled in the newer version of the package (now devel 1.5.2) for both the runMocks and runSplits functions, which were the most time-consuming.
  2. The methods' performances are related to dataset-specific characteristics. If basic.wilcox has come out as one of the best methods in the TIEC chapter, it means that, for your dataset, it controls the number of false positives better than the other methods do. In the newer version of the package you will also be able to check the FDR and the tail distribution of the p-values under the null hypothesis (a sketch of the FDR check follows the example below). So, for your specific dataset, the "no normalisation" strategy is the best strategy for the control of the Type I Error. You can also check how similar the results between methods are (concordance analysis) and, if you have some expectations from the literature, compare your findings with those in the literature (enrichment analysis).
  3. Yes, you can compare DA methods even across different count tables. The best way to do it is to use a TreeSummarizedExperiment object instead of a phyloseq one: the former can handle multiple assays, so in your case one assay could contain the raw counts and another the rarefied counts. See the following example:
library(benchdamic)
library(mia)

# Generate the pattern for 10 mock comparisons
# (N = 1000 is suggested)
mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 10)

# Initialize some basic methods
my_basic <- set_basic(
    assay_name = c("counts", "rarefy"),
    contrast = c("group", "grp1", "grp2"), 
    test = c("t", "wilcox"), 
    paired = FALSE
    )

# Convert the object to a TreeSummarizedExperiment,
# so that one assay can hold the raw counts
# and another the rarefied ones
tse_stool_16S <- mia::makeTreeSEFromPhyloseq(ps_stool_16S)
tse_stool_16S <- mia::subsampleCounts(x = tse_stool_16S, 
    assay_name = "counts", 
    seed = 123, 
    name = "rarefy")

# Run methods on mock datasets
results <- runMocks(mocks = mocks, method_list = my_basic,
    object = tse_stool_16S)

# Prepare results for Type I Error Control
TIEC_summary <- createTIEC(results)

# Plot the results
plotFPR(df_FPR = TIEC_summary$df_FPR)
plotQQ(df_QQ = TIEC_summary$df_QQ, 
    zoom = c(0, 1), 
    split = TRUE)
plotKS(df_KS = TIEC_summary$df_KS)
plotLogP(df_QQ = TIEC_summary$df_QQ)
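
As mentioned in point 2, the devel version should also let you check the FDR under the null. A minimal sketch of that extra step, assuming createTIEC() returns a df_FDR element and that the plotting helper is named plotFDR (check ?createTIEC and ?plotFDR in your installed version):

# Hedged sketch: assumes createTIEC() returns a df_FDR element and
# that a plotFDR() helper exists in the devel version
plotFDR(df_FDR = TIEC_summary$df_FDR)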

So, does comparing the FPR values constitute a good argument against using rarefaction? It could be one argument. However, my suggestion is to also check the other statistics (the distribution of the p-values, the concordance between methods and, when possible, the enrichment analysis) to strengthen the evidence; a rough sketch of the concordance step follows.
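
As an illustration only, here is a hedged sketch of the concordance analysis, following the createSplits/runSplits/createConcordance workflow from the benchdamic vignette. The grouping variable "group" is hypothetical, my_basic is the method list defined above, and the argument names should be checked against the documentation of your installed version:

# Hedged sketch of the concordance analysis between methods;
# "group" is a hypothetical grouping variable in the sample data
splits <- createSplits(object = tse_stool_16S, varName = "group",
    paired = NULL, balanced = TRUE, N = 10)
results_splits <- runSplits(split_list = splits, method_list = my_basic,
    normalization_list = NULL, object = tse_stool_16S)
concordance <- createConcordance(object = results_splits,
    slot = "pValMat", colName = "rawP", type = "pvalue")
plotConcordance(concordance = concordance, threshold = 30)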

I hope this can help you, best regards.

Matteo Calgaro

parsifal9 commented 1 year ago

Thanks for that. It was a big help.