microbiome / mia

Microbiome analysis
https://microbiome.github.io/mia/
Artistic License 2.0
46 stars 25 forks source link

getExperimentCrossAssociation fails when sample names are missing #473

Closed antagomir closed 7 months ago

antagomir commented 8 months ago

The getExperimentCrossAssociation function fails when TSE does not have colnames.

# Create demo data
library(mia)
data(GlobalPatterns)
colData(GlobalPatterns)$rand1 <- rnorm(ncol(GlobalPatterns))
colData(GlobalPatterns)$rand2 <- rnorm(ncol(GlobalPatterns))
colData(GlobalPatterns)$rand3 <- rnorm(ncol(GlobalPatterns))
colData(GlobalPatterns)$rand4 <- rnorm(ncol(GlobalPatterns))

# This Works
res <- getExperimentCrossAssociation(GlobalPatterns,
                                     # Top ARG classes, count assay
                                     colData_variable1 = c("rand1", "rand2"),
                     # ... correlated with colData columns starting with "nmf"
                                     colData_variable2 = c("rand3", "rand4"),
                     # use Kendall's tau for robustness
                                     method = "kendall",
                     mode = "table",
                     p_adj_method="fdr",
                     test_significance = TRUE) 

# Doesnt work when colnames are removed; however this might be the case in some real data sets and the associatiosn
# would still be meaningful if the ordering of the samples is fine

colnames(GlobalPatterns) <- NULL
res <- getExperimentCrossAssociation(GlobalPatterns,
                                     # Top ARG classes, count assay
                                     colData_variable1 = c("rand1", "rand2"),
                     # ... correlated with colData columns starting with "nmf"
                                     colData_variable2 = c("rand3", "rand4"),
                     # use Kendall's tau for robustness
                                     method = "kendall",
                     mode = "table",
                     p_adj_method="fdr",
                     test_significance = TRUE) 

Error in .sampleMapFromData(colData, experiments) : 
  colData rownames and ExperimentList colnames are empty
ake123 commented 8 months ago

This error is coming from MAE class. https://github.com/waldronlab/MultiAssayExperiment/blob/devel/R/MultiAssayExperiment-class.R stop("colData rownames and ExperimentList colnames are empty") .sampleMapFromData(colData, experiments) MultiAssayExperiment(experiments = experiments) at getExperimentCrossAssociation.R#314

TuomasBorman commented 8 months ago

One option would be to add colnames ("sample1", "sample2"...). However, in what situation user does want to calculate correlation but don't have colnames? I think there is no reason for colnames to be NULL

I think we should check in input check that there are colnames and rownames.

antagomir commented 8 months ago

To me this occurred with one data set I had been successfully working with for various analyses; most analyses do not seem to require explicit colnames. Then I just wanted to quickly add some correlation analysis and this failed and it took some time to spot the reason.

Measuring association makes sense also without sample names, as long as the sample ordering is correct. I think that we should be able to assume that the user has set that correctly.

One option would be to add a more informative warning / suggestion but allow the calculation to go through? I agree that ideally this would be solved at the root cause i.e. in MAE package. Perhaps they accept a PR? If not, we could consider tweaking the mia wrapper.

Or we can just ignore this and expect that users will always have sample names. Requiring sample names could be an additional safeguard against unintended errors. But if that prevents educated users from running valid analyses than I am not sure if it is helpful.

TuomasBorman commented 8 months ago

Well other option would be to add column names if they are NULL in the beginning of function?

antagomir commented 8 months ago

Yes that works and we could consider.

The main problem is that it is a kind of hack and would violate the design principle that the problem is fixed at the root cause, and in as general way as possible.

TuomasBorman commented 8 months ago

Yep, but this is not a real problem in MAE; the message is more or less informative

How getExpCrossCorr works when input is TreeSE (and not MAE)

  1. Create a MAE based on TreeSE
  2. Calculate correlation with general function that expects MAE as input

--> This thing is caused because our function works like that so I think we can handle this problem inside our function

On the other hand, we must name cols and rows anyways, because NULL would cause a problem if user wants the output as long-formatted table

antagomir commented 7 months ago

Right, sounds very feasible.

If there is no real reason to fail the function if names are missing, then how about:

I feel that this would be a better solution than throwing an error and stopping execution.

TuomasBorman commented 7 months ago

Right, sounds very feasible.

If there is no real reason to fail the function if names are missing, then how about:

  • add arbitrary names where needed
  • throw a warning that explains that names were added since this is necessary for full functionality

I feel that this would be a better solution than throwing an error and stopping execution.

Ok, yep. That sounds better plan.