saezlab / MetaProViz

R-package to perform metabolomics pre-processing, differential metabolite analysis, metabolite clustering and custom visualisations.
https://saezlab.github.io/MetaProViz/
GNU General Public License v3.0

DMA - shapiro.test sample size #73

Closed LampatLex closed 1 year ago

LampatLex commented 1 year ago

Hi,

I am getting the following error during DMA analysis.

Error in shapiro.test(x) : sample size must be between 3 and 5000

Any help will be greatly appreciated.

Thanks

ChristinaSchmidt1 commented 1 year ago

Hi, sure. Could you give me some information about your Input_data and the function parameters you chose? The Shapiro test in R (shapiro.test) can only be applied to a sample size between 3 and 5000. Do you have >5000 or <3 samples per condition?
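To illustrate that restriction, here is a minimal base R sketch. It is not MetaProViz's internal code, and `values` and `conditions` are made-up toy data. It counts the samples per condition and shows that calling `shapiro.test` on a group with fewer than 3 values raises the error reported above.

```r
# Hypothetical toy data: one metabolite measured in 9 samples.
set.seed(1)
values <- rnorm(9)
conditions <- rep(c("A", "B", "C"), times = c(4, 3, 2))

# Count samples per condition; shapiro.test() needs between 3 and 5000 values.
n_per_condition <- table(conditions)
too_small <- names(n_per_condition)[n_per_condition < 3]

# Calling shapiro.test() on the group with only 2 samples raises the error;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(
  shapiro.test(values[conditions == "C"]),
  error = function(e) conditionMessage(e)
)

print(n_per_condition)
print(too_small)
print(err)
```

A check like this on your own metadata column shows quickly whether any condition falls outside the allowed range.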

LampatLex commented 1 year ago

Thanks for your reply. My data and metadata look like this:

> rawdata <- read.csv("data.csv", row.names=1)
> dim(rawdata)
[1] 666  29
> metadata <- read.csv("meta.csv", row.names = 1)
> dim(metadata)
[1] 29  2

I have data for 666 metabolites in 29 samples from different treatments, each with at least 4 replicates. Do you believe this error occurs because I have 19,314 data points? Is there any way to bypass this test?

Thanks for creating a wonderful package.

Regards

ChristinaSchmidt1 commented 1 year ago

Thanks! I think you need to transpose your rawdata, since we need samples in rows and metabolites in columns. So your dim(rawdata) should be [1] 29 666. That said, this should have been caught and produced the message: row.names Input_data need to match row.names Input_SettingsFile_Sample.
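A minimal sketch of that transposition in base R, assuming the data arrive with metabolites in rows and samples in columns. The objects below are randomly generated stand-ins for data.csv and meta.csv, and the sample and metabolite names are hypothetical.

```r
# Hypothetical stand-ins for the files in this issue:
# 666 metabolites in rows, 29 samples in columns.
set.seed(123)
rawdata <- as.data.frame(matrix(rnorm(666 * 29), nrow = 666))
colnames(rawdata) <- paste0("Sample", 1:29)
rownames(rawdata) <- paste0("Metabolite", 1:666)
metadata <- data.frame(
  Conditions = rep("Condition1", 29),
  row.names = paste0("Sample", 1:29)
)

# Transpose so that samples are in rows and metabolites in columns.
# t() returns a matrix, so convert back to a data frame.
rawdata_t <- as.data.frame(t(rawdata))
dim(rawdata_t)

# The sample names of the data must line up with the metadata row names.
names_match <- identical(rownames(rawdata_t), rownames(metadata))
print(names_match)
```

If `names_match` is FALSE for your own files, the sample identifiers in the two tables differ or are in a different order and need to be aligned before running the analysis.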

Let me know if this already solves your problem.

ChristinaSchmidt1 commented 1 year ago

Here would be an example input similar to what you described:

# Set seed for reproducibility
set.seed(123)

# Number of samples and features
num_samples <- 29
num_features <- 666

# Create rawdata with random values
rawdata <- data.frame(matrix(rnorm(num_samples * num_features), ncol = num_features))
rownames(rawdata) <- paste0("Sample", 1:num_samples)

# Display dimensions of rawdata
dim(rawdata)

# Create meta data with condition column
meta <- data.frame(
  Conditions = rep(c("Condition1", "Condition2", "Condition3", "Condition4", "Condition5", "Condition6", "Condition7"), each = 4),
  stringsAsFactors = FALSE
)
meta[29, ] <- "Condition7"

rownames(meta) <- rownames(rawdata)

# Display dimensions of meta
dim(meta)

If you run MetaProViz::DMA on this input, it works:

DMA_Res <- MetaProViz::DMA(Input_data = rawdata,
                           Input_SettingsFile_Sample = meta,
                           Input_SettingsInfo = c(conditions = "Conditions", numerator = NULL, denominator = "Condition1"),
                           STAT_pval = "aov",
                           STAT_padj = "fdr",
                           OutputName = 'Annova')

LampatLex commented 1 year ago

Amazing. Worked perfectly. Thanks

ChristinaSchmidt1 commented 1 year ago

Perfect, I am glad it is working now. I will close this issue as completed now. Good luck with your analysis.