wdl2459 / ConQuR

Batch effects removal for microbiome data via conditional quantile regression
GNU General Public License v3.0

ConQuR fails when each covariate subset contains only 1 level in taxonomic count #6

Open tommyfuu opened 1 year ago

tommyfuu commented 1 year ago

Hi ConQuR developers,

While running ConQuR with covariates on one of my datasets, I ran into the following error:

Error in { : task 137 failed - "contrasts can be applied only to factors with 2 or more levels"
Calls: run_methods -> ConQuR_libsize -> %do% -> <Anonymous>

Running ConQuR without covariates generated the results without any problems. After examining my data frame and searching Stack Overflow, I suspect that ConQuR does not handle the edge case where a taxon's counts have no variance within a covariate subset. For example, a taxon might have non-zero variance overall, but zero variance within the female subset of the data. When ConQuR calls glm for each taxon in the ConQuR_each helper function, this lack of at least 2 levels triggers the error.

Details about this glm error can be found on this Stack Overflow page.
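
For illustration, here is a minimal base R sketch of one way this message arises: a factor predictor with a single level in a glm formula. The data frame `d` and its columns are made up for this example, and whether this is exactly what happens inside ConQuR_each for the dataset above is an assumption.

```r
# Toy data: the factor `sex` has only one level, so no contrast can be built.
d <- data.frame(
  y   = c(0, 1, 0, 1),
  sex = factor(c("F", "F", "F", "F"))
)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    glm(y ~ sex, data = d, family = binomial())
    NA_character_  # only reached if glm() does not fail
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

If `sex` had at least two levels in `d`, the model matrix could be built and this particular error would not be raised.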

The easiest and most intuitive fix might be to add a sanity check that each covariate subset has non-zero variance before ConQuR fits the regressions. For taxa with zero variance in a covariate subset, ConQuR could leave the counts unchanged.
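
A rough sketch of what such a check could look like in base R is below. The helper `constant_within_subset` is hypothetical and not part of ConQuR; it only flags a taxon whose counts are constant within some level of a categorical covariate, and how ConQuR would then skip that taxon is left open.

```r
# Hypothetical helper (not part of ConQuR): returns TRUE if the counts `y`
# are constant within at least one level of at least one categorical covariate.
constant_within_subset <- function(y, covariates) {
  # Only non-numeric columns are treated as grouping variables.
  cat_cols <- covariates[!vapply(covariates, is.numeric, logical(1))]
  any(vapply(cat_cols, function(cv) {
    # Number of distinct count values within each level of this covariate.
    n_distinct <- tapply(y, cv, function(v) length(unique(v)))
    any(n_distinct < 2, na.rm = TRUE)
  }, logical(1)))
}

# Made-up example data.
covar <- data.frame(sex = c("F", "F", "M", "M"))
y_ok  <- c(0, 1, 2, 3)  # varies within both F and M
y_bad <- c(0, 0, 1, 3)  # constant within F

constant_within_subset(y_ok, covar)
constant_within_subset(y_bad, covar)
```

Note that a subset containing a single sample is also reported as constant, which is probably the desired behaviour for this check.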

I might attempt to fix this in a couple of weeks (after my finals are over) and open a pull request so the developers can integrate it. If you are able to fix it before then so that I can use the revised version of ConQuR, that would be much appreciated!

wdl2459 commented 1 year ago

Thanks for the suggestion! Before adding the sanity check, would you please share a toy dataset that reproduces the error? I would like to locate all the places that can lead to it. Thank you!

tybonic commented 1 year ago

Thanks for addressing this issue!

I ran into a similar error:

Error in {: task 2 failed - "contrasts can be applied only to factors with 2 or more levels"
Traceback:

1. ConQuR(tax_tab = dat_asvtab, batchid = batchid, covariates = covar, 
 .     batch_ref = "1")
2. foreach(ll = 1:ncol(tax_tab), .combine = cbind) %do% {
 .     y = as.numeric(tax_tab[, ll])
 .     ConQuR_each(y = y, X = X, X_span = X_span, X_correct = X_correct, 
 .         X_span_correct = X_span_correct, batch_ref = batch_ref, 
 .         delta = delta, taus = taus, logistic_lasso = logistic_lasso, 
 .         quantile_type = quantile_type, lambda_quantile = lambda_quantile, 
 .         interplt = interplt)
 . }
3. e$fun(obj, substitute(ex), parent.frame(), e$data)

I'll try to create a toy data set that reproduces the error later or tomorrow.

tybonic commented 1 year ago

After looking at it in more detail, the problem seems more complex than I initially thought, and there might be multiple causes for the error.

In my case the solution was quite straightforward, though. ConQuR() ran once I converted my covariates to factors with factor():

# create taxa data frame
taxa <- data.frame(
  taxon_1 = c(0, 1, 0, 2),
  taxon_2 = c(1, 0, 1, 1),
  taxon_3 = c(0, 5, 0, 2),
  taxon_4 = c(0, 1, 0, 1),
  row.names = c("sample_1", "sample_2", "sample_3", "sample_4")
)

# create batch IDs
batchid <- factor(c(0, 0, 1, 1), ordered = FALSE)
batchid

# create covars
covar <- data.frame(
    covar_1 = c("A", "A", "B", "B"),
    covar_2 = c("C", "D", "C", "D")
)

# run ConQuR (this produces the error mentioned in the comment above)
taxa_corr_test1 <- ConQuR(tax_tab = taxa, batchid = batchid, covariates = covar, batch_ref = "0")

# set class of covars to factor
covar$covar_1 <- factor(covar$covar_1)
covar$covar_2 <- factor(covar$covar_2)

# run ConQuR (this runs without the error)
taxa_corr_test2 <- ConQuR(tax_tab = taxa, batchid = batchid, covariates = covar, batch_ref = "0")

robbueck commented 1 year ago

I had the same issue. Converting non-numeric covariates to factors worked for me, too. Maybe this could be added as a fix?
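
As a sketch of how this workaround could be applied to a whole covariate table at once, the base R snippet below converts every non-numeric column to a factor and then checks for factors with fewer than 2 levels. The `age` column is a made-up numeric covariate added for illustration, and whether ConQuR should do this conversion internally is left to the developers.

```r
# Covariates as in the toy example above, plus a hypothetical numeric column.
covar <- data.frame(
  covar_1 = c("A", "A", "B", "B"),
  covar_2 = c("C", "D", "C", "D"),
  age     = c(30, 41, 25, 38),
  stringsAsFactors = FALSE
)

# Convert every non-numeric column to a factor; numeric columns stay as they are.
non_numeric <- !vapply(covar, is.numeric, logical(1))
covar[non_numeric] <- lapply(covar[non_numeric], factor)

# Flag factors with fewer than 2 levels, which would still break contrasts.
single_level <- vapply(
  covar,
  function(x) is.factor(x) && nlevels(x) < 2,
  logical(1)
)

str(covar)
print(single_level)
```

The converted `covar` can then be passed as the `covariates` argument, as in the second ConQuR() call above.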