malucalle / selbal

selbal: selection of balances for microbial signatures

zCompositions error when using own data #2

Closed adamsorbie closed 4 years ago

adamsorbie commented 6 years ago

selbal.cv() works fine with the example data; however, when attempting to analyse my own data, I receive the following error:

Error in if (any(X2[i, z] > colmins[z])) { : missing value where TRUE/FALSE needed

I had a look at the traceback, and the error appears to occur when the cmultRepl function from the zCompositions package is called.
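For readers unfamiliar with this message: R raises it whenever the condition of an `if` evaluates to `NA`. The sketch below reproduces that class of error with toy values only. The names `x` and `colmin` are made up for illustration, and it does not claim to show what happens inside cmultRepl.

```r
# Toy values only: when every compared element is NA, any() returns NA,
# and if (NA) raises "missing value where TRUE/FALSE needed".
x <- c(NA_real_, NA_real_)
colmin <- 0.5

cond <- any(x > colmin)   # NA, because no element is known to be TRUE

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    if (cond) "greater" else "not greater"
  },
  error = function(e) conditionMessage(e)
)

print(cond)
print(msg)
```

So an `NA` reaching that comparison, for example from a missing value or an empty selection in the input, would be enough to trigger the error.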

terrimporter commented 5 years ago

I've seen the same error with my own data. Did you ever figure this one out?

UVic-omics commented 5 years ago

This may happen when you have taxa with many zeros. The cmultRepl() function implements a Bayesian multiplicative replacement, which does not work well if the percentage of zeros for a particular taxon is very high. So I think you will avoid the error by keeping only those taxa that are present in at least 20% of your samples. Otherwise, if you do not want to remove any taxa, I suggest using the other alternative for zero replacement, that is, adding one count to each cell in your abundance matrix.
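A minimal base-R sketch of the two options described above, assuming a count matrix with samples in rows and taxa in columns. The matrix `x` and its values are invented for illustration.

```r
# Hypothetical count matrix: 10 samples (rows) by 3 taxa (columns).
x <- cbind(
  taxon1 = c(3, 0, 5, 2, 7, 1, 0, 4, 6, 2),  # present in 8 of 10 samples
  taxon2 = c(0, 0, 1, 0, 2, 0, 0, 3, 0, 0),  # present in 3 of 10 samples
  taxon3 = c(0, 0, 0, 0, 9, 0, 0, 0, 0, 0)   # present in 1 of 10 samples
)

# Option 1: keep taxa present (count > 0) in at least 20% of the samples.
prevalence <- colMeans(x > 0)
x_filtered <- x[, prevalence >= 0.20, drop = FALSE]

# Option 2: keep every taxon and add one count to each cell instead.
x_plus_one <- x + 1

print(prevalence)
print(colnames(x_filtered))
```

Here `taxon3` is dropped by the filter because it appears in only 10% of the samples. `drop = FALSE` keeps the result a matrix even if a single taxon survives.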

Thank you,

as-garciav commented 5 years ago

Hello, I've run into the same problem. After the optimal number of variables is determined, the algorithm stops with several warnings and this error:

Error in if (any(X2[i, z] > colmins[z])) { : missing value where TRUE/FALSE needed

I ran the command with zero.rep = "one"; otherwise it stops at the zero replacement step:

Comparison <- selbal.cv(x, y, zero.rep = "one")

The taxa in my dataset can be present in just one sample or in nearly all samples. I understand that is a lot of zeros to deal with, but is it still possible to run selbal.cv? If so, how?

Other warning messages include:

In addition: Warning messages:
1: In UseMethod("depth") : no applicable method for 'depth' applied to an object of class "NULL"
2: In min(x, na.rm = T) : no non-missing arguments to min; returning Inf
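Before rerunning, it may help to rule out a few input problems. The helper below is hypothetical (`check_counts` is not part of selbal or zCompositions), and the thread does not confirm that any of these conditions causes the error. It only summarises missing values, all-zero taxa, all-zero samples, and the fraction of zeros per taxon.

```r
# Hypothetical helper, not part of selbal: summarise properties of a
# count matrix worth checking before zero replacement.
check_counts <- function(x) {
  x <- as.matrix(x)
  list(
    any_na           = anyNA(x),
    all_zero_taxa    = colnames(x)[colSums(x, na.rm = TRUE) == 0],
    all_zero_samples = which(rowSums(x, na.rm = TRUE) == 0),
    zero_fraction    = colMeans(x == 0, na.rm = TRUE)
  )
}

# Invented example: taxon "b" is all zeros, sample 2 is all zeros,
# and taxon "c" contains a missing value.
x <- cbind(a = c(1, 0, 2), b = c(0, 0, 0), c = c(4, 0, NA))
report <- check_counts(x)
str(report)
```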

sgavril commented 4 years ago

Still getting the same error. When I filter for taxa present in at least 20% of the samples and use the Bayesian multiplicative treatment, the function identifies the optimal number of parameters as 3 and then errors out. When I use the unfiltered data set with zero.rep = "one", it errors out after the optimal number of parameters is identified as 2. Error below:

Error in if (any(X2[i, z] > colmins[z])) { : missing value where TRUE/FALSE needed

The command I ran was: cv <- selbal::selbal.cv(family, temps, zero.rep = "one")

selbal.zip

I can't seem to figure out why the command will not finish execution. Is something wrong with my inputs?

Edit: for those who are having the same issue, it seems that despite setting the zero.rep parameter to "one", the Bayesian multiplicative treatment remains the default behavior, and the error arises from cmultRepl2.
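One way this kind of behavior can arise in R is when a wrapper accepts an argument but does not forward it to an inner function that has its own default. The toy functions below are NOT selbal's code; `inner_step`, `outer_cv_broken`, and `outer_cv_fixed` are invented names, and the mechanism is an assumption consistent with the observation above rather than something verified against the package source.

```r
# Toy functions, not selbal's code. The inner helper has its own default.
inner_step <- function(x, zero.rep = "bayes") zero.rep

# The broken wrapper accepts zero.rep but never passes it on.
outer_cv_broken <- function(x, zero.rep = "bayes") inner_step(x)

# The fixed wrapper forwards the caller's choice explicitly.
outer_cv_fixed <- function(x, zero.rep = "bayes") inner_step(x, zero.rep = zero.rep)

seen_broken <- outer_cv_broken(1, zero.rep = "one")  # inner step sees "bayes"
seen_fixed  <- outer_cv_fixed(1, zero.rep = "one")   # inner step sees "one"

# formals() shows the default an installed function would fall back to.
inner_default <- formals(inner_step)$zero.rep

print(c(broken = seen_broken, fixed = seen_fixed, default = inner_default))
```

Calling `formals()` on the functions of an installed package is a quick way to see which defaults are in play before deciding whether to patch a fork.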

plmeta commented 4 years ago

Hello,

I tried several different approaches as well. Got the same trouble as Sgavril. Has anyone had any success?

sgavril commented 4 years ago

> Hello,
>
> I tried several different approaches as well. Got the same trouble as Sgavril. Has anyone had any success?

I forked the repo and changed the default behavior in all functions that use the zero.rep to "one" rather than the Bayesian treatment and it worked for me. Feel free to try that out.

plmeta commented 4 years ago

> > Hello, I tried several different approaches as well. Got the same trouble as Sgavril. Has anyone had any success?
>
> I forked the repo and changed the default behavior in all functions that use the zero.rep to "one" rather than the Bayesian treatment and it worked for me. Feel free to try that out.

Smart! I did it and it worked. Thanks!