tpq / propr

[OFFICIAL] [CoDA] An R package to calculate proportionality and other measures for compositional data
67 stars 9 forks source link

Error: posthoc reports finding negative measurements where there is none #12

Open tormolle opened 4 years ago

tormolle commented 4 years ago

Heya, Thom

I've been using your package for a while and recently tried the newly implemented posthoc function, but it threw me an error:

Error in if (any(counts < 0)) stop("Data may not contain negative measurements.") : missing value where TRUE/FALSE is needed

The raw data table used in the analysis is too large to post here, but I believe the procedure is straightforward. The data table is an OTU table of counts, with no negative, infinite or NA values, and only numeric columns. The following snippet describes my pipeline: otus.cmR <- zCompositions::cmultRepl(otus) # stores counts as proportions otus.propd <- propd(otus.cmR, group = map$diag_zone) # diag zone is a vector of character strings indicating the diagenetic zone of each sample, with six different states. otus.propd <- setActive(otus.propd) otus.propd <- updateCutoffs.propd(otus.propd, cutoff = c(0.05, 0.95, 0.3)) otus.propd <- posthoc(otus.propd) # This is where the error arises.

I checked the output of the cmultRepl function, and there are neither negative nor invalid values; I guess the successful execution of the code lines above the post hoc commant all testify to this.

Best, Tor Einar

tpq commented 4 years ago

Thanks Tor. I have a few questions...

1) Could you run: updateF(otus.propd) and let me know if this reproduces the error?

2) Is map$diag_zone a factor or a character? (I don't know why one would work vs. the other, but make sure it is a character so I can rule out that as the cause)

3) Could you confirm that the following snippet works?

data(iris)
x <- iris[,1:4]
y <- iris[,5]

library(propr)
otus.propd <- propd(x, group = y)
otus.propd <- setActive(otus.propd)
otus.propd <- updateCutoffs.propd(otus.propd, cutoff = c(0.05, 0.95, 0.3))
otus.propd <- posthoc(otus.propd)

If not, update with devtools::install_github("tpq/propr")

tormolle commented 4 years ago
  1. updateF(otus.propd) works.
  2. class(map$diag_zone) gives "character".
  3. The snippet works.

One thing I forgot to add in the initial post, which I believe is insignificant, is that the propd object that's created is stored as a list element as follows: cmp$propd <- propd(cmp$otus, group = cmp$map$diag_zone) I'm sorry if leaving this information out is what has caused the trouble.

Edit: I updated the package with devtools::install_github("tpq/propr"). Previously, the function message on ANOVA being run appeared before the error, but this is not the case anymore. The error persists.

tpq commented 4 years ago

I admit I'm a bit stumped without de-bugging the data first hand. Do you feel comfortable sharing the OTU and labels (as separate CSVs) via private message?

tormolle commented 4 years ago

I did another check of the mapping file that went into propd. I see that it contains some NAs. I don't know if that could help by pointing towards possible NA occurrences in the resulting propd object, for instance if samples labelled with NA are somehow treated as a group or if the mapping vector is coerced into character for the analysis but saved in its initial state beforehand, and this initial state is then re-used in posthoc? Just thinking out loud. Nevertheless, it seems weird to me that this would pass through all the other functions without problem.

I'll send you the current data via private message.

tormolle commented 4 years ago

I did an additional check for my data and saw that all theta values were 1. Weird. I tried rerunning the initial snippet, except for the updateCutoffs.propd function, with the updated mapping vector, now without NA values, and it's currently not throwing me any errors where it used to. I'll update this thread again when posthoc has finished running - i don't know how long that's expected to take for my data.

In summary, it seems the mistake is on me for feeding propd NAs in the group vector. I'd suggest letting the propd function throw a warning or error in the future if this turns out to be is the case, though.

Update: It worked without error.

tpq commented 4 years ago

posthoc() does do something different than all other functions... it tries to parse the group vector into a set of pairwise contrasts. Something about the NAs must cause this to fail loudly, while the NAs might cause the other methods to fail silently? I'll be sure to add an explicit check in the next version!!

tpq commented 4 years ago

(By the way, when working with propd, you can use updateF to get exact p-values. This is much, much faster than the updateCutoffs approach.)