rbchan / unmarked

R package for hierarchical models in ecological research
https://rbchan.github.io/unmarked/

Factors and data.frames in R 4.0 #179

Status: Closed (mikemeredith closed this issue 4 years ago)

mikemeredith commented 4 years ago

I hit a problem when testing the code in Kéry & Royle (2016), Applied Hierarchical Modeling, vol. 1, section 10.9 (p. 592 ff.), with R 4.0.1 RC. The code uses unmarkedFrameOccu and occu, but the issue may apply to other unmarkedFrame* functions.

Everything seemed to work fine (though I haven't checked the results against R 3.6.3) until I passed a fitted model to AICcmodavg::mb.gof.test, which failed with: no applicable method for 'droplevels' applied to an object of class "character".

The obsCovs list includes time, a 3-column character matrix. The summary of the umf object shows that it is converted to a factor in R 3.6.3 but stays character in R 4.0.

I think the cause is the change in the default for stringsAsFactors in data.frame() from TRUE to FALSE as of R 4.0.
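A minimal base-R illustration of that default change (no unmarked code involved): from R 4.0 on, data.frame() keeps character columns as character unless stringsAsFactors = TRUE is passed explicitly.

```r
# Character data, like the 'time' observation covariate in this issue
time_chr <- c("1", "2", "3")

# R >= 4.0: the default is stringsAsFactors = FALSE, so the column stays character
df_default <- data.frame(time = time_chr)
class(df_default$time)   # "character" in R >= 4.0

# Passing the argument explicitly restores the pre-4.0 behaviour
df_factor <- data.frame(time = time_chr, stringsAsFactors = TRUE)
class(df_factor$time)    # "factor"
levels(df_factor$time)   # "1" "2" "3"
```

Passing the argument per call, rather than setting the global option, also avoids the problem with parallel workers, since the behaviour no longer depends on each R session's options.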

Setting options(stringsAsFactors = TRUE) fixes the problem, but it elicits a grumpy warning and is not a long-term solution. mb.gof.test then works, but only with parallel=FALSE; presumably the option would also have to be set on the workers for it to work in parallel.

It probably only needs stringsAsFactors = TRUE added to the calls to data.frame. I looked at the source code but couldn't find my way around it, so I won't submit a pull request.

Thanks, Mike

mikemeredith commented 4 years ago

I'm using unmarked 1.0.0 and AICcmodavg 2.2-2.

kenkellner commented 4 years ago

This is a bug in AICcmodavg. Here's the line that needs the adjustment:

https://github.com/cran/AICcmodavg/blob/9bdb2199725f0cf50f2adba09e0a6d265615f1f3/R/mb.gof.test.R#L46

I emailed a fix to Marc a few weeks ago (the package doesn't have a public repository). Also, I'm not sure whether AHM uses the MB chi-square test for Royle-Nichols models, but the current function gives incorrect results for them; I sent a fix for that too.
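Why the error appears: droplevels() is an S3 generic with methods for factors and data frames but no default method, so calling it on a character vector fails with the message Mike reported. The wrapper below is a hypothetical illustration of a defensive guard (the name droplevels_safe is invented here); it is not the fix Ken sent to AICcmodavg, which is not public.

```r
# Hypothetical helper: coerce character input to a factor before dropping levels
droplevels_safe <- function(x, ...) {
  if (is.character(x)) x <- factor(x)
  droplevels(x, ...)
}

x <- c("1", "2", "2")

# Calling droplevels() directly on a character vector raises
# "no applicable method for 'droplevels' applied to an object of class ..."
res <- tryCatch(droplevels(x), error = function(e) conditionMessage(e))
res

# The guarded version returns a factor with levels "1" and "2"
droplevels_safe(x)
```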

Also potentially relevant: https://groups.google.com/forum/#!topic/unmarked/x5fxjSRDb1Y

I wouldn't be surprised if unmarked has some stringsAsFactors=FALSE issues lurking, but I haven't found them yet.

mikemeredith commented 4 years ago

Thanks, Ken. Sorry, I hadn't checked the unmarked forum. But there's still a difference with R 4.0. Here's a toy example; check the output under "Observation-level covariates":

```
getRversion()
[1] ‘4.0.1’
library(unmarked)
set.seed(2020)
y <- matrix(rbinom(30, 1, 0.3), ncol=3)
time <- matrix(as.character(1:3), nrow=10, ncol = 3, byrow = TRUE)
summary(unmarkedFrameOccu(y = y, obsCovs = list(time = time)))

unmarkedFrame Object

10 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 7

Tabulation of y observations:
 0  1
22  8

Observation-level covariates:
     time
 Length:30
 Class :character
 Mode  :character

options(stringsAsFactors = TRUE)  # gives dire warning
summary(unmarkedFrameOccu(y = y, obsCovs = list(time = time)))
unmarkedFrame Object
... [same output omitted]...
Observation-level covariates:
 time
 1:10
 2:10
 3:10
```

The last output is what you get with R 3.6.

The model fitting still works, presumably because the coercion happens later, perhaps in model.matrix. So it's not serious, but the new summary output for obsCovs is not very useful.
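Mike's guess is plausible as far as base R goes: model.matrix() converts character variables to factors when it builds a design matrix, so a character covariate still produces dummy columns. The check below is base R only; whether unmarked's fitting code actually goes through this path is an assumption, not something confirmed in this thread.

```r
# A character covariate, as stored in the unmarkedFrame under R 4.0
covs <- data.frame(time = rep(c("1", "2", "3"), 10))
is.character(covs$time)   # TRUE in R >= 4.0

# model.matrix() turns the character column into a factor internally,
# giving an intercept plus treatment-contrast columns for levels 2 and 3
X <- model.matrix(~ time, data = covs)
colnames(X)   # "(Intercept)" "time2" "time3"
dim(X)        # 30 3
```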

Regards, Mike

kenkellner commented 4 years ago

Makes sense. I'll work on something to fix the summary methods for this situation. I see the need to make sure things are backwards compatible here. I also think it would be good to encourage users to explicitly specify variables as factors outside of unmarked, rather than relying on the automatic conversion of characters to factors. That seems to me to be more in the spirit of the changes made in 4.0. This probably means changing some of the example code and maybe vignettes.

mikemeredith commented 4 years ago

explicitly specify variables as factors outside of unmarked...

Do you have a neat way to do this? I can't find one. I can't put a factor into a matrix, and if I construct the necessary data frame of factors it still gets converted back to character.

```
getRversion()
[1] ‘4.0.1’
library(unmarked)
set.seed(2020)
y <- matrix(rbinom(30, 1, 0.3), ncol=3)
time <- matrix(as.character(1:3), nrow=10, ncol = 3, byrow = TRUE)
str(t1 <- factor(time))        # now a vector
str(t2 <- matrix(t1, ncol=3))  # back to character again
t3 <- data.frame(T1 = factor(rep(1, 10), levels=(c("1", "2", "3"))),
  T2 = factor(rep(2, 10), levels=(c("1", "2", "3"))),
  T3 = factor(rep(3, 10), levels=(c("1", "2", "3"))))
str(t3)  # ok, try this
head(t3)
summary(unmarkedFrameOccu(y = y, obsCovs = list(time = t3)))
unmarkedFrame Object

10 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 7

Tabulation of y observations:
 0  1
22  8

Observation-level covariates:
     time
 Length:30
 Class :character  # !!!!!
 Mode  :character
```

I'm guessing that at some point the matrices/data frames passed to unmarkedFrameOccu are converted to vectors and then passed to cbind or similar. Converting my data frame of factors to a vector turns them into character.
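A base-R sketch of how the factor class can be lost when a data frame of factors is flattened. Which flattening route unmarkedFrameOccu takes internally is not shown in this thread, so this only illustrates Mike's guess: going through a matrix drops the class, while unlist() on a list of factors keeps it.

```r
# A data frame whose columns are all factors (like Mike's t3)
lv <- c("1", "2", "3")
t3 <- data.frame(T1 = factor(rep("1", 10), levels = lv),
                 T2 = factor(rep("2", 10), levels = lv),
                 T3 = factor(rep("3", 10), levels = lv))

# as.matrix() on a data frame of factors returns a character matrix
m <- as.matrix(t3)
typeof(m)                          # "character"

# matrix() on a factor also returns character values
typeof(matrix(t3$T1, ncol = 1))    # "character"

# unlist() on a list of factors, by contrast, returns a factor
# whose levels are the union of the input level sets
is.factor(unlist(t3, use.names = FALSE))   # TRUE
```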

Regards, Mike

kenkellner commented 4 years ago

You can supply the obs covs in long format:

```
y <- matrix(rbinom(30, 1, 0.3), ncol=3)
obs <- data.frame(time=factor(rep(c(1:3), 10)))
umf <- unmarkedFrameOccu(y, obsCovs=obs)
summary(umf)

unmarkedFrame Object

10 sites
Maximum number of observations per site: 3
Mean number of observations per site: 3
Sites with at least one detection: 9

Tabulation of y observations:
 0  1
19 11

Observation-level covariates:
 time
 1:10
 2:10
 3:10
```

You're right, though, that probably every umf creation method needs to be examined for this issue.
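For a covariate that starts as a wide sites × occasions matrix, as in Mike's examples, a sketch of building Ken's long format. Assumption: long-format obsCovs rows are ordered by site, then by occasion within site (consistent with rep(1:3, 10) in Ken's example), so the matrix has to be read row by row.

```r
# Wide format: 10 sites x 3 occasions, character values as in the issue
time <- matrix(as.character(1:3), nrow = 10, ncol = 3, byrow = TRUE)

# as.vector() reads a matrix column by column, so transpose first to get
# site 1 occasions 1..3, then site 2 occasions 1..3, and so on
time_long <- as.vector(t(time))

# Make the factor explicitly, with the level set spelled out
obs <- data.frame(time = factor(time_long, levels = c("1", "2", "3")))
str(obs)
table(obs$time)

# obs could then be passed as in Ken's example, e.g.
# umf <- unmarkedFrameOccu(y, obsCovs = obs)
```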

rbchan commented 4 years ago

Hi guys, I don't see any value in having character strings in unmarkedFrame objects. We can't use them for anything. Could we automatically convert them to factors with a warning?

kenkellner commented 4 years ago

I was hoping to avoid it, but Mike's examples have convinced me. It does feel like there might be some unexpected consequences, e.g. related to prediction.
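One way character-to-factor conversion can cause trouble at prediction time, sketched in base R. Whether unmarked's predict() builds its design matrix this way is an assumption here. A factor made from new data knows only the levels present in that data.

```r
# Levels seen when the model was fitted
fit_levels <- c("1", "2", "3")

# New data for prediction containing only occasion "2"
new_chr <- data.frame(time = "2")

# Converting the character column on its own gives a one-level factor,
# which model.matrix() cannot expand with contrasts, so this errors
res <- tryCatch(model.matrix(~ time, new_chr),
                error = function(e) conditionMessage(e))
res

# Re-using the fitted level set gives the same columns as the fitted design:
# "(Intercept)" "time2" "time3"
new_fac <- data.frame(time = factor("2", levels = fit_levels))
colnames(model.matrix(~ time, new_fac))
```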

rbchan commented 4 years ago

An alternative would be to throw an error instead of issuing a warning. This would make the user deal with it.

kenkellner commented 4 years ago

Mike's right though that if you want to supply obs covs as a list of matrices/data frames there is no way to supply a factor correctly. I'm not sure all users would figure out to use the long format instead, if they are used to always using the list approach.

rbchan commented 4 years ago

Good point. I vote for the automatic conversion plus warning approach.
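A sketch of the two options discussed (convert with a warning, or stop with an error), applied to a covariate data frame. The function name char_to_factor and its action argument are hypothetical, invented for illustration; this is not necessarily what unmarked ended up implementing.

```r
# Hypothetical helper: convert character columns of a covariate data frame
# to factors with a warning (default), or stop if action = "error"
char_to_factor <- function(df, action = c("warn", "error")) {
  action <- match.arg(action)
  is_chr <- vapply(df, is.character, logical(1))
  if (!any(is_chr)) return(df)
  msg <- paste("Character covariate(s) found:",
               paste(names(df)[is_chr], collapse = ", "))
  if (action == "error") {
    stop(msg, "; convert them to factors first.", call. = FALSE)
  }
  warning(msg, "; converting to factors.", call. = FALSE)
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

obs <- data.frame(time = rep(c("1", "2", "3"), 10), temp = seq_len(30) / 10)
obs2 <- char_to_factor(obs)   # warns about 'time'; 'temp' is left untouched
str(obs2)
```

Applied in each unmarkedFrame constructor, a check like this would cover both the list-of-matrices and long-format inputs once they have been assembled into a data frame.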

(Sent by email in reply to Ken Kellner's comment of Tue, Jun 2, 2020, 9:23 AM.)