rbchan / unmarked

R package for hierarchical models in ecological research
https://rbchan.github.io/unmarked/

formatLong no longer corrupts covariates when factors present; unit test added #107

Closed adamdsmith closed 7 years ago

adamdsmith commented 7 years ago

When csvToUMF was used to convert long-format data.frames (i.e., long = TRUE), the presence of a single factor variable caused all other variables to be converted to factors. This is now fixed (with the caveat noted at the bottom). Additionally, the function now tries to automatically identify and extract site covariates and add them to the resulting unmarkedFrameOccu.

  library(unmarked)
#> Loading required package: reshape
#> Loading required package: lattice
#> Loading required package: parallel
#> Loading required package: Rcpp
  test <- expand.grid.df(expand.grid(site = LETTERS[1:4], julian = c(13, 20, 26)))
  test <- test[with(test, order(site, julian)), ]

  set.seed(42)
  test <- within(test, {
    obsfac = factor(sample(LETTERS[1:2], nrow(test), replace = TRUE))
    sitefac = factor(round(as.numeric(site)/5))
    ocov = round(rnorm(nrow(test)), 2)
    scov = 2 * as.numeric(test$site)
    y = rbinom(nrow(test), 1, 0.6)
  })

  str(unmarked:::formatLong(test, type = "unmarkedFrameOccu"))
#> Formal class 'unmarkedFrameOccu' [package "unmarked"] with 5 slots
#>   ..@ y       : 'matrix' int [1:4, 1:3] 1 0 1 0 1 1 0 0 0 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : NULL
#>   ..@ obsCovs :'data.frame': 12 obs. of  3 variables:
#>   .. ..$ ocov      : num [1:12] 1.51 -0.09 2.02 -0.06 1.3 2.29 -1.39 -0.28 -0.13 0.64 ...
#>   .. ..$ obsfac    : Factor w/ 2 levels "A","B": 2 2 1 2 2 2 2 1 2 2 ...
#>   .. ..$ JulianDate: num [1:12] 13 20 26 13 20 26 13 20 26 13 ...
#>   ..@ siteCovs:'data.frame': 4 obs. of  2 variables:
#>   .. ..$ scov   : num [1:4] 2 4 6 8
#>   .. ..$ sitefac: Factor w/ 2 levels "0","1": 1 1 2 2
#>   ..@ mapInfo : NULL
#>   ..@ obsToY  : num [1:3, 1:3] 1 0 0 0 1 0 0 0 1
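
To illustrate the idea behind the automatic site-covariate extraction, here is a minimal base-R sketch. It assumes a covariate counts as site-level when it takes a single value within every site; that criterion, the helper name is_site_level, and the toy columns elev and wind are assumptions made for illustration and may not match what formatLong actually does.

```r
# Toy long-format data: one row per site-by-visit combination.
long <- data.frame(
  site = rep(c("A", "B", "C"), each = 2),
  elev = rep(c(100, 250, 400), each = 2),  # constant within each site
  wind = c(3, 5, 2, 2, 7, 1)               # varies between visits at sites A and C
)

# Hypothetical helper (not part of unmarked): returns TRUE for each column
# of `df` whose value never changes within a level of `site`.
is_site_level <- function(df, site) {
  vapply(df, function(x) {
    all(tapply(x, site, function(v) length(unique(v)) == 1L))
  }, logical(1))
}

covs <- long[, c("elev", "wind")]
site_level <- is_site_level(covs, long$site)

# Collapse the site-level columns to one row per site.
site_covs <- unique(long[, c("site", names(covs)[site_level]), drop = FALSE])
```

Under this assumed rule, elev would be treated as a site covariate and wind as an observation covariate. A covariate that happens to be constant within every site by chance would also be classified as site-level, which is one reason to treat this as a heuristic only.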

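As a heads-up, though, the reshape::recast function used in many of these format... functions does not play well with factors, and it may be necessary to rethink this approach moving forward. For example, if the first variable recast encounters is a factor, the numeric covariates come back as NA (with "invalid factor level" warnings) and observation covariates end up among the site covariates. There is no easy solution other than reordering the columns of your data.frame so that a factor does not come first. Note that within() appends new columns in reverse order of assignment, so in the example below sitefac, not ocov, is the first covariate column.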

  library(unmarked)
#> Loading required package: reshape
#> Loading required package: lattice
#> Loading required package: parallel
#> Loading required package: Rcpp
  test <- expand.grid.df(expand.grid(site = LETTERS[1:4], julian = c(13, 20, 26)))
  test <- test[with(test, order(site, julian)), ]

  set.seed(42)
  test <- within(test, {
    ocov = round(rnorm(nrow(test)), 2)
    scov = 2 * as.numeric(test$site)
    obsfac = factor(sample(LETTERS[1:2], nrow(test), replace = TRUE))
    sitefac = factor(round(as.numeric(site)/5))
    y = rbinom(nrow(test), 1, 0.6)
  })

  str(unmarked:::formatLong(test, type = "unmarkedFrameOccu"))
#> Warning in `[<-.factor`(`*tmp*`, ri, value = c(2, 2, 2, 4, 4, 4, 6, 6, 6, :
#> invalid factor level, NA generated

#> Warning in `[<-.factor`(`*tmp*`, ri, value = c(2, 2, 2, 4, 4, 4, 6, 6, 6, :
#> invalid factor level, NA generated

#> Warning in `[<-.factor`(`*tmp*`, ri, value = c(2, 2, 2, 4, 4, 4, 6, 6, 6, :
#> invalid factor level, NA generated
#> Formal class 'unmarkedFrameOccu' [package "unmarked"] with 5 slots
#>   ..@ y       : 'matrix' int [1:4, 1:3] 1 0 1 0 1 1 0 0 0 1 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : NULL
#>   ..@ obsCovs :'data.frame': 12 obs. of  1 variable:
#>   .. ..$ obsfac: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 2 2 1 2 ...
#>   ..@ siteCovs:'data.frame': 4 obs. of  4 variables:
#>   .. ..$ sitefac   : Factor w/ 2 levels "0","1": 1 1 2 2
#>   .. ..$ scov      : num [1:4] NA NA NA NA
#>   .. ..$ ocov      : num [1:4] NA NA NA NA
#>   .. ..$ JulianDate: num [1:4] NA NA NA NA
#>   ..@ mapInfo : NULL
#>   ..@ obsToY  : num [1:3, 1:3] 1 0 0 0 1 0 0 0 1
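
The column-reordering workaround mentioned above could be sketched roughly as follows. The helper factors_last is hypothetical (it is not part of unmarked), and it assumes the first three columns are site, date, and response, as in the examples here. Whether this avoids the recast problem in every case has not been verified; it only ensures that a non-factor covariate comes first whenever one exists.

```r
# Hypothetical helper (not part of unmarked): keep the first `n_id` columns
# in place and move factor covariates after the non-factor covariates.
factors_last <- function(df, n_id = 3L) {
  id_idx  <- seq_len(n_id)
  cov_idx <- setdiff(seq_along(df), id_idx)
  is_fac  <- vapply(df[cov_idx], is.factor, logical(1))
  df[, c(id_idx, cov_idx[!is_fac], cov_idx[is_fac]), drop = FALSE]
}

# Same kind of layout as the failing example: a factor directly after `y`.
bad <- data.frame(
  site    = factor(rep(c("A", "B"), each = 2)),
  julian  = rep(c(13, 20), times = 2),
  y       = c(1L, 0L, 1L, 1L),
  sitefac = factor(c("0", "0", "1", "1")),
  scov    = c(2, 2, 4, 4)
)

good <- factors_last(bad)
names(good)
#> [1] "site"    "julian"  "y"       "scov"    "sitefac"
```

The reordered data.frame can then be passed on to the conversion step in place of the original.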