jlaake / marked

R package for analysis of mark-recapture data solely with R

process.data() does not process groups correctly with tibble inputs #16

Open vgherard opened 8 months ago

vgherard commented 8 months ago

Dear Jeff, I just stumbled into this:

tbl <- tibble::tibble(ch = c("1001", "1010", "0110"), 
                     sex = factor(c("M", "M", "F"))
                     )

attempt1 <- tbl |>
  marked::process.data(groups = "sex")
#> Warning: Unknown or uninitialised column: `Freq`.
#> Warning: Unknown or uninitialised column: `freq`.
#> 3 capture histories collapsed into 3
#> Warning: Unknown or uninitialised column: `id`.
#> Error in marked::process.data(tbl, groups = "sex"): 
#>   sex  is not a factor variable

attempt2 <- tbl |>
  as.data.frame() |>
  marked::process.data(groups = "sex")
#> 3 capture histories collapsed into 3

Created on 2024-04-06 with reprex v2.1.0

I traced the error back to process.data.R#L413, that is:

vari = data[, groups[i]]

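This kind of subsetting does not produce a factor with tidyverse tibbles: data[, groups[i]] returns a one-column tibble rather than the column itself, so the result is not a factor.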
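For anyone who wants to see the difference in isolation, here is a minimal sketch. It assumes the tibble package is installed; the object names (df, from_df, from_tbl) are illustrative only and do not come from marked.

```r
library(tibble)

tbl <- tibble(ch = c("1001", "1010", "0110"),
              sex = factor(c("M", "M", "F")))
df <- as.data.frame(tbl)

# Single-column `[` on a data.frame drops to a vector, so the factor survives.
from_df <- df[, "sex"]
is.factor(from_df)
#> [1] TRUE

# The same call on a tibble keeps a one-column tibble, which is not a factor.
from_tbl <- tbl[, "sex"]
is.factor(from_tbl)
#> [1] FALSE

# `[[` extracts the column itself for both classes.
is.factor(tbl[["sex"]])
#> [1] TRUE
is.factor(df[["sex"]])
#> [1] TRUE
```

Using `[[` is one class-agnostic alternative; whether it suits process.data() is for the maintainer to judge.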

Thought you may want to know. Thanks for this nice library!

jlaake commented 8 months ago

Thanks for bringing this to my attention. I believe I had the same problem with RMark, and I just need to wrap the data in as.data.frame() to solve it. Once I've fixed it, I'll close this issue.

Note that BFGS is the default optimization method for some models (I can't remember which right now), and it doesn't always perform well. When I make this change, I'm also going to change the default optimization method to nlminb.

I wrote marked to add models that I wanted and that didn't exist in MARK, and I started with some of the simpler models like CJS and JS as a proof of concept. My other goal was to make the code open source so others could see how these models could be built for their own purposes. However, I actually recommend that folks use MARK with RMark because it is a much more complete software package. You'll see that marked and RMark are very similar, with a few differences.
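As a rough sketch of the as.data.frame idea mentioned above (not the actual change to process.data()), the coercion could look like the following. The helper name get_group_factor is hypothetical and not part of marked; the tibble package is assumed to be installed.

```r
library(tibble)

# Hypothetical helper, not part of marked: extract a grouping column
# in a way that behaves the same for data.frames and tibbles.
get_group_factor <- function(data, group) {
  data <- as.data.frame(data)  # drop the tibble class before subsetting
  vari <- data[, group]        # now returns the column as a plain vector
  if (!is.factor(vari)) stop(group, " is not a factor variable")
  vari
}

tbl <- tibble(ch = c("1001", "1010", "0110"),
              sex = factor(c("M", "M", "F")))

levels(get_group_factor(tbl, "sex"))
#> [1] "F" "M"
```

Coercing once at the top of the function keeps the existing data[, groups[i]] style of subsetting unchanged, which may be why the same approach was used in RMark.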