jmsigner / amt

random_steps() sapply() error #109

Closed geodevm closed 2 months ago

geodevm commented 5 months ago

I ran into an issue preparing data for an SSF analysis, which I debugged on my own but thought deserved attention. Since this is unpublished data, I can't share the data that triggered the error, but I can share what caused it.

This function that I wrote to prep data for analysis is what caused the error:

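# Packages used below: amt for the track functions; dplyr, tidyr and purrr
# for the pipe, nest()/unnest() and map().
library(amt)
library(dplyr)
library(tidyr)
library(purrr)

# `covariates` is not defined in this snippet; it must already exist in the
# calling environment.
SSurFer <- function(dat) {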
  ssf_dat <- dat %>% 
    make_track(
      gps_utm_easting,
      gps_utm_northing,
      gps_fix_time, 
      animal_id = animal_id,
      species = species,
      sex = sex,
      season = season,
      crs = "epsg:26915"
      ) %>%
    nest(data = -c("animal_id", "species", "sex")) %>% 
    mutate(resample = map(data, function(x) 
      x %>%
        track_resample(rate = minutes(30), tolerance = minutes(5)) %>%
        filter_min_n_burst(2) %>%
        time_of_day(include.crepuscule = FALSE) %>%
        steps_by_burst(keep_cols = "both") %>%
        random_steps() %>% 
        extract_covariates(covariates, where = "both") %>%  
        mutate(cos_ta_ = cos(ta_),
               log_sl_ = log(sl_)) %>%
        filter(!is.na(ta_)))
    )  %>%
    dplyr::select(species, sex, animal_id, resample) %>%
    as_tibble() %>%
    tidyr::unnest(cols = resample) %>%
    filter(!is.na(animal_id))
  return(ssf_dat)
}

Most of this function runs fine and can be disregarded, but it provides context for what is being done here. There are multiple individuals of multiple species in one GPS collar dataset, and the function iterates over those animals with a nest/unnest pattern to build tracks. The random_steps() call is what causes the error. It fails both in this simplified example of the prep I'm doing for the analysis and in a more complicated version that uses population-level gamma and von Mises distributions.

When run on my dataset (you'll see it referenced as gps), this code returns the following error:

Error in `mutate()`:
ℹ In argument: `resample = map(...)`.
Caused by error in `map()`:
ℹ In index: 12.
Caused by error in `rle()`:
! 'x' must be a vector of an atomic type
Run `rlang::last_trace()` to see where the error occurred.

Running rlang::last_trace() returns:

<error/dplyr:::mutate_error>
Error in `mutate()`:
ℹ In argument: `resample = map(...)`.
Caused by error in `map()`:
ℹ In index: 12.
Caused by error in `rle()`:
! 'x' must be a vector of an atomic type
---
Backtrace:
     ▆
  1. ├─global SSurFer(gps)
  2. │ └─... %>% filter(!is.na(animal_id))
  3. ├─dplyr::filter(., !is.na(animal_id))
  4. ├─tidyr::unnest(., cols = resample)
  5. ├─tibble::as_tibble(.)
  6. ├─dplyr::select(., species, sex, animal_id, resample)
  7. ├─dplyr::mutate(...)
  8. ├─dplyr:::mutate.data.frame(...)
  9. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
 10. │   ├─base::withCallingHandlers(...)
 11. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 12. │     └─mask$eval_all_mutate(quo)
 13. │       └─dplyr (local) eval()
 14. ├─purrr::map(...)
 15. │ └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
 16. │   ├─purrr:::with_indexed_errors(...)
 17. │   │ └─base::withCallingHandlers(...)
 18. │   ├─purrr:::call_with_cleanup(...)
 19. │   └─.f(.x[[i]], ...)
 20. │     └─... %>% filter(!is.na(ta_))
 21. ├─dplyr::filter(., !is.na(ta_))
 22. ├─dplyr::mutate(., cos_ta_ = cos(ta_), log_sl_ = log(sl_))
 23. ├─amt::extract_covariates(., covariates, where = "both")
 24. ├─amt::random_steps(.)
 25. └─amt:::random_steps.bursted_steps_xyt(.)
 26.   ├─utils::head(...)
 27.   └─base::rle(unlist(sapply(bursts, "[[", "burst_")))
 28.     └─base::stop("'x' must be a vector of an atomic type")

As you can see, the function gets through multiple iterations of the nested tibble before running into this issue on index 12. Like I said, I cannot share the data that caused the issue since it is not yet published, but there were no evident differences between this index of the nested tibble and indices 1 through 11.

I dug into the code of the function random_steps() to find which line is throwing the error, which is line 130 in the random_steps.R file:

start_ids <- c(1, head(cumsum(rle(unlist(sapply(bursts, "[[", "burst_")))$lengths), -1) + 1)

For some reason, sapply()'s "user-friendly" simplification, which can return either a list or an array, produces an array for this particular index. This breaks the function: unlist() leaves the array unchanged, so rle() receives something that is not a plain vector and throws the error shown in the backtrace. Again, I don't know why it simplifies to an array rather than a list on a data frame with the same structure as the others being iterated through, but the fix was to force a list by changing the sapply() in this line of code to lapply(), as follows:

start_ids <- c(1, head(cumsum(rle(unlist(lapply(bursts, "[[", "burst_")))$lengths), -1) + 1)

That produced exactly the function output I was hoping for. I'm not sure why this is an issue for my dataset, but returning a list directly through lapply() rather than sapply() in this function might save someone with a similarly structured dataset the debugging headache I just went through.
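One possible explanation, which nothing in this thread confirms, is that sapply() simplifies its result to a matrix whenever every element it returns has the same length greater than one. A track whose bursts all contain the same number of steps would then behave differently from the others. The sketch below uses a hypothetical stand-in for `bursts` (a plain list of data frames with a `burst_` column; the real object inside random_steps() may be structured differently) to show the failure mode and the lapply() fix.

```r
# Hypothetical stand-ins for `bursts`: lists of data frames, one per burst.
bursts_equal <- list(
  data.frame(burst_ = c(1, 1, 1)),
  data.frame(burst_ = c(2, 2, 2))
)
bursts_unequal <- list(
  data.frame(burst_ = c(1, 1, 1)),
  data.frame(burst_ = c(2, 2))
)

# Unequal burst lengths: sapply() cannot simplify, so it returns a list.
s_unequal <- sapply(bursts_unequal, "[[", "burst_")
is.list(s_unequal)

# Equal burst lengths: sapply() simplifies to a 3 x 2 matrix.
s_equal <- sapply(bursts_equal, "[[", "burst_")
is.matrix(s_equal)

# unlist() returns non-list input unchanged, so the matrix survives and
# rle() rejects it because is.vector() is FALSE for an object with dims.
u_equal <- unlist(s_equal)
rle_result <- tryCatch(rle(u_equal), error = function(e) e)
inherits(rle_result, "error")

# lapply() always returns a list, so unlist() yields a plain vector.
start_ids <- c(1, head(cumsum(rle(unlist(lapply(bursts_equal, "[[", "burst_")))$lengths), -1) + 1)
start_ids
```

If this is what happens in the failing index, checking whether all bursts of that animal have the same number of rows would be a quick way to confirm it.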

jfarr99 commented 4 months ago

Just a quick comment to say that I ran into the same error with random_steps(), and tweaking from sapply() to lapply() worked! Thanks so much, saved me a ton of time!!

bmrishabh commented 2 months ago

I also faced the same issue, and the change from sapply() to lapply() solved it. I'm commenting here to add weight to this issue and to thank @geodevm for figuring out a quick solution. I also want to add that I didn't face this issue on a previous version of the package; the problem appeared when I updated it in the past couple of weeks or so. I used the same data set in both cases.

samaramanzin commented 2 months ago

I also had a similar issue. Adding simplify = FALSE fixed it for me:

start_ids <- c(1, head(cumsum(rle(unlist(sapply(bursts, "[[", "burst_", simplify = FALSE)))$lengths), -1) + 1)
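For list input, the two fixes proposed in this thread should be interchangeable, since sapply() with simplify = FALSE skips the simplification step and returns the underlying lapply() result. A quick check on hypothetical toy data (not the real `bursts` object from random_steps()):

```r
# Hypothetical toy data: two bursts of equal length, the case in which
# default sapply() would simplify to a matrix.
toy_bursts <- list(
  data.frame(burst_ = c(1, 1)),
  data.frame(burst_ = c(2, 2))
)

via_lapply <- lapply(toy_bursts, "[[", "burst_")
via_sapply <- sapply(toy_bursts, "[[", "burst_", simplify = FALSE)

# Both calls return the same unnamed list of vectors.
same_result <- identical(via_lapply, via_sapply)
same_result
```

Either form avoids the matrix result; lapply() is arguably the clearer choice because the intent does not depend on an extra argument.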

jmsigner commented 2 months ago

Thank you @geodevm, @jfarr99, @bmrishabh and @samaramanzin for reporting. It is now fixed on GitHub with commit 4ec116d.