matiasandina / ggethos

https://matiasandina.github.io/ggethos/
Other
1 stars 1 forks source link

Consider giving access to functions that compute ethograms and return data #15

Open matiasandina opened 1 year ago

matiasandina commented 1 year ago

For example

get_ethogram <- function(data, x, behaviour, sampling_period = NULL){
  if (is.null(sampling_period)){
    cli::cli_alert_warning("`sampling_period` not provided.")
    sampling_period <- min(diff(dplyr::pull(data, {{x}})))
    cli::cli_inform("Sampling period estimated to {sampling_period} using min difference between observations")
  }
  data <- dplyr::select(data, x = {{x}}, behaviour = {{behaviour}}) 
  etho <- data %>% 
    dplyr::mutate(run_id = vctrs::vec_identify_runs(behaviour)) %>% 
    dplyr::group_by(run_id) %>% 
    dplyr::summarise(behaviour = base::unique(behaviour), 
                     xend = dplyr::last(x) + sampling_period, 
                     x = dplyr::first(x), 
                     duration = xend - x,
                     .groups = "keep") %>% 
    dplyr::select(run_id, x, xend, behaviour, duration)
  return(etho)
}

Would return something like

get_ethogram(sleep_behavior, aligned_time_sec, sleep)
! `sampling_period` not provided.
Sampling period estimated to 1.99999999999989 using min difference between observations
# A tibble: 294 × 5
# Groups:   run_id [294]
   run_id      x  xend behaviour duration
    <int>  <dbl> <dbl> <chr>        <dbl>
 1      1  -41.0 1313. Wake          1354
 2      2 1313.  1333. NREM            20
 3      3 1333.  1349. Wake            16
 4      4 1349.  1355. NREM             6
 5      5 1355.  1417. Wake            62
 6      6 1417.  1451. NREM            34
 7      7 1451.  1481. Wake            30
 8      8 1481.  1549. NREM            68
 9      9 1549.  1593. Wake            44
10     10 1593.  1617. NREM            24
# … with 284 more rows
# ℹ Use `print(n = ...)` to see more rows

The user needs to be cautious with the grouping of the data.frame and how they call each function. It would be great to handle the has_x + has_no_x + ... and everything we do for the plot itself inside one function, but maybe a few functions can simplify things and then have a wrapper ?

matiasandina commented 1 year ago

Addressing the grouping issue

get_ethogram <- function(data, x, behaviour, sampling_period = NULL){
  if (is.null(sampling_period)){
    cli::cli_alert_warning("`sampling_period` not provided.")
    sampling_period <- min(diff(dplyr::pull(data, {{x}})))
    cli::cli_inform("Sampling period estimated to {sampling_period} using min difference between observations")
  }

  if(dplyr::is_grouped_df(data)){
    cli::cli_alert_info("Data was grouped by {dplyr:::group_vars(data)}")
    data <- dplyr::select(data, dplyr::group_cols(), x = {{x}}, behaviour = {{behaviour}})
  } else {
    data <- dplyr::select(data, x = {{x}}, behaviour = {{behaviour}}) 
  }

  etho <- data %>% 
    dplyr::mutate(run_id = vctrs::vec_identify_runs(behaviour)) %>% 
    # add to whatever previous layer was there
    group_by(run_id, .add=TRUE) %>% 
    dplyr::summarise(behaviour = base::unique(behaviour), 
                     xend = dplyr::last(x) + sampling_period, 
                     x = dplyr::first(x), 
                     duration = xend - x, 
                     .groups = "keep") %>% 
    dplyr::select(dplyr::group_cols(), x, xend, behaviour, duration)

  return(etho)
}