tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

Feature Request: Add `continuous` parameter to `slice()` or new `slice_continuous()` #7012

Closed: matiasandina closed this issue 6 months ago

matiasandina commented 7 months ago

I have tibbles with columns that represent continuous recordings of electrical signals, and I often need to sample small chunks of the data to inspect or plot the signals. These tables often run to millions of rows (e.g., 100 samples per second × 24 hours of continuous recording is 8.64 million rows per experimental unit, times several replicates).

I have adapted slice() to extract these chunks, and I often use group_by() to get n continuous samples per group. A very simplified function is shown below:

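library(dplyr)

continuous_sample_n <- function(df, n) {
  # Ensure the data frame has enough rows (or have another way to deal with n > nrow(df))
  if (nrow(df) <= n) {
    return(df)
  }

  # Note: this treats df as a single block. On grouped data, nrow(df) counts
  # all rows and slice() applies the same row positions within every group.

  # The last valid start index is nrow(df) - n + 1, so the block can end on the final row
  max_start_index <- nrow(df) - n + 1
  # Randomly choose a start index in 1..max_start_index
  start_index <- sample.int(max_start_index, 1)
  # Use slice() to get a continuous block of n rows starting from start_index
  df %>%
    slice(start_index:(start_index + n - 1))
}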
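Because the function above uses nrow(df) and a single fixed index vector, it draws one block from the whole table: on grouped data, slice() would apply those same row positions inside every group. A minimal per-group sketch, assuming dplyr 1.0 or later, where slice() evaluates its expression once per group and n() returns the current group's size. The helper name continuous_sample_grouped() and the simulated signals table are hypothetical, for illustration only.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): draw one random continuous
# block of `size` rows from each group, or from the whole table if ungrouped.
continuous_sample_grouped <- function(df, size) {
  df %>%
    slice({
      k <- .env$size      # block length, taken from the function argument
      group_n <- n()      # number of rows in the current group
      if (group_n <= k) {
        # Group too small: keep all of its rows
        seq_len(group_n)
      } else {
        # Last valid start index, so the block can end on the group's final row
        start <- sample.int(group_n - k + 1, 1)
        start:(start + k - 1)
      }
    })
}

# Usage on simulated data (hypothetical): 3 subjects x 1000 samples each
set.seed(1)
signals <- tibble(
  subject = rep(c("a", "b", "c"), each = 1000),
  t = rep(1:1000, times = 3),
  value = rnorm(3000)
)
chunks <- signals %>%
  group_by(subject) %>%
  continuous_sample_grouped(50)
```

Drawing the start with sample.int(group_n - k + 1, 1) makes every window in a group, including the one ending on its last row, equally likely.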

I don't think this is a "package-worthy"/proper function that would work for everyone; I put it here mainly to show the concept of randomly selecting a start index and then taking rows start_index through start_index + n - 1. I think it would be useful to have a `continuous` parameter in slice() or a new slice_continuous() that performs this operation. In many ways, this function would be equivalent to slice_head() and slice_tail(), but with a random start point instead of the first or last observation. I understand that head and tail are of broader interest, which justifies giving them their own functions, and slice_continuous() may be of lesser interest. But since most of the machinery already exists in slice(), I thought a built-in version would be much more robust than what I can code myself.
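The comparison with slice_head() and slice_tail() can be made concrete: all three take a continuous block of n rows and differ only in where the block starts. A small sketch using a hypothetical helper slice_block() (not a dplyr function) that takes an explicit start index: start = 1 matches slice_head(), start = nrow(df) - n + 1 matches slice_tail(), and a uniformly random start is the proposed continuous behaviour.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): n continuous rows beginning at `start`
slice_block <- function(df, n, start) {
  slice(df, start:(start + n - 1))
}

df <- tibble(x = 1:10)
n <- 3

# start = 1 selects the same rows as slice_head()
head_rows <- slice_block(df, n, start = 1)
# start = nrow(df) - n + 1 selects the same rows as slice_tail()
tail_rows <- slice_block(df, n, start = nrow(df) - n + 1)
# A random start in 1..(nrow(df) - n + 1) is the proposed "continuous" slice
random_rows <- slice_block(df, n, start = sample.int(nrow(df) - n + 1, 1))
```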

DavisVaughan commented 6 months ago

Thanks, but I think this continuous random slice idea is a little too specialized for dplyr. I appreciate the idea though!