I have tibbles with columns that represent continuous recordings of electrical signals, and I often need to sample small chunks of the data to inspect or plot the signals. These tables often have millions of rows (e.g., 100 samples per second x 24 hours of continuous recording is 8.64 million rows, x several replicates of experimental units).
I have adapted slice() to extract these chunks, and I often use group_by() to get n continuous samples per group. A very simplified function is shown below:
library(dplyr)

continuous_sample_n <- function(df, n) {
  # If the data frame has n rows or fewer, return it unchanged
  # (there may be better ways to deal with n > nrow(df))
  if (nrow(df) <= n) {
    return(df)
  }
  # Largest start index that still leaves room for n rows,
  # so the last row of df can be part of a sampled block
  max_start_index <- nrow(df) - n + 1
  # Randomly choose a start index between 1 and max_start_index
  start_index <- sample.int(max_start_index, 1)
  # Use slice to get a continuous block of n rows starting from start_index
  df %>%
    slice(start_index:(start_index + n - 1))
}
I don't think this is a "package-worthy" proper function that would work for everyone; I include it mainly to show the concept of randomly selecting the first index and then taking the rows from start_index to start_index + n - 1. I think it would be useful to have a continuous parameter in slice(), or a slice_continuous() function, that provides this functionality.
In many ways, this function would be equivalent to slice_head() and slice_tail(), but with a random start point instead of the first or last observation. I understand that interest in head and tail may be greater, which justifies their own functions, and that slice_continuous() may be of lesser interest. But most of the machinery already seems to be in place, and a built-in version would be much more robust than what I can code.
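To make the request concrete, here is a rough sketch of how a group-aware version might behave. Note that slice_continuous() is a hypothetical name and does not exist in dplyr; the sketch only combines the existing group_modify() and slice() verbs, and the recordings tibble with its unit, sample_id, and signal columns is made-up example data.

```r
library(dplyr)

# Hypothetical helper, NOT part of dplyr: returns one random contiguous
# block of n rows per group (or the whole group if it has n rows or fewer).
slice_continuous <- function(.data, n) {
  group_modify(.data, function(.x, .y) {
    rows <- nrow(.x)
    if (rows <= n) {
      .x
    } else {
      # Valid start positions are 1 .. rows - n + 1
      start <- sample.int(rows - n + 1L, 1L)
      slice(.x, start:(start + n - 1L))
    }
  })
}

# Made-up example data: two experimental units, 1000 samples each
set.seed(1)
recordings <- tibble(
  unit      = rep(c("a", "b"), each = 1000),
  sample_id = rep(seq_len(1000), times = 2),
  signal    = rnorm(2000)
)

# One contiguous 100-sample chunk per unit, each with its own random start
chunks <- recordings %>%
  group_by(unit) %>%
  slice_continuous(n = 100)
```

Each group gets an independent random start, which mirrors how slice_head() and slice_tail() operate per group. A built-in version could presumably avoid the overhead of group_modify() and handle edge cases more carefully than this sketch does.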