Closed larsentom closed 1 year ago
I'm fairly certain the .before
and .after
arguments do what you want here.
You typically use .before
to extend the window back to N days "before" the current index value, and .after
to extend the window forward to N days "after" the current index value. However, you are absolutely permitted to use a negative .after
value, i.e.:
[14 days before, -7 days after] == [14 days before, 7 days before]
library(tidyverse)
library(slider)
library(lubridate)
set.seed(31415)
lab_results <- tibble(
patient_id = sample(seq(1, 100), size = 10000, replace = TRUE),
lab_date = dmy("1 Jan 2020") + days(sample(seq(1, 365*3), size = 10000, replace = TRUE)),
Hb_g_dL = runif(n = 10000, min = 10, max = 20)
) %>%
distinct() %>%
arrange(patient_id, lab_date)
lab_results
#> # A tibble: 10,000 × 3
#> patient_id lab_date Hb_g_dL
#> <int> <date> <dbl>
#> 1 1 2020-01-15 14.0
#> 2 1 2020-01-18 18.0
#> 3 1 2020-01-23 17.8
#> 4 1 2020-02-02 16.6
#> 5 1 2020-02-03 10.7
#> 6 1 2020-02-18 13.0
#> 7 1 2020-02-23 11.2
#> 8 1 2020-02-28 19.5
#> 9 1 2020-03-02 10.2
#> 10 1 2020-03-10 16.8
#> # … with 9,990 more rows
#> # ℹ Use `print(n = ...)` to see more rows
lab_results %>%
group_by(patient_id) %>%
mutate(
last_Hb_value_from_7_to_14_days_ago =
slide_index_dbl(
.x = Hb_g_dL,
.i = lab_date,
.f = ~ last(.x),
.before = 14, # from 14 days before the current date
.after = -7, # to 7 days before the current date ("-7 after the current date")
.complete = TRUE
)
) %>%
ungroup()
#> # A tibble: 10,000 × 4
#> patient_id lab_date Hb_g_dL last_Hb_value_from_7_to_14_days_ago
#> <int> <date> <dbl> <dbl>
#> 1 1 2020-01-15 14.0 NA
#> 2 1 2020-01-18 18.0 NA
#> 3 1 2020-01-23 17.8 NA
#> 4 1 2020-02-02 16.6 17.8
#> 5 1 2020-02-03 10.7 17.8
#> 6 1 2020-02-18 13.0 NA
#> 7 1 2020-02-23 11.2 NA
#> 8 1 2020-02-28 19.5 13.0
#> 9 1 2020-03-02 10.2 11.2
#> 10 1 2020-03-10 16.8 10.2
#> # … with 9,990 more rows
#> # ℹ Use `print(n = ...)` to see more rows
I'm fairly confident this is what you want, so im going to close. But feel free to add another comment with any follow up comments
Ah! I think I see what you mean!
I had initially tried .after = -14, .before = 7
(i.e., >= -14 days ago, <= -7 days ago) - which led to a range error message.
This way uses .before = 14, .after = -7
and seems to work ok - will experiment a bit more.
I am trying to understand the reason for my confusion.
I think I assumed something like:
whereas it seems like .before and .after are like:
Does this mean that e.g. looking after the event (7-14 days ahead) would mean using something like .before = -7, .after = 14
?
If so, it seems that it would support the kind of range-behaviour required! I only wonder though if there would be a way to make this less ambiguous. For example, it seems like specifying '.after = 7' without any .before specification would mean 'pick all the values from (the current value) up to (7 days from the current value)' - a 7-day period. But intuitively, I would expect this to instead 'pick all the available values from 7 days after the current value' (from 7 days ahead to the last available value). Perhaps it's just me?
I think you should think of it as creating windows that look like:
[.i - .before, .i + .after]
It does exactly that in the codebase somewhere, and then it requires that .i - .before <= .i + .after
to have a sensible window (which led to the errors you saw)
Does this mean that e.g. looking after the event (7-14 days ahead) would mean using something like .before = -7, .after = 14?
Yes exactly
Regarding your confusion about .before/.after, I'm hopeful that it is just you 😄 I think that the documentation for .before
and .after
in ?slide_index
is pretty good at conveying the correct meaning (I'm biased of course, but still). It is also meant to map almost exactly to how it works in SQL with Window functions. i.e. this:
SELECT
shop,
date,
revenue_amount,
AVG(revenue_amount) OVER (
PARTITION BY shop
ORDER BY date ASC
RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
) AS moving_avg
FROM revenue_per_shop;
is very close to this grouped by shop
:
slide_index_dbl(revenue_amount, date, mean, .before = days(1), .after = 0)
Scenario: In clinical data we often want to look at values in a specific time range relative to the current value - for example, the most recent value between 7 days and 14 days ago.
Feature request: A method to allow specification of a relative interval, e.g. with .interval_relative_start / .interval_relative_stop (or some better-named parameters).