Open wallyxie opened 4 months ago
Ok, I've looked into this a little to confirm that `.complete` is working as intended. With `slide_period_*()`, the `.complete` argument does work slightly differently than in the rest of slider, but let me try and explain with a simpler example.
`slide_period()` works in two steps:
- It chunks the index by `period`, using `warp::warp_distance()` to define the bins.
- It applies `.before`, `.after`, and `.complete` to the result of `warp_distance()`, not the original index itself.
So in the example below of looking at "the current month plus 1 month before it", we first:
- Chunk the index by month with `warp_distance()` (you can see the result below).
- Then slide over the resulting bins, looking at the current bin plus 1 bin before it.
`.complete` is only taken into account during the 2nd bullet point. The way it works is that it asks the question: "Is it even technically possible to have any data in the current month bin AND the previous month bin?" In the result's 1st value, it is literally impossible to have anything in the "previous month bin" because bin 600 is the first one, so there is no previous month bin. That window is considered "incomplete", and you get a `NULL` there when `.complete = TRUE`.
Note that this is not the case for the 2020-04 bin assigned to number 603. Even though there is no data in the 2020-03 bin, there technically could be because we've seen bins 600 and 601 before it, so bin 602 could theoretically exist and give us a complete window. The way this is implemented ensures that you only see `NULL` for incomplete bins at the front (or back, if using `.after`) of the result set, and not interspersed randomly throughout it. Again, in this case it only returns `NULL` if it is technically impossible to have a complete bin based on the way the arguments are specified, regardless of the data.
library(slider)
i <- as.Date(c(
"2020-01-01", "2020-01-05",
"2020-02-02", "2020-02-04",
"2020-04-01", "2020-04-07"
))
# "the current month, and 1 month before it"
slide_period(i, i, "month", identity, .before = 1)
#> [[1]]
#> [1] "2020-01-01" "2020-01-05"
#>
#> [[2]]
#> [1] "2020-01-01" "2020-01-05" "2020-02-02" "2020-02-04"
#>
#> [[3]]
#> [1] "2020-04-01" "2020-04-07"
# it is literally impossible for the first group to be "complete".
slide_period(i, i, "month", identity, .before = 1, .complete = TRUE)
#> [[1]]
#> NULL
#>
#> [[2]]
#> [1] "2020-01-01" "2020-01-05" "2020-02-02" "2020-02-04"
#>
#> [[3]]
#> [1] "2020-04-01" "2020-04-07"
# a good way to check what's happening is to look at the result of warp_distance(),
# used under the hood. this "chunks" the `i`ndex by `period`, and then `.before`,
# `.after`, and `.complete` are applied to this result
warp::warp_distance(i, "month")
#> [1] 600 600 601 601 603 603
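The same rule can be sketched for the `.after` side mentioned above. This is only a minimal illustration, assuming the same index `i` as in the example: with `.after = 1`, the one window that cannot possibly be complete is the last one, because there is no bin after 603.

```r
library(slider)

i <- as.Date(c(
  "2020-01-01", "2020-01-05",
  "2020-02-02", "2020-02-04",
  "2020-04-01", "2020-04-07"
))

# "the current month, and 1 month after it"
res <- slide_period(i, i, "month", identity, .after = 1, .complete = TRUE)

# only the last window is incomplete: bin 603 is the last one,
# so there is no "next month bin" at all
vapply(res, is.null, logical(1))
#> [1] FALSE FALSE  TRUE

# bin 601 still gets a result even though bin 602 holds no data
lengths(res)
#> [1] 4 2 0
```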
So in the case of your example, it will:
- Chunk the index into 60 day bins, counting from the `origin` you provide.
- Apply `.before`, `.after`, and `.complete` to the resulting bins.
Since `.before` and `.after` are both 0, you've requested it to slide over "just the current 60 day bin". It is technically possible for that to contain a full window of data even in the first result, so it returns the same thing regardless of `.complete = TRUE/FALSE`.
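A minimal sketch of that situation, using made-up data rather than the data from the original report (the index `i`, the values `x`, and `origin` are all assumptions for illustration): 100 consecutive days are split into 60 day bins, so the second bin is only partially filled, and yet `.complete` changes nothing because each window only covers its own bin.

```r
library(slider)

# hypothetical data: 100 consecutive days, so the 2nd 60 day bin is partial
origin <- as.Date("2020-01-01")
i <- origin + 0:99
x <- seq_along(i)

# the bins that `.before`, `.after`, and `.complete` are applied to
unique(warp::warp_distance(i, "day", every = 60, origin = origin))
#> [1] 0 1

keep_all <- slide_period(x, i, "day", sum, .every = 60, .origin = origin)
complete_only <- slide_period(
  x, i, "day", sum,
  .every = 60, .origin = origin, .complete = TRUE
)

# with `.before = 0` and `.after = 0` every window is "just the current bin",
# which can always be complete, so both calls agree
identical(keep_all, complete_only)
#> [1] TRUE
```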
I should probably create a small vignette that talks about this example in more detail, as this is definitely one of the more complicated parts of slider. I will leave this issue open to remind myself to do that, but it is working as intended.
Hi @DavisVaughan,
Per this Stack Overflow question, I am experiencing an issue where `slide_period_dfr` produces the same output, including partial period calculations, regardless of whether `.complete` is set to T or F. It looks like at least one other user was able to replicate this.
The issue can be replicated as follows:
Running
then produces
as does
where the partial period and its `.f` operations are included.
Is this a bug, or does `slide_period_dfr` ignore the `.complete` argument?
Thank you for your time and attention!