Closed eutwt closed 1 year ago
So I have no idea why this is occurring (or why step_group()
fixes it), but FYI this also happens with other functions.
pacman::p_load(dtplyr, dplyr)
iris %>%
lazy_dt() %>%
select(Species, Sepal.Length) %>%
filter(Sepal.Length <= mean(Sepal.Length),
.by = Species)
#> Source: local data table [82 x 3]
#> Call:
#> _DT2 <- `_DT1`[, .(Species, Sepal.Length), keyby = .(Species)]
#> `_DT2`[`_DT2`[, .I[Sepal.Length <= mean(Sepal.Length)], by = .(Species)]$V1]
#>
#> Species Species Sepal.Length
#> <fct> <fct> <dbl>
#> 1 setosa setosa 4.9
#> 2 setosa setosa 4.7
#> 3 setosa setosa 4.6
#> 4 setosa setosa 5
#> 5 setosa setosa 4.6
#> 6 setosa setosa 5
#> # ℹ 76 more rows
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
This has to be something weird about R's copy-on-modify semantics, but it's way over my head what could be happening. It makes no sense to me that it would alter the object for the prior step.
So basically the error is caused by something weird in R's copy-on-modify semantics. As far as I know, it shouldn't be altering the object for the prior step when using parent$group <-
, but it does. No idea why.
step_group()
fixes it because it uses new_step()
, which is creating a "new" dtplyr_step
object from scratch. So the weird copy-on-modify behavior is skipped.
That being sad - I think we just need to switch out our uses of object$group
throughout the code base to ensure we're not messing this up in other places.
That makes sense. Closing this PR since there's a larger issue
@markfairbanks I don't think it is actually to do with copy-on-modify, the object is only modified within the scope of the function. But, within the scope of the function is all that matters, because that's what's passed to new_step
, whose output determines the data.table expression. So when dt_call
is recursing down the steps to create the expression, the parent
which has been modified gets evaluated by dt_call.dtplyr_step_subset
(in this case, since it's a select step) as a step with groups, because they were added. (But, the output of step_group
is a different class and never evaluated by dt_call.dtplyr_step_subset
)
Ah yep I see what you mean. The issue is still a bit over my head though 😅
I might have some time to revisit in the future, but hopefully #445 covers what we need for now.
Yeah, the fix is the same regardless
Ah great, good to know.
Closes #439
Leaving in draft status for now because I don't understand why this fixes the issue