markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/

Display data.table translation #483

Closed GitHunter0 closed 2 years ago

GitHunter0 commented 2 years ago

Hi @markfairbanks, is there a way to make tidytable display its translation to data.table code, as dtplyr does?

For example, this:

library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
mtcars2 <- lazy_dt(mtcars)
mtcars2 %>% 
  filter(wt < 5) %>% 
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>% 
  summarise(l100k = mean(l100k))

shows the translation below:

#> Call:   `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)),  keyby = .(cyl)]

Thank you

markfairbanks commented 2 years ago

To be honest, I don't know if this would be possible at this point. dtplyr is designed so that the translation is built first and only evaluated when the user asks for the result. tidytable uses eager evaluation, so each function runs as soon as it is called.
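A minimal sketch of the lazy/eager difference described above. It assumes data.table, dtplyr, dplyr, and a recent tidytable (one that exports verbs without the trailing dot) are installed; the variable names are illustrative only.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Lazy (dtplyr): each verb is recorded as part of a data.table call.
# Nothing is computed while the pipeline is being built.
lazy_query <- lazy_dt(mtcars) %>%
  filter(wt < 5) %>%
  summarise(mean_mpg = mean(mpg))

# Because the call exists as an object, it can be printed before it runs.
show_query(lazy_query)

# Evaluation happens only when the result is requested.
lazy_result <- as.data.table(lazy_query)

# Eager (tidytable): each verb runs immediately and returns data,
# so there is no stored call left to print afterwards.
eager_result <- tidytable::as_tidytable(mtcars) %>%
  tidytable::filter(wt < 5) %>%
  tidytable::summarise(mean_mpg = mean(mpg))
```

Both pipelines should produce the same summary value; only the dtplyr one carries a printable translation along with it.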

Another thing is that some tidytable functions are built in a non-data.table way because they're faster (pull.(), uncount.(), bind_cols.(), and unnest.() to name a few). tidytable also uses shallow copies instead of deep copies wherever possible which is very different from basic data.table.

Is there a reason why you're looking for this? I feel like dtplyr does a pretty good job at it.

GitHunter0 commented 2 years ago

Thank you for the response. I'm just asking because it is a useful feature to make sure tidytable is doing the expected operations and also to educate people who have limited knowledge of data.table. Indeed, dtplyr does that pretty well, but tidytable always seems to be ahead, covering more verbs. However, at some point dtplyr should catch up. Feel free to close the issue.

markfairbanks commented 2 years ago

> it is a useful feature to make sure tidytable is doing the expected operations

I do have about 1100 tests that make sure tidytable functions behave like tidyverse functions. They run every time I make an update to tidytable. I know that's not visible to the user but they are there.
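For a user who wants to check one pipeline by hand, a rough sketch is to compare the tidytable result against hand-written data.table code. The data.table line below follows the dtplyr translation quoted earlier in the thread; it is not tidytable's internal code, and the example assumes a recent tidytable with dot-less verbs and the `.by` argument.

```r
library(data.table)
library(tidytable, warn.conflicts = FALSE)

# Hand-written data.table version, modelled on the dtplyr translation above.
dt <- as.data.table(mtcars)
expected <- dt[wt < 5][, l100k := 235.21 / mpg][, .(l100k = mean(l100k)), keyby = .(cyl)]

# The same steps with tidytable verbs, which run eagerly.
actual <- as_tidytable(mtcars) %>%
  filter(wt < 5) %>%
  mutate(l100k = 235.21 / mpg) %>%   # liters / 100 km
  summarise(l100k = mean(l100k), .by = cyl) %>%
  arrange(cyl)

# Compare as plain data frames so classes and keys do not affect the check.
check <- all.equal(as.data.frame(expected), as.data.frame(actual))
print(check)
```

This confirms that the results match, not that the same data.table calls were used underneath.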

> also to educate people who have limited knowledge of data.table. Indeed, dtplyr does that pretty well, but tidytable always seems to be ahead, covering more verbs.

I think part of what holds dtplyr back from having more verbs is the lazy workflow. It's hard to make things work like dplyr/tidyr sometimes if data.table doesn't have a good translation. unnest() is a good example - with eager evaluation tidytable can do some extra checks (e.g. is the user unnesting a list of data frames or a list of vectors?), whereas dtplyr would rely on data.table having a perfect translation of unnest() (which it doesn't at the moment). That's one reason why tidytable has more functions.
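The kind of runtime check mentioned above can be sketched as follows. `unnest_sketch()` is a hypothetical helper written for illustration, not tidytable's implementation; it only shows that an eager function can inspect the actual list column before choosing a strategy, which a translation built ahead of time cannot do.

```r
library(data.table)

# Hypothetical helper: pick an unnesting strategy by looking at the data.
unnest_sketch <- function(col) {
  is_df <- vapply(col, is.data.frame, logical(1))
  if (all(is_df)) {
    # List of data frames: stack them row-wise.
    rbindlist(col, use.names = TRUE, fill = TRUE)
  } else {
    # List of atomic vectors: flatten into a single column.
    data.table(value = unlist(col))
  }
}

from_frames  <- unnest_sketch(list(data.frame(a = 1:2), data.frame(a = 3L)))
from_vectors <- unnest_sketch(list(1:2, 3L))
```

Both calls return a three-row table; the branch taken depends on the contents of the list, which are only known at evaluation time.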

dtplyr is also held back from implementing functions that aren't S3 generics - bind_cols/bind_rows can't be implemented with how R currently implements OOP. tidytable doesn't have those same constraints.
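The S3 constraint can be illustrated in base R. All names below (`describe`, `combine`, `lazy_step`) are hypothetical and stand in for a generic verb, a non-generic function, and a lazy backend's class.

```r
# A generic: dispatches on the class of its first argument, so another
# package can add behaviour for its own class by defining a method.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default behaviour"
describe.lazy_step <- function(x, ...) "lazy backend: record the call"

# A non-generic: a single fixed body with no dispatch hook, so a backend
# cannot supply its own version without masking the function itself.
combine <- function(...) "original behaviour only"

step <- structure(list(), class = "lazy_step")

generic_result <- describe(step)   # uses the lazy_step method
plain_result   <- combine(step)    # always runs the original body
```

A package that does not build on S3 dispatch for these functions can simply export its own `bind_cols`/`bind_rows`, which is the freedom described above.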

I think that's why tidytable still has a good place in the R ecosystem - with eager evaluation it has a big advantage over dtplyr. The downside to eager evaluation is that printing the translations would be much harder.

I'll close this for now - but if you're ever curious how something is translated let me know.

GitHunter0 commented 2 years ago

@markfairbanks, thanks a lot for the detailed explanation.