Open moodymudskipper opened 4 years ago
Or just have a special `j()` function which creates a `tb_join_specification` object, which basically will trigger a call to `safejoin::eat()`, so `j()` behaves like `eat()` without the first arg.
This way we'll get all additional features and limit the weird syntax and reparsing, but we add a special function and add brackets.
The last example would become :
df1[,, j(df2, some_col, by = c(by_col1 = "by_col2"))]
# rather than
df1[,, ~df2 ~ "some_col" ~ c(by_col1 = "by_col2")]
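A minimal sketch of what such a `j()` could look like, in base R. Only the names `j` and `tb_join_specification` come from the proposal above; the fields `rhs` and `args` are hypothetical, and the hand-off to `safejoin::eat()` is deliberately left out because it would live in `[.tb`.

```r
# Hypothetical sketch: j() only records the join request, it does not join.
j <- function(x, ...) {
  structure(
    list(
      rhs  = x,                                   # the data frame to join
      args = as.list(substitute(list(...)))[-1]   # remaining args, unevaluated
    ),
    class = "tb_join_specification"
  )
}

df2  <- data.frame(by_col2 = 1:2, some_col = c("a", "b"))
spec <- j(df2, some_col, by = c(by_col1 = "by_col2"))

inherits(spec, "tb_join_specification")  # [.tb could dispatch on this class
eval(spec$args$by)                       # the by spec, evaluated on demand
```

Keeping the extra arguments unevaluated means a bare column name such as `some_col` survives until `[.tb` decides how to interpret it.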
Another thought :
Maybe trying to cram complex joins into tb is not the right approach, as a complex join probably deserves its own explicit call, and we're probably not going to be better than dplyr, fuzzyjoin, data.table and safejoin combined.
What about being more restrictive, or doing something different instead ? The semi join in `i` would fail if it subsets the same row several times, the joins in `...` would fail if they duplicate entries of the lhs, and we could aggregate on the fly as done in safejoin.
`df2` would be aggregated by the `by` columns and an optional aggregation function would be applied (which could be `list` if we want to keep the duplicates and expand them afterwards).
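To illustrate the "aggregate, then join" idea with base R only (this is not how tb or safejoin implement it): the rhs is collapsed to one row per key first, so the left join cannot change the number of rows of the lhs. The data and the `collapse` helper are made up for the example.

```r
band_members <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)
# The rhs has two rows for "John", so a plain left join would duplicate him.
instruments <- data.frame(
  name  = c("John", "John", "Paul", "Keith"),
  plays = c("guitar", "piano", "bass", "guitar"),
  stringsAsFactors = FALSE
)

# Hypothetical aggregation function: collapse duplicate matches into one string.
collapse <- function(x) paste(x, collapse = ", ")

# Step 1: aggregate the rhs by the `by` column, so each key appears once.
rhs_agg <- aggregate(plays ~ name, data = instruments, FUN = collapse)
stopifnot(!anyDuplicated(rhs_agg$name))

# Step 2: left join; the row count of the lhs is preserved.
res <- merge(band_members, rhs_agg, by = "name", all.x = TRUE)
```

An aggregation that keeps the values in a list instead of collapsing them would give the nested result, which can be expanded afterwards.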
Another idea, a bit crazy but might be intuitive in practice :
Apply `tb` subsetting to data.frames given in `...`, which means we don't evaluate right away `[` with data frames as first args, and give `[` arguments relative to the join. These arguments would be :
- `on` (defaults to the common columns)
- `conflict` (as in safejoin, a function to deal with conflicting columns)
- `agg` (as in safejoin, a function to deal with duplicate matches; using `list` will be equivalent to a `nest_join`)

We'd have the benefit of doing fast reshaping with tb syntax AND we get to keep the data frame at the front and the arguments behind.
band_members %>% left_join(band_instruments)
band_members %tb>% .[,,band_instruments]
band_members %>% left_join(band_instruments, by = "name")
band_members %tb>% .[,,band_instruments[on = "name"]]
band_members %>% left_join(band_instruments2, by = c("name" = "artist"))
band_members %tb>% .[,,band_instruments[on = c("name" = "artist")]]
band_members %tb>% .[,,band_instruments[{artist} := "name"]] # equivalent
It's harder to spot later that it's a join though, whereas it's clear enough with `j`.
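The `band_instruments[on = "name"]` form only works if the inner `[` is not evaluated right away. A rough base R sketch of how an unevaluated argument could be taken apart; `parse_join_arg` is a hypothetical helper for illustration, not an existing tb function:

```r
# Hypothetical helper: split `df[on = ...]` into the rhs symbol and its
# join arguments, without evaluating the inner `[` call.
parse_join_arg <- function(expr) {
  expr <- substitute(expr)
  if (is.call(expr) && identical(expr[[1]], as.name("["))) {
    # expr is `[`(rhs, <join args>): drop the function name and the rhs
    list(rhs = expr[[2]], args = as.list(expr)[-(1:2)])
  } else {
    # a bare data frame name: no join arguments given
    list(rhs = expr, args = list())
  }
}

spec  <- parse_join_arg(band_instruments[on = "name"])
plain <- parse_join_arg(band_instruments)

spec$rhs       # the symbol band_instruments, still unevaluated
spec$args$on   # the join columns
```

Note that `band_instruments` does not even need to exist here, which shows that nothing is evaluated at capture time; `[.tb` would evaluate the pieces later, in the right environment.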
We can have a hybrid approach :
- `j(df)` if we just want to make it explicit
- `[.tb` to its first arg (with no additional args)
- `eat`'s `...` as we have better subsetting in `[.tb`
- `eat`
Could be `lj` rather than `j` to make it a bit clearer that it's "left"
As we don't allow unlabelled input in `...` so far (unless spliced), these could be used for joins. However we need to be able to specify the joining columns, which we might do after a `~`.
It works well for joins as they're usually performed, but if you want to join only certain columns you must repeat the by columns :
I think this is awkward. A way would be to use another `~` separator in that case :
This join would give a nested result, however, if the join is not one-to-one, because our rule is that we don't change the number of rows if we mutate, so we need yet another `~` to expand, and in the worst case we get something like :
which is the equivalent of :
Alternatively we could have the `by` argument of the join in an "on" clause as in data.table, but it means only one join per call, having the argument further from its use, and adding an argument that is not really necessary.