bramtayl opened this issue 5 years ago (status: open).
Oh, and all the different joins too.
YES! I think that is actually the area where we could add the most value to Query.jl right now.
I have thought a lot about `mutate` and, to some degree, `select`, but not at all about the others. Here is my current thinking:
First, I think we should try to implement all the `mutate` and `select` variants in the front end only. It should be feasible for all of them to end up as `@map` calls under the hood; that way we don't have to add anything to QueryOperators.jl or do any work on the backends.
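A minimal sketch in plain Julia (no Query.jl involved) of why both kinds of operation can reduce to a row-wise map: a `select` builds a new NamedTuple from some of the fields, and a `mutate` merges new fields into the existing row. The sample rows are made up, and this only illustrates the idea; it is not how Query.jl actually lowers `@map`.

```julia
# Rows as a vector of NamedTuples, the kind of element a query iterates over.
rows = [(a = 1, b = 2), (a = 3, b = 4)]

# select-style: each output row keeps only the field `a`.
selected = map(r -> (a = r.a,), rows)

# mutate-style: keep every existing field and append a computed field `c`.
mutated = map(r -> merge(r, (c = r.a + r.b,)), rows)
```

Both steps are ordinary functions from one NamedTuple to another, which is exactly the shape `@map` already accepts.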
Then, as a first step, I think we could add features like that as new functions that manipulate `NamedTuple`s, so that they can be used from within `@map`, before we start adding helper macros like `@mutate` and `@select`.
For starters, I think a type-stable `merge` function for `NamedTuple`s would go a long way. Say, `merge((a=1, b=2), (c=3,)) == (a=1, b=2, c=3)` (note the trailing comma: `(c=3)` on its own is just the value `3`, not a `NamedTuple`).
Once we have that, we could add some syntax to `{}` to make it easier to use. For example, `{a..., b..., x=3}` could be translated to `merge(a, b, (x=3,))` in the various Query.jl macros.
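A sketch of both pieces in plain Julia. Base Julia (1.0 and later) already provides `merge` for `NamedTuple`s, so the first part just exercises it and checks inference with `Test.@inferred`. The second part is a hypothetical standalone macro, `@nt`, showing how the proposed `{a..., b..., x=3}` syntax could be rewritten into a `merge` call; in Query.jl this rewriting would presumably live inside the existing macros, and `@nt` is only an illustration.

```julia
using Test  # provides @inferred

row = (a = 1, b = 2)

# Base's merge combines NamedTuples. The trailing comma matters:
# `(c = 3)` is just the assignment `c = 3`, while `(c = 3,)` is a NamedTuple.
y = merge(row, (c = 3,))          # (a = 1, b = 2, c = 3)

# On duplicate names the right-most value wins; field order follows first appearance.
z = merge(row, (b = 20, c = 3))   # (a = 1, b = 20, c = 3)

# @inferred throws if the return type can't be inferred from the argument types.
@inferred merge(row, (c = 3,))

# Hypothetical macro sketching the proposed `{...}` translation:
#   {a..., b..., x = 3}  ->  merge(NamedTuple(), a, b, (x = 3,))
# Splatted entries are merged whole; runs of consecutive `name = value`
# entries become one NamedTuple literal, so source order is preserved.
macro nt(ex)
    (ex isa Expr && ex.head === :braces) || error("@nt expects a {...} expression")
    pieces = Any[]   # arguments for merge, in source order
    fields = Any[]   # pending `name = value` entries not yet emitted
    function flush!()
        if !isempty(fields)
            push!(pieces, Expr(:tuple, fields...))
            empty!(fields)
        end
    end
    for arg in ex.args
        if arg isa Expr && arg.head === Symbol("...")
            flush!()
            push!(pieces, arg.args[1])  # `a...` merges the whole NamedTuple `a`
        elseif arg isa Expr && arg.head in (:(=), :kw)
            push!(fields, Expr(:(=), arg.args[1], arg.args[2]))
        else
            error("@nt: unsupported entry $(repr(arg))")
        end
    end
    flush!()
    # Starting from an empty NamedTuple keeps the call valid for a single piece;
    # esc resolves every name (including `merge`) in the caller's scope.
    return esc(:(merge(NamedTuple(), $(pieces...))))
end

a = (p = 1,)
b = (q = 2,)
r = @nt({a..., b..., x = 3})      # expands to merge(NamedTuple(), a, b, (x = 3,))
```

Because entries are merged left to right, a later field overrides an earlier one with the same name, so `{x = 0, a..., x = 5}` ends up with `x = 5`.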
Another area would be selecting subsets of columns. We could either have something like `startswith((foo1=1, bar=2, foo2=3), :foo) == (foo1=1, foo2=3)`, or something like `(foo1=1, bar=2, foo2=3)[startswith(:foo)] == (foo1=1, foo2=3)`. I'm not sure which of these is better. In a query it might look like `@map(startswith(_, :foo))` or `@map(_[startswith(:foo)])`. I think I like the first one better, but I'm not sure... The second approach would be more in line with this, which probably would also be worthwhile... In general I think we need a lot more features to select columns, but we should probably iterate a bit on various designs?
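Both spellings of the column-selection idea can be sketched in a few lines of plain Julia. The names `select_prefix` and `Prefix` are hypothetical stand-ins: the text proposes reusing `startswith`, and a separate name here simply avoids adding methods to a Base function for Base types. This version computes the kept names at run time, so it is not written with type stability in mind.

```julia
# Hypothetical selector object for the indexing form `nt[Prefix(:foo)]`.
struct Prefix
    prefix::Symbol
end

# Field names of `nt` whose string form starts with `prefix`, in original order.
matching_names(nt::NamedTuple, prefix::Symbol) =
    Tuple(k for k in keys(nt) if startswith(String(k), String(prefix)))

# Form 1: a plain function taking the row and the prefix.
function select_prefix(nt::NamedTuple, prefix::Symbol)
    names = matching_names(nt, prefix)
    return NamedTuple{names}(map(k -> getfield(nt, k), names))
end

# Form 2: indexing a NamedTuple with the selector object.
Base.getindex(nt::NamedTuple, p::Prefix) = select_prefix(nt, p.prefix)
```

In a query these would read `@map(select_prefix(_, :foo))` or `@map(_[Prefix(:foo)])`, mirroring the two options above.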
Maybe as a first step I should create queryverse/NamedTupleHelpers.jl, where we could play with some of these methods, and where they could have their home?
Going through the dplyr manual, I see several functions that might be worth adding to Query.jl: `sample`, `bind_rows`, `bind_cols`, `rename`, `mutate`, `slice`, `n`, and `top_n`. I'm not sure they are all necessary, but some of them might be nice, and I could pitch in here.
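Some of these fit the NamedTuple-helper approach discussed above quite directly. A rough sketch of two of them, assuming rows are NamedTuples; `rename` and `bind_cols` here are hypothetical helpers that only loosely mirror dplyr's behavior:

```julia
# Hypothetical `rename`: replace field names according to `old => new` pairs,
# leaving all other fields (and the field order) untouched.
function rename(nt::NamedTuple, pairs::Pair{Symbol,Symbol}...)
    lookup = Dict{Symbol,Symbol}(pairs)
    newnames = map(k -> get(lookup, k, k), keys(nt))
    return NamedTuple{newnames}(values(nt))
end

# Hypothetical `bind_cols`: combine two equally long vectors of rows
# by merging row i of `left` with row i of `right`.
bind_cols(left, right) = map(merge, left, right)
```

`rename` fails if a new name collides with an existing field, since a NamedTuple cannot have duplicate names; a real helper would probably want a clearer error message for that case.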