Closed moodymudskipper closed 4 years ago
This to me seems like the most important issue to begin with.
My suggestion would be to start implementing functionality that absolutely will be in the package. Maybe get one function in near-perfect shape (maybe some %in% variant). Then implement others according to that template (other %in% variants and %in%). And after that see if we can find natural expansions to branch out without deviating from the form/syntax.
If you like the names of %in{}% and %in[]%, etc - we can start with them maybe.
This is open to suggestions/comments of course.
A first try to define the scope of this package :
It aims to provide infix operators that help detect, subset or replace elements of a vector or list.
output type
It means that ideally all functionalities should have 3 counterparts :
- %in%, %in[]% or `==` (the former and latter not in the package). It seems like we've decided to use the in suffix for all detect functionalities.
- %in%<-, %in[]%<- or `==<-`. For replacement not in place we can either use `%in[]<-%`(x, range, value) or replace(x, x %in[]% range, value)
could be :
- %subset% / %subset[]% / %subset==% (currently implemented)
- %{}in% / %{}in[]% / %{}==%
- %{}% / %{}[]% / %{}==%
- %intersection_all% / %subset[]% / %subset==% (%intersection_all% is like intersect() but keeps duplicates)
- %vin% / %vin[]% / `%v==%` with `v` for value
(but better discuss them in naming thread)
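To make the three output types concrete, here is a minimal sketch in R. The `%in[]%` definition below is only a hypothetical stand-in (a closed-range check) and is not the package's implementation; `x` and `rng` are made-up illustration values.

```r
# Hypothetical stand-in for a closed-range detect operator (assumption,
# not the package's code): TRUE where x lies in [rng[1], rng[2]].
`%in[]%` <- function(x, rng) x >= rng[1] & x <= rng[2]

x   <- c(1, 4, 7, 10)
rng <- c(3, 8)

detected  <- x %in[]% rng                  # detect: logical vector
subsetted <- x[x %in[]% rng]               # subset: the matching elements
replaced  <- replace(x, x %in[]% rng, 0)   # replace, not in place

print(detected)
print(subsetted)
print(replaced)
```

The subset and replace forms are both built on the detect form here, which is one way the three counterparts could share a single definition.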
action type
we have several collections of operators, with their variants to satisfy the 3 types of outputs :
- == etc
- %in[]% etc
- %in% etc
- %#in% (with naming conventions to define)
applied vs atomic
This description doesn't include the currently named %in{}% because I don't actually see how it fits yet, or if we should have another set of functions for applied operations
I like the one sentence scope you provided. Though I would add that they should work on a matrix (or an array) and preserve the dimensions, so that
data.matrix(iris[,-5]) %in[]% c(0,2)
would return a logical matrix, not a vector like %in% does. This is mainly a selfish need, because all the data I work with is always in a matrix format (not even data.frame...)
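A quick sketch of the difference being asked for. The `%in[]%` below is again just a hypothetical closed-range stand-in built from comparison operators, which is why it keeps the dimensions; base `%in%` does not.

```r
# Hypothetical stand-in (assumption): a range check built on >= and <=,
# which keep the dim attribute of a matrix left-hand side.
`%in[]%` <- function(x, rng) x >= rng[1] & x <= rng[2]

m <- data.matrix(iris[, -5])

res      <- m %in[]% c(0, 2)   # logical matrix, same shape as m
base_res <- m %in% c(0, 2)     # base %in% returns a plain logical vector

print(dim(res))
print(dim(base_res))           # NULL: the dimensions are gone
```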
I agree with everything you wrote here, except I think I wasn't clear when explaining %in{}%. I tried to expand on that in the "names" issue.
great, then let's make all these functions consistent with a matrix lhs. What about data frames ?
could be :
I would vote for supporting data.frames if possible. Only drop this if it's hard to do.
Took a look at what I was doing with infixer - and I think data.frame should be supported. The main reason being that comparison operators (>, ==, etc) support data.frame.
good point, and iris == 3 returns a matrix. I'll take the comparison operators as a reference for consistency. I'm not sure what I'll do with the assignment versions for these cases though, but I'll play around with it and then we can discuss it further here.
I think I should be able to implement all the changes we discussed during the week
Yup, seems like iris == 3 returns a matrix, not a data.frame. Maybe to be consistent our operators also should return a matrix in that case?
yes I believe they should
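For reference, the base R behaviour being used as the yardstick can be checked like this (a small sketch restricted to the numeric columns of iris to keep the output simple):

```r
num <- iris[, -5]    # numeric columns only
cmp <- num == 3      # comparison on a data.frame

print(class(cmp))
print(is.data.frame(cmp))
print(dim(cmp))
```

So an operator that mirrors `==` would take a data.frame on the left-hand side and hand back a logical matrix of the same shape.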
We agreed now that the scope is detect (logical output), subset, replace, matches according to equality, inequality, intersection with a range or regex, using a decently generalisable syntax to welcome additional operators if necessary.
These operators return the same type of data (or warnings/errors) as equality and comparison operators do when applied to flat atomic vectors, matrices, lists or data frames, with the difference that our right hand sides have different restrictions depending on the operator. They also treat NA as equality and comparison operators do, i.e. they keep them (unlike %in%).
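The NA point in a small sketch. `in_keep_na()` is a hypothetical helper written only for illustration (it is not a function from the package); it shows one way a set-membership check could keep NA the way `==` does.

```r
x <- c(1, NA, 3)

eq      <- x == 1      # comparison keeps the NA
base_in <- x %in% 1    # base %in% turns the NA into FALSE

# Hypothetical helper (assumption): set membership that restores NA
# wherever the left-hand side was NA.
in_keep_na <- function(x, set) {
  res <- x %in% set
  res[is.na(x)] <- NA
  res
}

kept <- in_keep_na(x, c(1, 2))

print(eq)
print(base_in)
print(kept)
```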
Replacement operators are wrappers around our detection operators and replace, and are named as the assignment form of the detection operators (e.g. %in{}% and %in{}<-%). Assignment forms of the equality and comparison operators are defined as well (==<- etc).
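A rough sketch of how such wrappers could look, relying on R's usual replacement-function mechanism (a function named `f<-` with a final `value` argument). The definitions and the `%in[]%<-` spelling are assumptions for illustration, not the package's code, and the `==<-` sketch does not handle NA on the left-hand side.

```r
# Hypothetical detect operator (assumption): closed-range check.
`%in[]%` <- function(x, rng) x >= rng[1] & x <= rng[2]

# In-place replacement form: a thin wrapper around detection + replace().
`%in[]%<-` <- function(x, rng, value) replace(x, x %in[]% rng, value)

# Same idea for an equality operator (no NA handling in this sketch).
`==<-` <- function(e1, e2, value) replace(e1, e1 == e2, value)

x <- c(1, 4, 7, 10)
x %in[]% c(3, 8) <- 0    # evaluated as x <- `%in[]%<-`(x, c(3, 8), value = 0)

y <- c(1, 2, 1)
y == 1 <- 9              # evaluated as y <- `==<-`(y, 1, value = 9)

print(x)
print(y)
```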
Additional ideas are to design additional infix operators to wrap our detection operators in :
- which() to get numeric indices
- all() or any() to get a logical scalar
- sum() to get a count
But all those are considered out of scope for now as it's not clear if they bring enough value.
I tend to agree they do not bring enough value. From all those other variants, I only imagine the subset one brings value. For other instances simply wrapping the result in appropriate function like:
which(x %in{}% c("a", "b"))
seems to be enough.
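To illustrate why plain wrapping seems sufficient, here is a sketch where `%in{}%` is assumed to behave like simple set membership (a stand-in definition, not the package's):

```r
# Hypothetical stand-in (assumption): set membership via base %in%.
`%in{}%` <- function(x, set) x %in% set

x <- c("a", "c", "b", "a")

idx <- which(x %in{}% c("a", "b"))   # numeric indices of the matches
has <- any(x %in{}% c("a", "b"))     # logical scalar
n   <- sum(x %in{}% c("a", "b"))     # count of matches

print(idx)
print(has)
print(n)
```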
Let's drop these which, all and sum ?
pinning and closing as I think we're good here!
@KKPMW said :