package scope - Githubissues

moodymudskipper commented 4 years ago

@KKPMW said :

What I would like to see in this package:

in[] in{} in(), etc

out[] (or !in[]) variants

variants that work with tables (so replace values that occur some n of times)

variants that help with "cut" so maybe x %#cut% 5 <- LETTERS[1:5] would cut x into 5 intervals and name them A-E.
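For reference, base R's cut() already does the interval splitting that such a "cut" variant would presumably wrap. A minimal sketch, assuming equal-width intervals are what is meant; the data in x is made up for illustration and the %#cut% operator itself is only a proposal here, not something that exists:

```r
# Illustrative data; any numeric vector would do.
x <- c(1, 12, 25, 37, 50)

# Split the range of x into 5 equal-width intervals and label them A-E.
groups <- cut(x, breaks = 5, labels = LETTERS[1:5])

# Each value now carries the label of the interval it falls into.
as.character(groups)
```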

karoliskoncevicius commented 4 years ago

This to me seems like the most important issue to begin with.

My suggestion would be to start implementing functionality that absolutely will be in the package. Maybe get one function in near-perfect shape (maybe some %in% variant). Then implement others according to that template (other %in% variants and %in%). And after that see if we can find natural expansions to branch out without deviating from the form/syntax.

If you like the names of %in{}% and %in[]%, etc - we can start with them maybe.

This is open to suggestions/comments of course.

moodymudskipper commented 4 years ago

A first try to define the scope of this package :

It aims to provide infix operators that help detect, subset or replace elements of a vector or list.

output type

It means that ideally all functionalities should have 3 counterparts :

detect : logical output, e.g. %in%, %in[]% or `==` (the first and last are not in the package)

It seems like we've decided to use the in suffix for all detect functionalities.

replace : in place replacement : e.g %in%<-, %in[]%<- or `==<-` .

For replacement not in place we can either use `%in[]<-%`(x, range, value) or replace(x, x %in[]% range, value)
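A rough sketch of the not-in-place route through replace(). The definition of %in[]% below is only a stand-in written for this example (closed interval, bounds given as a length-2 vector), not the package's actual implementation:

```r
# Hypothetical stand-in for the range operator discussed here:
# TRUE where x lies in the closed interval [range[1], range[2]].
`%in[]%` <- function(x, range) x >= range[1] & x <= range[2]

x <- c(1, 5, 10, 15)

# Not in place: replace() returns a modified copy, x itself is unchanged.
y <- replace(x, x %in[]% c(4, 11), 0)

y
x
```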

subset : definitive names haven't been decided for those yet, but they would return a subset of the input,

could be :

%subset% / %subset[]% / %subset==% (currently implemented)
%{}in% / %{}in[]% / %{}==%
%{}% / %{}[]% / %{}==%
%intersection_all% / %subset[]% / %subset==% (%intersection_all% is like intersect() but keeps duplicates)
%vin% / %vin[]% / %v==% (with v for value)

(but better discuss them in naming thread)
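To make the "keeps duplicates" distinction concrete, here is how base intersect() compares with plain subsetting by %in%, which is presumably what a subset operator for sets would boil down to. The vectors are invented for the example:

```r
x <- c("a", "a", "b", "c")
y <- c("a", "c")

# intersect() treats its inputs as sets, so duplicates are dropped.
as_set <- intersect(x, y)

# Subsetting with a logical mask keeps every matching element of x.
as_subset <- x[x %in% y]

as_set
as_subset
```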

action type

we have several collections of operators, with their variants to satisfy the 3 types of outputs :

comparison == etc
range : %in[]% etc
sets : %in% etc
count (%#in% with naming conventions to define)

applied vs atomic

This description doesn't include the currently named %in{}% because I don't yet see how it fits, or whether we should have another set of functions for applied operations

karoliskoncevicius commented 4 years ago

I like the one sentence scope you provided. Though I would add that they should work on a matrix (or an array) and preserve the dimensions, so that

data.matrix(iris[,-5]) %in[]% c(0,2)

would return a logical matrix, not a vector like %in% does. This is mainly a selfish need, because all the data I work with is always in a matrix format (not even data.frame...)
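A small sketch of the difference. The %in[]% definition is a stand-in assumed for this example (closed interval built from the base comparison operators); the point is only that comparisons keep the dim attribute while base %in% drops it:

```r
# Hypothetical stand-in: closed-interval check built on >= and <=,
# which preserve the dimensions of their left-hand side.
`%in[]%` <- function(x, range) x >= range[1] & x <= range[2]

m <- data.matrix(iris[, -5])

# Range check: result has the same dimensions as m.
res <- m %in[]% c(0, 2)
dim(res)

# Base %in%: result is a plain logical vector with no dimensions.
flat <- m %in% c(0, 2)
dim(flat)
```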

I agree with everything you wrote here, except I think I wasn't clear when explaining %in{}%. I tried to expand on that in the "names" issue.

moodymudskipper commented 4 years ago

great, then let's make all these functions consistent with a matrix lhs. What about data frames ?

could be :

unsupported
supported only if all items of compatible type
just return FALSE everywhere for incompatible types

karoliskoncevicius commented 4 years ago

I would vote for supporting data.frames if possible. Only drop this if it's hard to do.

karoliskoncevicius commented 4 years ago

Took a look at what I was doing with infixer - and I think data.frame should be supported. The main reason being that comparison operators (>, ==, etc) support data.frame.

moodymudskipper commented 4 years ago

good point, and iris == 3 returns a matrix. I'll take the comparison operators as a reference for consistency. I'm not sure what I'll do with the assignment versions for these cases though, but I'll play around with it and then we can discuss it further here.

I think I should be able to implement all the changes we discussed during the week

karoliskoncevicius commented 4 years ago

Yup, seems like iris == 3 returns a matrix, not a data.frame. Maybe to be consistent our operators also should return a matrix in that case?
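This is quick to check in a session. The sketch below uses only the numeric columns of iris to keep the factor column out of the picture:

```r
num <- iris[, -5]   # numeric columns only

res <- num == 3

# The comparison on a data.frame gives back a logical matrix
# with the same shape, not a data.frame.
is.matrix(res)
is.data.frame(res)
dim(res)
```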

moodymudskipper commented 4 years ago

yes I believe they should

moodymudskipper commented 4 years ago

We agreed now that the scope is detect (logical output), subset, replace, matches according to equality, inequality, intersection with a range or regex, using a decently generalisable syntax to welcome additional operators if necessary.

These operators return the same type of data (or warnings/errors) as equality and comparison operators do when applied to flat atomic vectors, matrices, lists or data frames, with the difference that our right-hand sides have different restrictions depending on the operator. They also treat NA as equality and comparison operators do, i.e. they keep them (unlike %in%).
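The NA behaviour can be illustrated with base R alone. The %in[]% definition at the end is a stand-in assumed for this example, built from the comparison operators so that it inherits their NA handling:

```r
x <- c(1, NA, 3)

# Comparison operators propagate NA.
eq <- x == 1

# Base %in% never returns NA: a missing value simply does not match.
member <- x %in% 1

# Hypothetical stand-in for the range operator, built on >= and <=,
# so a missing input gives a missing result.
`%in[]%` <- function(x, range) x >= range[1] & x <= range[2]
in_range <- x %in[]% c(0, 2)

eq
member
in_range
```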

Replacement operators are wrappers around our detection operators and replace, and are named as the assignment form of the detection operators (e.g. %in{}% and %in{}<-%). Assignment forms of the equality and comparison operators are defined as well (==<- etc).
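A sketch of what such a wrapper could look like, using the %in[]%<- spelling from earlier in the thread. Both definitions are assumptions made for this example rather than the package's code; the only fixed part is R's own rule that a replacement function is named by appending <- to the operator and takes the new values in an argument called value:

```r
# Hypothetical detection operator: closed-interval check.
`%in[]%` <- function(x, range) x >= range[1] & x <= range[2]

# Replacement form: R calls this for `x %in[]% range <- value`
# and assigns the returned object back to x.
`%in[]%<-` <- function(x, range, value) {
  replace(x, x %in[]% range, value)
}

x <- c(1, 5, 10, 15)

# In place: elements of x inside [4, 11] are overwritten with 0.
x %in[]% c(4, 11) <- 0

x
```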

Additional ideas are to design additional infix operators to wrap our detection operators in :

which() to get numeric indices
all() or any() to get a logical scalar
sum() to get a count

But all those are considered out of scope for now as it's not clear if they bring enough value.

karoliskoncevicius commented 4 years ago

I tend to agree they do not bring enough value. From all those other variants, I only imagine the subset one brings value. For other instances simply wrapping the result in appropriate function like:

which(x %in{}% c("a", "b"))

seems to be enough.
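Spelled out for all of the wrappers mentioned, with base %in% standing in for %in{}% (an assumption for this example, since the operator's exact semantics are discussed elsewhere):

```r
x <- c("a", "c", "b", "a", "d")

# Base %in% used as a stand-in for the detection operator.
hit <- x %in% c("a", "b")

idx     <- which(hit)  # numeric indices of the matches
any_hit <- any(hit)    # does at least one element match?
all_hit <- all(hit)    # do all elements match?
n_hit   <- sum(hit)    # how many elements match?
```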

Let's drop these which, all and sum variants?

moodymudskipper commented 4 years ago

pinning and closing as I think we're good here!

moodymudskipper / inops

package scope #4