moodymudskipper / inops

Infix Operators for Detection, Subsetting and Replacement
GNU General Public License v3.0
40 stars 0 forks source link

First CRAN release #9

Closed karoliskoncevicius closed 4 years ago

karoliskoncevicius commented 4 years ago

Maybe we should get a checklist for what functionality to finish before first version release to CRAN? Might also create a milestone for this.

moodymudskipper commented 4 years ago

Sorry for my absence, I have not ditched the project and I'm hoping to start to contribute again in the following days

karoliskoncevicius commented 4 years ago

All is good, no rush. This is our hobby project anyways. I too am working on it only when I have time/ideas.

moodymudskipper commented 4 years ago

@KKPMW please suggest additions / modifications as you see fit

moodymudskipper commented 4 years ago

implemented all combinations of detect subset replace with in and out . still need subset and replace versions of equality and comparison ops. Also need to define aliases.

moodymudskipper commented 4 years ago

it's needed ;)

https://stackoverflow.com/questions/57891929/find-array-index-of-elements-of-a-matrix-that-match-a-value-in-a-vector-of-candi/57914731#57914731

karoliskoncevicius commented 4 years ago

Upvoted :)

I should find more time to contribute to this. Took too long of a break.

moodymudskipper commented 4 years ago

There's not much left to do actually, I don't know why I disconnected a bit, and started to lose myself in other stuff again :). I won't have much time in the next 10 days I think, though maybe this weekend. If you plan to work on something maybe ping in the issue beforehand so I don't do the same on my side.

moodymudskipper commented 4 years ago

I read recently about how ropensci works and I think our package might be eligible for, it would be really nice as we'd get reviews before submitting to CRAN, and more exposure.

https://github.com/ropensci/software-review

Shouldn't we try and submit it there ? @KKPMW

karoliskoncevicius commented 4 years ago

Hmm I am not too familiar with rOpenSci. From what I gathered when looking at it in the past - it's mainly where people put their packages when they no longer want to maintain them. For example Henric Bengtsson has his small package dirdf in rOpenSci, but none of his more serious packages (like future or matrixStats).

Another thing - once transferred to rOpenSci - we would still be left as main maintainers. However I think they have a rule that if you do not address an open issue for more than 3(?) months - they will assign a different person to look over the package. The repository will be transfered to ropenscilabs GitHub account, so they can do what they want with it.

The way I see it - you will simply be transferring over your rights to this package to people of rOpenSci. The review benefits are nice but (1) - they will take a lot of time and (2) - CRAN does not review the code of the package anyway.

I remember looking into this option for my matrixTests package, but decided against, for the reasons mentioned above.

But ultimately it's your choice, and I admit I don't know enough about the benefits rOpenSci gives. What are your thoughts about it?

moodymudskipper commented 4 years ago

I'm not sure if we would be accepted anyway, but it was mainly about visibility. My two other packages on CRAN are stuck 5 to 20 downloads a day, I'd be happy if we could do better. You make good points with the caveats though, especially the time, I'd like to get this out ASAP.

BTW as far as you're concerned is it ready to be shipped to CRAN ?

karoliskoncevicius commented 4 years ago

Yup that's another thing - not sure if we will be accepted. Seems like they are oriented towards science (hence rOpenSCI). But it might be a nice experience getting more familiar with how they work via submitting one package. I am not against it by any means.

Regarding getting users and visibility - yeah, it's tough when we are not tidyverse :) Having more usability would be somewhat helpful here. If we had "fancy" things like ages %in% c(0,5,18,30,60,100) <- c("child", "teen", "young", "adult", "old")` or something of that sort - I think we would get more immediate users. Plus one additional "real world" example in the README might do a lot in this area.

Regarding CRAN - we can probably ship this to CRAN, then share it somewhere (like reddit?) and get feedback if possible. And reiterate based on feedback?

Sorry for being all over the place with the reply.

karoliskoncevicius commented 4 years ago

Added one more example to README in the dev_readme branch. And also change tables to be in html, not code. Will issue a pull request.

One additional nice thing to add would be that %in#% idea (for selecting based on number of occurrences)

moodymudskipper commented 4 years ago

OK I'll read more about Open sci and see if we might fit, and we'll see from there. re ages %in% c(0,5,18,30,60,100) <- c("child", "teen", "young", "adult", "old") I'm still not a fan, ages <- switch(ages,0= "child",5=18...) or recode, or levels<- are existing less confusing options in my opinion.

See my other issue re %in#% (which actually needs to be %#in% in our system).

moodymudskipper commented 4 years ago

Should we do a vignette ? I'm always confused by those, that would be pretty much a copy of the readme in our case right ?

karoliskoncevicius commented 4 years ago

Not sure about vignette - never created one. But might be useful, specially if we can come up with some elaborate real-world example that would be too big for README.

Another thing that might increase the user count a bit - add topics to the repository, like R, r-package, etc. Seems like I cannot do this myself.

moodymudskipper commented 4 years ago

Ah cool I've never done that, I'll check how it works

karoliskoncevicius commented 4 years ago

You should see "Manage Topics" link/button somewhere near description.

moodymudskipper commented 4 years ago

thanks, done

karoliskoncevicius commented 4 years ago

Quick question - do we want commented-out blocks of previous implementation in the files?

I would suggest deleting that - it will be preserved in the git history if needed.

moodymudskipper commented 4 years ago

We can delete it, I left it as I was not completely comfortable with my decision and thought I might quickly reverse it, but I think it was the way to go, and a faster implementation if ever needed might be better done using C or C++ (rather than trying to emulate a behavior consistent with == by dealing with dozens of possible exceptions). But I don't know any C and very little C++ so I'm not so impatient to get to it :).

karoliskoncevicius commented 4 years ago

Shared README with 2 of my colleagues and their reactions were:

1st Liked it a lot. Said it looks like it should be included in base.

2nd Said that the first example should be simpler. He thought that %>% was maybe also coming from this package.

karoliskoncevicius commented 4 years ago

So based on the feedback I think we need to rework the README a bit, before sharing the package. Maybe the introduction of operators should be more gentle, type by type, with the minimal possible example.

moodymudskipper commented 4 years ago

cool to have god feedback!

Maybe first example should be simplified yes. We can define flights2 as flights[c("origin", "dest", "tailnum", "dep_time", "arr_time", "distance")] and then have individual subset() calls rather than filter(). This way no pipe and we don't upset those who don't like the tidyverse too much.

moodymudskipper commented 4 years ago

btw I discovered your package basetheme, looks nice! advertising it on twitter : https://twitter.com/antoine_fabri/status/1191411625628184577

karoliskoncevicius commented 4 years ago

Wow thanks! :) I don't have a twitter account, so cannot return a favour that way...

The package is work in progress. In general it provides a way of creating themes by the users themselves. Or just setting par() settings once, instead of doing it for every plot i.e. basetheme(pch=19).

I tried to add a few themes as well, but only the dark ones seem pretty for now. It's missing a good light theme in my opinion.

karoliskoncevicius commented 4 years ago

Regarding a simpler README - I am trying to come up with something, will see how it goes. But writing good READMEs seems to not be my thing.

karoliskoncevicius commented 4 years ago

Not sure if any better, so not issuing a pull request, but here for your criticism: https://github.com/moodymudskipper/inops/tree/dev_README_again

@moodymudskipper

moodymudskipper commented 4 years ago

I think it needs to be trimmed, because the quantity is a bit overwhelming :

Actually this current README would make a great extended vignette I think.

The flight example can be simplified, because filter supports several expressions, this will avoid pipe overload:

flights %>%
  filter(dep_time %in()%  c(1200, 1700)) %>%
  filter(arr_time %in()%  c(1200, 1700)) %>%
  filter(dest     %out%   c("LEX", "PSP", "HDN")) %>%
  filter(distance %out[]% c(100, 3000)) %>%
  filter(tailnum  %in~%   c("^N1", "^N3")) %>%
  select(origin, dest, tailnum, dep_time, arr_time, distance)

becomes :

filter(
  flights,
  dep_time %in()%  c(1200, 1700),
  arr_time %in()%  c(1200, 1700),
  dest     %out%   c("LEX", "PSP", "HDN"),
  ...)

This example would showcast mainly operators that haven't been shown before, but which are not surprising given the explanations we gave right above.

What do you think ?

karoliskoncevicius commented 4 years ago

Yeah I agree. Specially with the filter example and turning that README into vignette.

I also think we need something hard-hitting to lead with. Like maybe a picture that illustrates the main point of the package.

moodymudskipper commented 4 years ago

Maybe a table with detect subset replace as columns and set range regex # as rowa and one extremely compact example in each cell?

I'd use it when sharing on twitter, maybe with another picture detailing how base comparison and %in% are extended, which may or may not be in the readme as well

karoliskoncevicius commented 4 years ago

@moodymudskipper vignette is in the dev branch - turned the README into vignette. Will see how to simplify the README now.

karoliskoncevicius commented 4 years ago

Good idea. Would not be able to show the results in the cells I think thou - will not have enough space.

moodymudskipper commented 4 years ago

I think it might work even with the results

detect subset replace
set 1:3 %in{}% c(2,4)
#> [1] FALSE TRUE FALSE
1:3 %[in{}% c(2,4)
#> [1] 2
x %in{}% c(2,4) <- NA; x
#> [1] 1 NA 3
range 1:3 %in(]% c(2,4)
#> [1] FALSE FALSE TRUE
1:3 %[in(]% c(2,4)
#> [1] 3
x %in(]% c(2,4) <- NA; x
#> [1] 1 2 NA
regex c("foo", "bar", "baz") %in~% c("o$", "z$")
#> [1] TRUE FALSE TRUE
c("foo", "bar", "baz") %[in~% c("o$", "z$")
#> [1] "foo" "baz"
x %in~% c("o$", "z$") <- NA; x
#> [1] NA bar NA
table c(2, 2:3) %in#% 1
#> [1] FALSE FALSE TRUE
c(2, 2:3) %[in#% 1
#> [1] 3
x %in#% 1 <- NA; x
#> [1] NA NA 3
karoliskoncevicius commented 4 years ago

What do you think about the version currently in the https://github.com/moodymudskipper/inops/tree/dev_README_again ?

moodymudskipper commented 4 years ago

ah, I didn't realize you were on it :). I think it looks amazing! I'm just not sure about the commas in the result, my first gut reaction was "syntax error!". But I love the colors and layout and think it's an excellent summary. Also there's a bit of an inconsistency because there's no greay background for first row results but there is on subsequent rows.

moodymudskipper commented 4 years ago

There's a leftover [ at the start of the second replacement operator, which should be removed

karoliskoncevicius commented 4 years ago

Reworking it now to match your table layout - I think having 3 columns instead of 4 and putting types on rows is better. Will update soon.

karoliskoncevicius commented 4 years ago

@moodymudskipper how is it now?

karoliskoncevicius commented 4 years ago

Saw your twitter message about unmatched brackets.

I am using vim and I get this automatically:

Screenshot 2019-11-05 at 2 19 06 PM

If R-studio allows adjusting the syntax parsing, then it should simply be a matter of writing a parsing rule to detect these and then coloring them red.

moodymudskipper commented 4 years ago

Thanks! it seems I really should try VIM, today copy and paste stopped working, even after a full restart... You should get on twitter, personally I like it better than reddit because all the cool kids are there, the price is the memes, the cat and kids stories and the politic opinions but overall it's worth it.

moodymudskipper commented 4 years ago

The new readme looks great! It seems it was worth it to torture ourselves. Are you satisfied ?

karoliskoncevicius commented 4 years ago

Yup I like it. Issued a pull request, so if we both agree it can be merged.

Not sure if vignette is satisfactory or not. Probably could be expanded. But I don't have motivation to work on it... Anyway, a lot of examples and explanations are there, I am sure if people will look at it - they will find it useful.

karoliskoncevicius commented 4 years ago

Regarding twitter - will keep that in mind. I am trying to minimise my social media use actually. Deleted facebook a few years ago and it was load of my mind. For one reason or another I waste a lot of time on those sites when I have an account there... So a bit reluctant to try twitter. They allow reading posts without having an account, so maybe no harm in not having one :)

But yes, I noticed that every star is there - from Hadley, to Matt Dowle, to Dirk...

moodymudskipper commented 4 years ago

I believe there's a need for a sentence explaining the consistency with comparison ops, maybe no example is necessary.

Totally understand the lack of motivation for the vignette! you've been going hard at it these last days. I'm myself taking a little break off code outside of work these days as I'm getting a bit burnt out. I'll take a look at the vignette, but it doesn't need the same standard of quality anyway.

moodymudskipper commented 4 years ago

I'm guilty of losing too much time on twitter, so good point and I won't try to convince you more then ;).

karoliskoncevicius commented 4 years ago

The explanation about behaviour is in the vignette now. So I thought maybe that is enough. Do you think we should bring it back to README?

karoliskoncevicius commented 4 years ago

If all is good, I think we can send it to CRAN as the initial version.

What do you think?

moodymudskipper commented 4 years ago

yes, I believe it's a key feature of the package, but I'm not entirely happy with the current wording.

The first mention is:

Work with the same values as %in% does but provide a more consistent behaviour for data.frames and NA values.

Not quite right as I don't think we can accuse %in% of not being consistent, it just works differently compared to comparison ops

The second mention is:

The operators implemented here try to be consistent with the default comparison operators like == and <.\

Which is about right but not specific enough.

These consistency rules haven't been 100% clear to me tbh, leading to some confusion in these issues or as I was writing the tests, so I'd like to set it straight.

Here is a tentative definition, not readme ready but rigorous enough I think :

The detection operators are to be seen as extensions of the base comparison operators rather than extensions of the %in% function, in the sense that they support the same object types on the left hand side and return the same object type in these cases. For instance they always return a logical matrix if the left hand side is a data.frame, and always NA if the lhs or rhs is NA. The subsetting and replacing operators are simple wrappers so that x %[in()% interval and x %in()% interval <- value are respectively strictly equivalent to x[ x %in()% interval] and x <- replace(x, x %in()% interval, value).

karoliskoncevicius commented 4 years ago

Maybe we should put those explanations at the end in the NOTES section?

moodymudskipper commented 4 years ago

Before CRAN we need unit tests for new functions and add (if we agree) %[>% etc to the package.