ropensci / unconf16

rOpenSci's San Francisco hackathon/unconf 2016
http://unconf16.ropensci.org

Transpiler #26

Open smbache opened 8 years ago

smbache commented 8 years ago

I've often thought about a transpiler for R, like CoffeeScript for JavaScript: both to add some syntactic improvements and perhaps also to enforce certain practices...

jcheng5 commented 8 years ago

I've had to use an ES6 transpiler for a JavaScript project for the last couple of weeks, and I have to say, the experience has been surprisingly decent. It might be more complicated in a REPL-leaning environment like R, but with first-class IDE support I could imagine this working really nicely.

I have long wished there was a "strict mode" for R where all function arguments are evaluated eagerly unless marked for delayed evaluation. In other words, functions that are declared in a strict mode file would have force() implicitly called on all arguments before the user-provided function body code is evaluated.
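A minimal sketch of what such a strict mode could do to each function. It assumes a hypothetical helper `strictify()` (not part of any package) that rewrites a closure's body so force() is called on every formal argument first. Arguments listed in `lazy` are left alone, as one possible way of "marking" them for delayed evaluation; `...` is always skipped.

```r
# Hypothetical helper (not part of any package): make a closure "strict" by
# forcing every formal argument before the original body runs.
strictify <- function(f, lazy = character()) {
  # Skip `...` and any arguments explicitly marked as lazy.
  arg_names <- setdiff(names(formals(f)), c("...", lazy))
  # Build one force(<arg>) call per argument, in declaration order.
  force_calls <- lapply(arg_names, function(a) call("force", as.name(a)))
  # New body: { force(arg1); force(arg2); ...; <original body> }
  body(f) <- as.call(c(as.name("{"), force_calls, list(body(f))))
  f
}

# `y` is never used by the body, so normally it is never evaluated.
lazy_f   <- function(x, y) x
strict_f <- strictify(lazy_f)             # forces both x and y
semi_f   <- strictify(lazy_f, lazy = "y") # forces x only

evaluated <- FALSE
lazy_f(1, evaluated <- TRUE)   # evaluated stays FALSE
strict_f(1, evaluated <- TRUE) # evaluated becomes TRUE
```

One consequence of this design: a strict function errors when an argument without a default is missing, even if the body would never touch it.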

Oh, and you'd get a warning if you don't explicitly specify exact= in attr() and stringsAsFactors= in read.csv()... :smile:
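The warning part could be sketched as a check over parsed code. This assumes a hypothetical `lint_required_args()` that walks every call and reports calls to attr() or read.csv() that don't name `exact` or `stringsAsFactors`. It only recognises plain calls (not e.g. `utils::read.csv()`), and it returns messages instead of signalling warnings.

```r
# Hypothetical lint: report calls that omit an argument we want spelled out.
lint_required_args <- function(code,
                               rules = c(attr = "exact",
                                         read.csv = "stringsAsFactors")) {
  problems <- character()
  walk <- function(e) {
    fn <- e[[1]]
    if (is.name(fn)) {
      fname <- as.character(fn)
      # names(e) is NULL for a call without named arguments.
      if (fname %in% names(rules) && !(rules[[fname]] %in% names(e))) {
        problems <<- c(problems,
                       sprintf("%s() called without %s=", fname, rules[[fname]]))
      }
    }
    # Recurse only into sub-calls (this also skips empty arguments as in x[, 1]).
    for (i in seq_along(e)) {
      if (is.call(e[[i]])) walk(e[[i]])
    }
  }
  for (e in parse(text = code)) if (is.call(e)) walk(e)
  problems
}

code <- '
x <- attr(y, "dim")
z <- attr(y, "dim", exact = TRUE)
d <- read.csv("data.csv")
'
lint_required_args(code)
```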

smbache commented 8 years ago

I'd be very interested in such a project.

gaborcsardi commented 8 years ago

++1 Me, too, please. :)

IDE support is essential.

As is support without an IDE, i.e. when developing/installing packages. I have some experience with hooking into the R package install process and modifying the code on the fly (to add argument checks, https://github.com/gaborcsardi/argufy).

I always thought that this approach would work very well for a transpiler too. All the user needs to do is import the transpiler package (and potentially supply some configuration parameters); the rest is "automatic". A similar hook can be added to devtools::load_all(), so your code is transpiled whenever you load it via devtools.
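Whatever the hook looks like, its core step would be rewriting the functions in an environment. A rough sketch, assuming a hypothetical `transpile_env()` that applies a transform to every closure bound in an environment. Wiring it into the install process or devtools::load_all() is not shown, and inside a loaded package namespace the bindings may already be locked.

```r
# Hypothetical: apply `transform` to every closure bound in `env`.
transpile_env <- function(env, transform) {
  for (nm in ls(env, all.names = TRUE)) {
    obj <- get(nm, envir = env, inherits = FALSE)
    if (is.function(obj) && !is.primitive(obj)) {
      assign(nm, transform(obj), envir = env)
    }
  }
  invisible(env)
}

# Example transform: prepend a statement that counts calls. It assumes a
# `calls` counter exists in the function's enclosing environment.
count_calls <- function(f) {
  body(f) <- call("{", quote(calls <<- calls + 1), body(f))
  f
}

# Stand-in for a package namespace.
pkg_env <- new.env()
evalq({
  calls <- 0
  square <- function(x) x^2
}, pkg_env)
transpile_env(pkg_env, count_calls)
pkg_env$square(3)
```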

jcheng5 commented 8 years ago

@gaborcsardi Transform at install time is very interesting. Arguably, the kind of advantages you can get from a transpiled language would be even more important for package development than for REPL use.

HenrikBengtsson commented 8 years ago

@jcheng5, I like your:

I have long wished there was a "strict mode" for R where all function arguments are evaluated eagerly unless marked for delayed evaluation.

The "unless marked for" is related to Issue #14 on 'Code Inspection with Non-Standard Evaluation (NSE)'. If you have any thoughts/ideas on how such marking would work in practice, please follow up over there.

gaborcsardi commented 8 years ago

The biggest difficulty I see with a transpiler is resisting the new features you (and the community in general) want to introduce. :)

More seriously, the new features that can be parsed with the old parser are kind of trivial to implement: the approach we used for argufy would work. E.g. force()-ing function arguments is easy.

If a modified parser is needed, then it might be harder, but I still think it is possible.

Admittedly, hooking into the install process is quite a hack, and we would potentially need to update the transpiler hook every time the base R install process changes. Still, it would be a way of experimenting with language features without changing anything in base R.

It would be exciting to create a prototype!

gmbecker commented 8 years ago

I'm excited about this as well. I have an early prototype for S4 boilerplate code generation (classes, getter/setter methods) that I hope to push onto github before the meeting that would fit well in this space.
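To illustrate the kind of boilerplate involved (this is not gmbecker's prototype), here is a sketch of a hypothetical `make_accessors()` that defines a getter and a setter generic for each slot of an S4 class, plus the matching methods. It assumes the slot names don't clash with existing functions.

```r
library(methods)

# Hypothetical generator: for each slot, define `slot(object)` and
# `slot(object) <- value` generics and methods for `class_name`.
make_accessors <- function(class_name) {
  for (slot in names(getSlots(class_name))) {
    setter <- paste0(slot, "<-")
    # Splice the generic's name in as a literal string for standardGeneric().
    getter_def <- eval(substitute(function(object) standardGeneric(G),
                                  list(G = slot)))
    setter_def <- eval(substitute(function(object, value) standardGeneric(S),
                                  list(S = setter)))
    if (!isGeneric(slot)) setGeneric(slot, getter_def)
    if (!isGeneric(setter)) setGeneric(setter, setter_def)

    # Methods read or write the slot, re-checking validity after a write.
    getter_method <- eval(substitute(function(object) object@SLOT,
                                     list(SLOT = as.name(slot))))
    setter_method <- eval(substitute(function(object, value) {
      object@SLOT <- value
      validObject(object)
      object
    }, list(SLOT = as.name(slot))))
    setMethod(slot, class_name, getter_method)
    setMethod(setter, class_name, setter_method)
  }
  invisible(class_name)
}

setClass("Person", slots = c(name = "character", age = "numeric"))
make_accessors("Person")

p <- new("Person", name = "Ann", age = 30)
name(p)
age(p) <- 31
```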

I tend towards the opposite approach with respect to how to hook it in though. I think adding a preprocessor hook to the build and/or install process has a lot of appeal. I'll add that as a separate proposal though, as it is somewhat orthogonal to the types of preprocessing we would want to do.

jimhester commented 8 years ago

I have to do similar hooking for the code manipulation in covr, in my still-in-development v2 implementation, which adds an onLoad() hook to the installed package's lazy-loading code to add the tracing calls to the package namespace.
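A rough illustration of this kind of instrumentation, not covr's actual implementation: a hypothetical `instrument()` wraps each top-level statement of a function body with a call that increments a per-statement hit counter. Statements nested inside if or for are not instrumented in this sketch.

```r
# Hypothetical sketch: count how often each top-level statement runs.
instrument <- function(f) {
  b <- body(f)
  stmts <- if (is.call(b) && identical(b[[1]], as.name("{"))) as.list(b)[-1] else list(b)
  hits <- integer(length(stmts))
  # Closure that increments the counter for statement i.
  record <- function(i) hits[i] <<- hits[i] + 1L
  traced <- vector("list", length(stmts))
  for (i in seq_along(stmts)) {
    # Inline the `record` function object itself, so the rewritten body does
    # not depend on any name visible from the function's environment.
    traced[[i]] <- call("{", as.call(list(record, i)), stmts[[i]])
  }
  body(f) <- as.call(c(as.name("{"), traced))
  list(fun = f, hits = function() hits)
}

g <- function(x) {
  y <- x * 2
  if (y > 10) return("big")
  "small"
}
ig <- instrument(g)
ig$fun(1)
ig$hits()
```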

I have also thought about having a "use strict" analog for R (à la Perl and JavaScript), and I have some very early prototype code from a year or so ago.

Definitely interested in this!

gaborcsardi commented 8 years ago

List of ideas for what a transpiler could do. Some of these need a new parser, but again, these are just ideas. :) Not all of them need a transpiler, actually, and I didn't think much about how exactly they could work, if at all.....

Please add more!

smbache commented 8 years ago
gaborcsardi commented 8 years ago

Yeah, sorry, did not think about the merch..... actually, this is good, people will need to get new merch. :)

[a, b, c] would be equivalent to c(a, b, c) I guess.

Unneeded parens etc.: which ones? In if and for, for example? Or even in function calls? :) That does look a bit extreme.

Lambda syntax is great! What if it coincides with a right assignment? We just forbid right assignment in R2 code?
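The clash is already visible at the parser level: R's parser turns `->` into `<-` with the operands swapped, so a lambda-looking `x -> x + 1` parses today as an assignment to `x + 1`. A transpiler would have to recover the original arrow from the source tokens, for example via getParseData(); `uses_right_assign()` is a hypothetical helper.

```r
# `->` is rewritten to `<-` at parse time, operands swapped.
identical(quote(x -> x + 1), quote(x + 1 <- x))

# The original arrow survives only in the parse data (source tokens).
uses_right_assign <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  any(pd$token == "RIGHT_ASSIGN")
}

uses_right_assign("x -> x + 1")
uses_right_assign("y <- 1")
```

If right assignment were forbidden in R2 code, every RIGHT_ASSIGN token could safely be reinterpreted as a lambda.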

smbache commented 8 years ago

Also a bit extreme:

package quickpkg:

    #private
    foo x y z: 
        x + y + z

    @export
    bar x:
        foo x 2 3
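For comparison, one possible base-R reading of that sketch. It assumes `foo x y z:` defines a function, `foo x 2 3` is a positional call, `#private` means "not exported", and `@export` maps to a roxygen tag. The `package quickpkg:` line would correspond to the package's DESCRIPTION and NAMESPACE files, which are not shown.

```r
# Internal helper: no @export tag, so it stays out of NAMESPACE.
foo <- function(x, y, z) {
  x + y + z
}

#' @export
bar <- function(x) {
  foo(x, 2, 3)
}
```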

And maybe partial application would be nice!
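Partial application can already be emulated with a higher-order function, so a transpiler could desugar a shorter syntax into it. A minimal sketch with a hypothetical `partial()` (purrr ships a more complete `partial()`); here the fixed arguments are evaluated eagerly when the partial function is created.

```r
# Hypothetical: fix some arguments of `f` now, supply the rest later.
partial <- function(f, ...) {
  fixed <- list(...)  # evaluated now, not lazily
  function(...) do.call(f, c(fixed, list(...)))
}

add3 <- function(x, y, z) x + y + z
bar2 <- partial(add3, y = 2, z = 3)  # similar in spirit to `bar x: foo x 2 3`
bar2(1)                              # 6
```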

gaborcsardi commented 8 years ago

Wow. :) The : notation seems ambiguous, I think.

smbache commented 8 years ago

Re [a, b, c] vs c(a, b, c): yes, but should include notation for matrices and perhaps also fads syntax for sequences ...
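For reference, these are the base-R calls that such literal notations would stand in for; the bracket form in the comment is hypothetical.

```r
v <- c(1, 2, 3)                     # what [1, 2, 3] would mean
m <- matrix(c(1, 2,
              3, 4),
            nrow = 2, byrow = TRUE) # a 2 x 2 matrix, filled row by row
s <- seq(1, 9, by = 2)              # the odd numbers from 1 to 9
```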

smbache commented 8 years ago

Could be something other than the colon... F# uses =, Python uses :...

richfitz commented 8 years ago

I think this is a fun thought experiment at least, and could be super useful.

I wonder, though, if it's worth resisting the urge to make the language more complex and ambiguous than it already is, and instead going the other way: something that makes R syntax lighter and easier to read, but still makes it easier to do the right thing. That's probably the reason for things like roxygen's success: it seems simpler than the raw Rd format. Most of @gaborcsardi's suggestions above fall into this camp, really.

That said, I'd love to see something like decorators for functions and a whitespace sensitive/braceless python-style syntax.
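Decorators map naturally onto R's higher-order functions: a decorator is just a function that takes a function and returns a new one. A sketch assuming a hypothetical `@memoised` annotation that would desugar to a plain call of a hypothetical `memoised()` decorator; it handles only a single scalar argument (the memoise package provides a full implementation).

```r
# Hypothetical decorator: cache results keyed by the (scalar) argument.
memoised <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# What "@memoised" above a function definition could desugar to:
n_calls <- 0
slow_square <- memoised(function(x) {
  n_calls <<- n_calls + 1
  x^2
})
slow_square(3)
slow_square(3)  # served from the cache; n_calls is still 1
```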

smbache commented 8 years ago

Definitely: the idea would be to get:

  1. Cleaner syntax, making it even more expressive
  2. Stricter syntax, making it harder to do things in bad ways, and promoting good style
  3. Less overhead where possible (lightweight)