s-u / rJava

R to Java interface
https://RForge.net/rJava

Feature - codegen R package from annotated Java #74

Open jdimeo opened 8 years ago

jdimeo commented 8 years ago

Do you know of any existing package or library that would facilitate easy generation of "CRAN-ready" R packages from Java source?

Say I had a class that exposed some functionality that I wanted to easily call from R. I'm envisioning something where I would only have to annotate the class's method with @RFunction, and then a subsequent run of some utility or tool (say, make-r-package) would output a directory structure suitable for installing via install.packages. Bonus points if the documentation PDF was automatically compiled from the Javadoc.
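For illustration, the Java side of that idea might look like the sketch below. The `@RFunction` annotation does not exist in rJava or in any library referenced in this thread; it is declared here as a hypothetical marker, and `TextStats`, `wordCount`, and the `name` attribute are made up for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation: not part of rJava or any existing library.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RFunction {
    // Name the generated R function should get; empty means "reuse the Java method name".
    String name() default "";
}

// An ordinary Java class whose entry point is flagged for export to R.
class TextStats {
    @RFunction(name = "word_count")
    public static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Not annotated, so a generator would leave it out of the R package.
    public static String internalHelper() {
        return "not exported";
    }
}
```

A generator tool would then only need to find the annotated methods and emit one R wrapper per method, plus the package skeleton around them.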

This might be out of scope for rJava, but I was thinking this was the best place to ask.

robchallen commented 3 years ago

I've been working on this.

https://github.com/terminological/r6-generator-maven-plugin

Most of the documentation is in the examples package

https://github.com/terminological/r6-generator-maven-plugin-test

(N.B. It's under active development so expect things to not work or change seemingly randomly.)

I've got a few questions about rJava itself though - is there anyone who can talk?

jdimeo commented 3 years ago

@robchallen I will definitely be trying this out. I'm a huge fan after reading the README for 30 seconds....

robchallen commented 3 years ago

@jdimeo great (I'll keep my fingers crossed it works). Please drop me a line with any issues. I'm taking pull requests (particularly around documentation / FAQs / etc.) and looking for collaborators.

s-u commented 3 years ago

@robchallen the author/maintainer? ;)

jdimeo commented 3 years ago

@s-u What are your thoughts on this idea? Are you aware of any past efforts beyond @robchallen 's nascent plugin?

s-u commented 3 years ago

@jdimeo No, I'm not. However, this shouldn't be hard in principle: since annotations are accessible via the reflection API in Java, you can access them from R (via rJava). So the main work IMHO would be to define what you actually want to do. You can annotate classes, methods etc., so how should that be reflected in the package? What should the annotations provide? I could probably add some support in rJava that makes it easier to access the annotations, so you don't need to make all the calls to java.lang.reflect.AnnotatedElement, but that's not the hard part.
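As a rough sketch of the reflection step described above, the Java below scans a class for a runtime-retained annotation and prints one line of R source per annotated method. Everything named here is an assumption for illustration: `ExportToR`, `Greeter`, and `AnnotationScanner` are hypothetical, and the emitted wrapper (a call through rJava's `J()` accessor) is only one guess at what generated code could look like.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation; RUNTIME retention is what makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ExportToR {
    String name();
}

class Greeter {
    @ExportToR(name = "greet")
    public static String greet(String who) {
        return "Hello, " + who;
    }

    public static String hidden() {
        return "not exported";
    }
}

class AnnotationScanner {
    // Returns one line of R source per annotated method (assumed static in this sketch).
    static List<String> rStubs(Class<?> cls) {
        List<String> stubs = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            ExportToR ann = m.getAnnotation(ExportToR.class);
            if (ann == null) {
                continue; // skip methods that were not marked for export
            }
            // Parameter names are arg0, arg1, ... unless compiled with javac -parameters.
            List<String> args = new ArrayList<>();
            for (Parameter p : m.getParameters()) {
                args.add(p.getName());
            }
            String argList = String.join(", ", args);
            stubs.add(ann.name() + " <- function(" + argList + ") "
                    + "rJava::J(\"" + cls.getName() + "\")$" + m.getName()
                    + "(" + argList + ")");
        }
        return stubs;
    }

    public static void main(String[] unused) {
        rStubs(Greeter.class).forEach(System.out::println);
    }
}
```

The same `getAnnotation` calls could be made from R through rJava instead; the sketch only shows that the information needed for a wrapper is available at runtime. Deciding what the wrapper should look like is the open design question raised in the comment.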

jdimeo commented 3 years ago

I think this is about going the other way. Taking an API that was originally written in Java, annotating to make it an R API, and code generating the R API that calls back to Java via rJava. Annotations would only be needed during build/compile since they would generate the R package, R itself wouldn't need to access the annotations. @robchallen 's README has some good examples.

While I have your attention - Thanks for your work on rJava/jri/Rserve over the years. In our Java-based streaming event engine, we're passing massive time series off to R to compute tsfeatures using multiple Rserve sessions for a predictive maintenance application.

s-u commented 3 years ago

@jdimeo I think I was talking about the same thing ;) Maybe the confusion comes from my unstated assumption that it is R code that generates the R package, that's why I was talking about R needing to read the annotations. IMHO it would make sense since the tool needs to generate R package and R code - R already knows how to do that.

I'm not sure the approach robchallen took makes sense to me - we already have the R objects with the classes and instances, so using R6 seems like double-wrapping, but then I never understood why anyone would use R6 (it's like trying to write Java syntax in R, which is a really bad idea IMHO [other than on Java objects directly], since you get the drawbacks of both worlds and no benefits). That's why I was saying the main task would be to define what you want the resulting package to look like - how do you declare generics/functions etc. - it's two different worlds so it's not easy to express the one in terms of the other automatically. Some examples of what the use-cases would be could help start the discussion...

robchallen commented 3 years ago

Hi. Is there anything specific I can help with here?

The R6 wrapper approach made sense to me as the easiest way to hide the rJava mechanics from both the R user and the Java user, but you either like R6 or you don't :-). Probably double-wrapping understates what the generated R6 code is doing as it is also managing creation of the rJava environment, logging, and documentation, type checking, etc.

Given I'm aiming to help Java developers create an R library, it also made sense to me for the build process to be in Maven rather than in R (and I have more experience with Java codegen). I'm assuming limited knowledge of R package structure for the Java developer.

A lot of the effort in this is actually in managing a consistent set of documentation for both the R and Java sides. When I get to updating this I will maybe think about using ROxygen2 tags on the R6 classes and triggering the R documentation cycle from Java, but that would create a set of dependencies on the R environment on the developer's machine, which would be tricky to enforce.

jdimeo commented 3 years ago

Yes, I had assumed and agree with @robchallen that this is "heavier" on the Java side than the R side. The use cases would be when you've built a Java API that you want to expose to some R users, so all you need is a thin R "veneer" to call into the Java API. In that case, having this be a Maven plugin and doing the R package structure from Maven/Java makes the most sense to me.

One example would be Apache Tika, for content/text extraction from documents. There exists a specific R interface for this: https://github.com/ropensci/rtika @robchallen 's plugin would be able to do this in a more generalized way. I could build up a Java API that called Tika APIs for document parsing, drop a few @RMethod annotations on my entry points, and then via Maven I would have a similar R package to give to R users, but I could develop and manage the API as a Java developer.

jdimeo commented 3 years ago

@robchallen I really want to contribute to your efforts but don't have anything that would use this on the immediate horizon.

Perhaps your plugin could gracefully degrade - do what it could in a "plain" Maven environment as far as generating the R source and DESCRIPTION and such, but also do ROxygen if environmental things are detected (R_HOME is set, R is on the path, etc.). Then we could also provide Docker images that have the right build environment to be able to build the "complete" package.
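A minimal sketch of such an environment check, in Java since the plugin runs under Maven. Nothing here is taken from the actual plugin: `REnvironmentProbe` and `rAvailable` are hypothetical names, and the heuristic (an existing `R_HOME` directory, or an `Rscript` executable somewhere on `PATH`) is just one assumption about what "R is available" could mean.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Pattern;

class REnvironmentProbe {
    // True if R_HOME names an existing directory or Rscript is found on PATH.
    static boolean rAvailable(Map<String, String> env) {
        String rHome = env.get("R_HOME");
        if (rHome != null && !rHome.isEmpty() && isDirectory(rHome)) {
            return true;
        }
        String path = env.get("PATH");
        if (path == null) {
            return false;
        }
        for (String dir : path.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String exe : new String[] {"Rscript", "Rscript.exe"}) {
                if (isExecutable(dir, exe)) {
                    return true;
                }
            }
        }
        return false;
    }

    private static boolean isDirectory(String dir) {
        try {
            return Files.isDirectory(Paths.get(dir));
        } catch (InvalidPathException e) {
            return false; // malformed entry: treat as "not there"
        }
    }

    private static boolean isExecutable(String dir, String exe) {
        try {
            return Files.isExecutable(Paths.get(dir, exe));
        } catch (InvalidPathException e) {
            return false;
        }
    }

    public static void main(String[] unused) {
        boolean full = rAvailable(System.getenv());
        // A build could branch here: run the R documentation step only when R was found.
        System.out.println(full ? "R found: full build" : "R not found: sources and DESCRIPTION only");
    }
}
```

Taking the environment as a `Map` argument rather than reading `System.getenv()` inside the method keeps the check easy to test, and lets a Docker-based build override it.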

s-u commented 3 years ago

@robchallen I don't see how the initialisation etc has anything to do with the double-wrapping, since that is done by the package and the constructor, not R6. R6 is just a wrapper around environments, because they are the only mutable structure in R, but you already have mutable objects, hence I don't see why you need to double-wrap, you could just subclass.

That aside, I see your concept is for a package for Java users (so essentially middleware), so in that case it makes sense to do minimal mapping, but from @jdimeo's post I was originally thinking of a real R package, since annotations could be used to define a proper R API. That's why I was saying it needs more work on the design, since you have to map R concepts to the underlying Java objects/methods and so define how annotations can do that.

@jdimeo But your example with rtika shows why it doesn't work, because it does a lot of steps before calling the actual Java code, or am I missing something? So it is an argument that the middleware idea has merit and you still have to write a proper R package regardless.

Either way, I'll leave you guys to it as it seems you have covered what you have in mind. If you need any extra support from rJava, let me know.

robchallen commented 3 years ago

@s-u Apologies for what may be a stupid question. I'm just looking at how I can release resources in Java that have been assigned through .jnew calls in rJava and struggling with some of the Java <-> JNI <-> rJava concepts, and I can't find this specifically mentioned in the rJava documentation.

How would you recommend using rJava to release a Java object, held as a jobjRef, and allow it to be garbage collected?

s-u commented 3 years ago

@robchallen All memory management is automatic, the references are mirrored between the languages, i.e., an object on one side is released when its reference is released on the other side.

So for example, if you create a new object instance in Java via .jnew() then that object is released as soon as the reference is garbage-collected in R. That's why it is a good idea to explicitly trigger GC in R if you want to release all unused objects in Java and vice-versa.

robchallen commented 1 month ago

Hi. Had a chance to work on this a bit over time alongside other work. The Maven plugin and a Java runtime library are now more or less feature complete and available on Maven Central. There is also a Maven archetype to help set up Java projects which generate R packages.

I moved its github repository to here: https://github.com/terminological/r6-generator

As before the plugin comes with an example project that serves as the documentation. The main landing page for that is here: https://terminological.github.io/r6-generator-docs/docs/

As of version 1.1.0 most things work :-) but it's still best regarded as beta.

jdimeo commented 1 month ago

@robchallen THANK YOU. I actually just had a use case emerge for this, 3 years later. I just built a complex interaction with Stata and Java and now I want to provide an R equivalent plugin and this would really help get the rJava side going. I'll be checking it out soon.

robchallen commented 1 month ago

Sounds good. Be good to get a fresh pair of eyes on it.

Happy to have a zoom call when you are at the right stage.


— Reply to this email directly, view it on GitHub https://github.com/s-u/rJava/issues/74#issuecomment-2115645377, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD6SWIBOPDXR2GUW2BCZDCLZCTKZBAVCNFSM4CB4QAIKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMJRGU3DINJTG43Q . You are receiving this because you were mentioned.Message ID: @.***>