[WIP] Add core-api modules using JavaCPP

saudet commented 5 years ago

See discussion at https://github.com/karllessard/tensorflow-java/issues/1#issuecomment-528385997

Craigacp commented 5 years ago

Why is the generated code checked in? Isn't this rebuilt on demand by JavaCPP when the build is executed?

saudet commented 5 years ago

Why is the generated code checked in? Isn't this rebuilt on demand by JavaCPP when the build is executed?

To have it available for search engines and for reference. It's rarely possible to transfer mechanically all comments from C/C++ header files to 100% correct comments for Javadoc. Ideally what I would like is something like https://www.javadoc.io/ that works well for the sources artifacts too, but I haven't found any such services yet. Do you know of any? I was thinking we could also add the generated source code for the ops too instead of having a JAR file checked in. Or we could decide to not check in any generated code at all, but I always find myself looking at generated code for reference, so I'm sure many other users do the same. What you think @karllessard ?

saudet commented 5 years ago

It looks like this is targeting Tensorflow 2 rather than the 1.x releases we've been working on. Should we try building it on 1.15 so the Java API on top doesn't have to change too much. Once it's migrated to JavaCPP then we could move the whole API over to something that behaves more like 2.0.

I thought the plan was to start with 2.0 and leave the old APIs upstream with 1.15. @karllessard?

karllessard commented 5 years ago

Or we could decide to not check in any generated code at all, but I always find myself looking at generated code for reference, so I'm sure many other users do the same. What you think @karllessard ?

Yeah, it's never clear what to do with the generated code, both options have their pros and cons. I think it depends on how often we need to generate this code: if we only want to generate it once in a while, for example when we upgrade our version of the TF libraries, then I guess it's ok to check it in. I think it's the case for the JavaCPP bindings and the operation wrappers.

If on the other hand, if we'd benefit to generate the classes at each compilation, then might be better not. Like the Ops class, which collects all @Operator classes in the source path, including eventually user custom ones.

My 2 cents.

I thought the plan was to start with 2.0 and leave the old APIs upstream with 1.15. @karllessard?

That's what I had in mind, yes, but @tzolov also think it might bring confusions with expected feature that should come with a "2.0" TF release. I guess another "Google Forms" vote is required for this :)

Craigacp commented 5 years ago

@saudet WRT generated code I usually look at the decompiled thing that my IDE generates out of a class file if I need to inspect something. Sometimes I'll look at the source itself but less frequently.

I think it's important to write down somewhere all the validation that the current JNI code is doing, so we know what we need to add back into the Java code that calls into the generated wrapper code. We can do this after the fact through inspection, but do we think there are enough unit tests to catch all the code we're removing here?

saudet commented 5 years ago

Yeah, it's never clear what to do with the generated code, both options have their pros and cons. I think it depends on how often we need to generate this code: if we only want to generate it once in a while, for example when we upgrade our version of the TF libraries, then I guess it's ok to check it in. I think it's the case for the JavaCPP bindings and the operation wrappers.

If on the other hand, if we'd benefit to generate the classes at each compilation, then might be better not. Like the Ops class, which collects all @Operator classes in the source path, including eventually user custom ones.

It's kind of both. It doesn't change often, but it gets regenerated on each build to make sure it doesn't change, or to make sure we commit the changes :) I'm using src/gen for that with JavaCPP, and we could copy all the code generated on build with Bazel there too. Sounds good?

I thought the plan was to start with 2.0 and leave the old APIs upstream with 1.15. @karllessard?

That's what I had in mind, yes, but @tzolov also think it might bring confusions with expected feature that should come with a "2.0" TF release. I guess another "Google Forms" vote is required for this :)

:+1:

@saudet WRT generated code I usually look at the decompiled thing that my IDE generates out of a class file if I need to inspect something. Sometimes I'll look at the source itself but less frequently.

That doesn't give us the comments though. That's why Maven typically attaches by default a sources artifact to each main artifact, and it's something IDEs can download automatically for us, but there doesn't appear to be a way to reference it easily outside of an IDE...

I think it's important to write down somewhere all the validation that the current JNI code is doing, so we know what we need to add back into the Java code that calls into the generated wrapper code. We can do this after the fact through inspection, but do we think there are enough unit tests to catch all the code we're removing here?

Hum, I'm assuming there are unit tests for this already @sjamesr? If we port everything to Java including the unit tests, that should be enough I thought. In any case, this is going to happen in a second phase, so we can think about that some more a bit later.

karllessard commented 5 years ago

I'm using src/gen for that with JavaCPP, and we could copy all the code generated on build with Bazel there too. Sounds good?

@saudet yeah, let’s give it a try like that and see how it looks like

In any case, this is going to happen in a second phase, so we can think about that some more a bit later.

Just to remember, the first phase is just to build and deploy the libraries with JavaCPP while keeping the original JNI code active? Will we use a few classes generated by JavaCPP as well?

saudet commented 5 years ago

Just to remember, the first phase is just to build and deploy the libraries with JavaCPP while keeping the original JNI code active? Will we use a few classes generated by JavaCPP as well?

Yes, that's how I see it so we can try stuff and migrate gradually if it looks like it's working well.

So, to summarize the changes to be done:

Remove redundant classes and dependencies already moved to other modules, and
We could also remove build.sh for now until it becomes something users ask, and
Depend on Bazel to build a library exposing the C API of TensorFlow + to generate Java code, but
Use JavaCPP to build the JNI library from generated code and manually written legacy code, and
Put all generated Java code either in src/gen to allow users to reference it more easily or target/generated-sources if we don't want to check it in the source code repository.
- BTW, how can we do this with Bazel without Bash? Even Google gave up and just tells users to run a Bash script for the Python build: https://www.tensorflow.org/install/source Should we keep something like build.sh around for Java as well?

karllessard commented 5 years ago

@saudet I agree with your plan. BTW, can you please now fork the official repository (https://github.com/tensorflow/java) and rebase your PR on this one? I just pushed the JNI-based core in there so we have at least something to show and that compiles.

I would like to move all our current discussions to the new repository as well but unfortunately, GitHub does not allow cross-organization issue transfers.

karllessard commented 5 years ago

BTW, how can we do this with Bazel without Bash?

Oh, forgot to reply to this one. Can't we use Maven? Like with the copy resource plugin?

saudet commented 5 years ago

@saudet I agree with your plan. BTW, can you please now fork the official repository (https://github.com/tensorflow/java) and rebase your PR on this one? I just pushed the JNI-based core in there so we have at least something to show and that compiles.

I would like to move all our current discussions to the new repository as well but unfortunately, GitHub does not allow cross-organization issue transfers.

Do I need to create a fork? I mean, would I need to fill some paperwork to become a collaborator there?

Oh, forgot to reply to this one. Can't we use Maven? Like with the copy resource plugin?

We can go to great lengths to not call Bash if that's what you mean, yes. Is the idea to avoid Bash like the plague? :) Bazel itself needs Bash, so it's not like we'd gain anything here. Anyway, tell you what, I'll let you modify the Bazel build files and the pom.xml file in order to get rid of the script that I'll leave there "temporarily". Sounds good?

BTW, what about the names of the modules? Is everyone OK with core-api and core-api-platform or should I call them something else?

karllessard commented 4 years ago

Do I need to create a fork? I mean, would I need to fill some paperwork to become a collaborator there?

Yup, I discussed with @tzolov and @sjamesr and instead of managing a list of collaborators with their appropriate access rights, we'll all merge our changes by PRs from our personal accounts, which is pretty common with open-source projects hosted on GitHub. I don't foresee any problems that might prevent you to accomplish your work here, do you?

I'll let you modify the Bazel build files and the pom.xml file in order to get rid of the script that I'll leave there "temporarily". Sounds good?

I sincerely don't mind, I proposed Maven to Bash because 1) that's what Java developers like to work with and 2) it is interesting to attach a given job to a specific phase (e.g. generate-sources for building the op wrappers). But yes, if I find time, I can do those changes "later" and let you review it.

Is everyone OK with core-api and core-api-platform or should I call them something else?

I'm OK with those names.

karllessard commented 4 years ago

Merged in the official repo

karllessard / tensorflow-java-sandbox

[WIP] Add core-api modules using JavaCPP #2