Open JimClarke5 opened 3 years ago
BTW, the only reason we're doing this in Bazel in the first place is because the original op generator was written in C++. We can and should rewrite it in Java, and then when that's done AFAIK we won't need to do anything with Bazel or C++ anymore. In fact, we can also call CPython functions very easily from Java using JavaCPP without having to deal with anything C++: https://github.com/bytedeco/javacpp-presets/tree/master/cpython#the-simplejava-source-file
@saudet I agree, but rewriting the op generator in Java won't be an easy task. If Jim has something close to working in C++, we should give it a try.
The thing is that even if we can reformat it right, the code examples in the doc will still be in Python. I know there are 1000+ ops available in TF, but the right way to completely transcribe the Markdown to Javadoc would be to rewrite it in our API definitions (under src/bazel/api_def, that is what they are meant for).
What do you think, @JimClarke5: could we run your script just once to generate the doc in our API def files, and then slowly but surely fix the issues manually as we find them? We could also run that script to generate the doc for the new ops every time we upgrade the TF runtime version.
@karllessard There were some issues when I originally tried to write it back out to src/bazel/api_def; the main issue was parsing the api_def file. op_generator does parse the api_def file with C++ code. I will look at doing a 100% Java program using ANTLR. I will need to parse the api_def file, then parse the Markdown.
I'm pretty sure it just uses protobuf to parse all this, which we can use easily enough from Java, or am I missing something?
It is protobuf-like, but not 100% the same. I found grammars-v4/protobuf3/Protobuf3.g4 on the antlr4 site, but there is no op or op_name defined in that grammar.
@karllessard @Craigacp
I have been experimenting with converting the TF Markdown text to JavaDoc format in the op_generator code. I did this by creating another C++ class that calls out to Python using the Python C library. This runs the Python marko package with my own marko renderer class, javadoc_renderer.JavaDocRenderer, that converts Markdown to JavaDoc. In the C++ class, SourceWriter, I call out to the Python code to convert the Markdown text to JavaDoc. The converted JavaDoc code is then written out to the class.
Here is an example of the old and new generated JavaDoc for org.tensorflow.op.math.Abs:
Current JavaDoc:
New JavaDoc:
The JavaDoc output still needs some tweaks, like <p> on a single line. Also, I am still chasing down an infrequent error where the conversion string gets garbled.
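For illustration only, here is a minimal pure-Java sketch of the kind of Markdown-to-JavaDoc conversion described above. It is not the marko-based javadoc_renderer.JavaDocRenderer; the class name MarkdownToJavadoc and its convert method are hypothetical, and it assumes only a tiny Markdown subset matters: blank-line paragraph breaks, inline code spans, and fenced code blocks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of op_generator or tensorflow-java.
final class MarkdownToJavadoc {
    // Matches a single-backtick inline code span such as `abs(x)`.
    private static final Pattern INLINE_CODE = Pattern.compile("`([^`]+)`");

    // Converts a small Markdown subset to JavaDoc-flavoured HTML.
    // Limitations: unbalanced braces inside code and "*/" sequences are not handled.
    static String convert(String markdown) {
        StringBuilder out = new StringBuilder();
        boolean inFence = false;          // currently inside a ``` fenced block
        boolean paragraphBreak = false;   // a blank line was seen since the last text
        for (String line : markdown.split("\n", -1)) {
            if (line.trim().startsWith("```")) {
                // Opening fence starts a <pre>{@code block, closing fence ends it.
                out.append(inFence ? "}</pre>\n" : "<pre>{@code\n");
                inFence = !inFence;
                paragraphBreak = false;
                continue;
            }
            if (inFence) {
                // Code lines are copied verbatim; {@code} needs no HTML escaping.
                out.append(line).append('\n');
                continue;
            }
            if (line.trim().isEmpty()) {
                // Remember the break, but never emit a leading <p>.
                paragraphBreak = out.length() > 0;
                continue;
            }
            if (paragraphBreak) {
                out.append("<p>\n");
                paragraphBreak = false;
            }
            out.append(inline(line)).append('\n');
        }
        return out.toString();
    }

    // Wraps inline code spans in {@code ...} and HTML-escapes the text around them.
    private static String inline(String line) {
        StringBuilder sb = new StringBuilder();
        Matcher m = INLINE_CODE.matcher(line);
        int last = 0;
        while (m.find()) {
            sb.append(escapeHtml(line.substring(last, m.start())));
            sb.append("{@code ").append(m.group(1)).append('}');
            last = m.end();
        }
        sb.append(escapeHtml(line.substring(last)));
        return sb.toString();
    }

    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.print(convert("Computes `abs(x)`.\n\nFor x < 0:\n```\ny = -x\n```\n"));
    }
}
```

A real renderer would also need lists, links, emphasis, and tables, which is where a full Markdown parser such as marko earns its keep; the sketch only shows the shape of the mapping.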
I have made several design decisions that should probably be discussed. For example, I put my Python module in bazel-bin and point the PYTHONPATH to it in build.sh.
Also, I cannot figure out how to bring in the Python library from the framework into the BUILD file. For now, I have it hard-coded.
Any help on setting the Bazel rules to include the Python library would be appreciated.
I did find @org_tensorflow//third_party/python_runtime:headers, which I added as a dependency in the cc_library section of BUILD. This allowed me to compile the C++ code with the Python.h header.
I can create a draft PR if you want to look at the whole project, so we can iterate on some of the design decisions, and figure out how to link with the Python C library in a Bazel-friendly way.