Feature idea: Extraction of docstrings from javadoc

petrushy commented 4 years ago

Hi,

This is likely a far-future enhancement, but I want to write it down.

It would be interesting to be able to generate docstrings from javadoc, so that the documentation string shown in automatic popup info has more detail than it does now.

One would then of course need access to the source code, and maybe it could be parsed into some database.

One tool that may be useful is qdox, a Java tool that can parse source for javadoc. https://github.com/paul-hammant/qdox

petrushy commented 4 years ago

I was trying some things, and it is not possible to monkeypatch the __doc__ property of JObject in the same way as __repr__, is it?

Thrameos commented 4 years ago

The answer is no and yes. Doc strings are supposed to be fixed immutable strings so you can't patch them directly. But if you look over _jclass you will find the redirect that converts them into properties and redirects them into the method _jclassDoc. You can apply the same procedure to redirect the doc routine to whatever function you need.

Also notable is that if you compiled with -g:source you can get the source location in both _jclassDoc and _jmethodGetDoc, which lets you extract the javadoc from the source or from the HTML doc package. I recommend installing your own handler rather than changing the ones in the code, as private names may change.
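
As a rough illustration of the redirect described above, the sketch below shows the general Python technique of serving a class's __doc__ through a property on its metaclass. It is plain Python and not JPype's actual code: LazyDocMeta, _doc_handler, javadoc_lookup and Wrapped are hypothetical names made up for this example.

```python
class LazyDocMeta(type):
    # A property defined on the metaclass is a data descriptor, so it takes
    # precedence over the plain __doc__ entry stored in each class dict.
    @property
    def __doc__(cls):
        # Resolve the handler at access time so it can be replaced later.
        handler = getattr(cls, "_doc_handler", None)
        return handler(cls) if handler is not None else None


def javadoc_lookup(cls):
    # Hypothetical handler: a real one would fetch and render javadoc here.
    return "Documentation for %s (generated on demand)" % cls.__name__


class Wrapped(metaclass=LazyDocMeta):
    # staticmethod keeps the handler from being bound when looked up.
    _doc_handler = staticmethod(javadoc_lookup)


print(Wrapped.__doc__)
```

Installing a different handler is then a matter of assigning a new _doc_handler, which leaves the private machinery untouched.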

Thrameos commented 4 years ago

I see two possible solutions here.

First, we can look for the javadoc jar resources. That will return the same rather old and crusty HTML page that javadoc produces. If you know the name mangling, you can jump down into the method or class section. Then you would just have to convert the blob of HTML to rst. Obviously we can't get every little detail of HTML right, but it would get a lot of documentation included.

Second, if we can't get the pre-rendered javadoc, we would look up the source for the class. Parsing Java is much harder, especially if the line numbers for the methods were not compiled in.

In both cases the user just has to add the source or javadoc jar to the classpath. We then use Class.getResource() to fetch the section needed. If it isn't found we just fall back to the usual autodoc.
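
To make the lookup-with-fallback idea concrete, here is a minimal sketch done on the Python side with the standard zipfile module instead of Class.getResource(). The function names and the UTF-8 default are assumptions for illustration only; returning None stands in for "fall back to the usual autodoc".

```python
import io
import zipfile


def javadoc_entry_name(class_name):
    """Map a Java binary class name to its page inside a javadoc jar."""
    package, _, simple = class_name.rpartition(".")
    # Nested classes use '$' in binary names but '.' in javadoc file names.
    page = simple.replace("$", ".") + ".html"
    return package.replace(".", "/") + "/" + page if package else page


def read_javadoc_html(jar, class_name, encoding="utf-8"):
    """Return the HTML page for class_name, or None if the jar lacks it."""
    with zipfile.ZipFile(jar) as archive:
        try:
            raw = archive.read(javadoc_entry_name(class_name))
        except KeyError:
            # Page not present: the caller falls back to the default docstring.
            return None
    # The encoding is an assumption; undecodable bytes are replaced, not fatal.
    return raw.decode(encoding, errors="replace")


# Build a tiny in-memory "javadoc jar" to try the lookup on.
demo_jar = io.BytesIO()
with zipfile.ZipFile(demo_jar, "w") as out:
    out.writestr("org/orekit/time/AbsoluteDate.html", "<html>demo</html>")

print(read_javadoc_html(demo_jar, "org.orekit.time.AbsoluteDate"))
print(read_javadoc_html(demo_jar, "org.orekit.time.NoSuchClass"))
```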

I took a shot at the parsing, but concluded that it would be at least 2 nights of work to get the javadoc out, which is unfortunately a lower priority than work for the 0.8 release.

petrushy commented 4 years ago

Thanks for the update and the intense work with jpype!

I did some tests with qdox, attaching it at the variables mentioned above. Qdox parses the source tree directly to find javadoc (and other parts). However, I'm not sure that is the right way: some javadoc uses references and tags (like inheritDoc) which are then not processed, so the result does not look optimal. Still, it is nice to be able to plug things like this into the library. I think your first option is likely the best, using the HTML extract.

Thrameos commented 4 years ago

@petrushy Progress update. I succeeded in integrating an HTML parser that can extract each of the html sections from the javadoc files and a Zip file system that allows the user to open the base Java API documentation. There are two parts remaining to this task.

[x] Convert the HTML to rst. (I tried a few packages that are supposed to perform this task, but the javadocs have a style sheet that makes it hard to convert with anything generic, so we are going to need to make a custom one.) This one is not so hard. Simple pattern matches should be able to do a lot of the task. There will be edge cases like subscripts and other weird HTML, but we should be able to get a 90% solution pretty quickly.
[x] Integrate the resulting doc into the class and method files. I am not sure how to present fields and inner classes.
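
For a sense of what "simple pattern matches" can cover, here is a small sketch of an HTML to rst conversion using Python's standard html.parser. It is not the converter written for JPype (that one is in Java); TinyHtmlToRst and html_to_rst are hypothetical names, and only a handful of inline tags, paragraphs and list items are handled.

```python
import re
from html.parser import HTMLParser


class TinyHtmlToRst(HTMLParser):
    # Inline HTML tags mapped to the rst markup that wraps their content.
    INLINE = {"code": "``", "tt": "``", "b": "**", "strong": "**",
              "i": "*", "em": "*"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "p":
            self.parts.append("\n\n")   # paragraph break
        elif tag == "li":
            self.parts.append("\n- ")   # bullet item

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_data(self, data):
        # Collapse runs of whitespace but keep a space at tag boundaries.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_rst(html):
    parser = TinyHtmlToRst()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_rst("<p>Returns the <code>epoch</code> as a <b>date</b>.</p>"))
```

Nested or empty inline tags are exactly where a converter this naive emits invalid rst, so a real renderer needs extra state to guard against that.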

Estimated remaining time on this task is less than a week. I should be able to get it into the JPype 0.8 release assuming no major hangups.

Thrameos commented 4 years ago

@petrushy Progress update. I have now successfully rendered the entire JDK 8 javadoc into rst. It isn't perfect but it is a start. I have one remaining task: linking it up to methods and classes. Once that is complete it should be ready to test. Speed is not so good as my parser is pretty crude.

You may want to contribute by improving the renderer as it could use some additional work. Sometimes the combination of html elements generates invalid rst (like "``````"). References and linkage to external documents don't always work. Tables are not rendered at all.

There are three major support classes.

JavadocExtractor - pulls all the sections out of html document
JavadocTransformer - converts the dom sections into a markup usable by renderer with custom tags. This may be possible to replace with a good xslt, but I am not too good with that tool.
JavadocRenderer - Converts the marked up sections into restructured text.

Thrameos commented 4 years ago

@petrushy The requested enhancement is complete. Please test, add a review, and comment so it can be included in JPype 0.8.

petrushy commented 4 years ago

Hi @Thrameos! Many thanks, will start testing.

petrushy commented 4 years ago

WIP: Hi, I did some initial tests and will spend more time later. Some things seem to be extracted, but others are not (they have javadoc, and the property is still there). I assume the javadoc should be UTF-8 encoded; there are quite a few settings in the project I'm wrapping.

and in pom.xml

maven-javadoc-plugin

    <version>${orekit.maven-javadoc-plugin.version}</version>
    <configuration>
      <overview>${basedir}/src/main/java/org/orekit/overview.html</overview>
      <additionalOptions>
        <option>--allow-script-in-comments</option>
        <option>-header</option>
        <option>&apos;${orekit.mathjax.config} ${orekit.mathjax.enable}&apos;</option>
        <option>-extdirs</option>
        <option>${tools.jar.dir}</option>
      </additionalOptions>
      <bottom><![CDATA[Copyright &copy; ${project.inceptionYear}-{currentYear} <a href="http://www.c-s.fr">CS Group</a>. All rights reserved.]]></bottom>
      <links>
        <link>https://docs.oracle.com/javase/8/docs/api/</link>
        <link>https://www.hipparchus.org/apidocs/</link>
      </links>
      <source>${orekit.compiler.source}</source>
      <doclint>none</doclint>
    </configuration>

Will investigate and try to generate a cleaner javadoc. But seems like some classes that are not detected are rather plain. WIP.

Thrameos commented 4 years ago

Is there a Javadoc jar for the package that I can try pulling docs from? I currently have it set to ignore docs that it is having problems with so that could be causing it to skip.

So stuff that is missing….

Tables (no renderer)
Properties (no place to put them currently)
Math and any fancy markup. (no renderer)
Anything with html errors that I haven’t already handled.

I haven’t dealt with encoding so there may be issues there.

petrushy commented 4 years ago

Yes, thanks, it's the orekit library I'm working with, artifacts at: https://repo1.maven.org/maven2/org/orekit/orekit/10.1/

For example org.orekit.time.AbsoluteDate is one that does not seem to work. https://www.orekit.org/static/apidocs/org/orekit/time/AbsoluteDate.html

While org.orekit.time.TimeScalesFactory works https://www.orekit.org/static/apidocs/org/orekit/time/TimeScalesFactory.html

Thrameos commented 4 years ago

Okay I will investigate this evening. (I may need to add a diagnostics mode that one can trigger to get a translation and rendering report.)

Thrameos commented 4 years ago

It renders just fine for me. Can you be more specific about what issue you are seeing?

Here is what I see and the script that generated it.

doc.txt testDoc3.txt

petrushy commented 4 years ago

Weird. I simplified your script a bit and tried it in Python 3.6 and 3.7 (conda versions), but I get:

Description

Failed to extract javadoc for class org.orekit.time.AbsoluteDate Java class 'org.orekit.time.AbsoluteDate'

Extends:
    java.lang.Object

Interfaces:
    org.orekit.time.TimeStamped, org.orekit.time.TimeShiftable,
    java.lang.Comparable, java.io.Serializable

...

I have all the orekit and hipparchus jars (but not the javadoc for hipparchus) and the orekit javadoc in the same dir as the script:

import jpype
from jpype.types import *
import jpype.imports
jpype.startJVM(classpath=['./*'])
import org

p = org.orekit.time.AbsoluteDate

print("Description")
print("-----------")
print(p.__doc__)

Tested with a new environment also in conda.

I am using openjdk 8 from conda, cannot test with a newer at the moment.

petrushy commented 4 years ago

source is from the Thrameos/javadoc branch

Thrameos commented 4 years ago

Okay I can confirm this one. It appears to work on Linux with all versions of Python and JDK 8-11 but fails on Python 3.5 with JDK 11. I will investigate.

petrushy commented 4 years ago

I'm on Windows currently and have tried with the same results on Python 3.6 and 3.7. I can test later on Mac / Linux.

Thrameos commented 4 years ago

Okay I corrected a few issues that I located in that example. You can use

jde = JClass("org.jpype.javadoc.JavadocExtractor")
jde.failures = True

to get the source of the problem. Some of the hyperlinks appear busted (in different ways on linux and windows) but these are mostly just rendering issues that we can track down later. Overall I think this can be included with some followup to address rendering issues.

You may want to do a full doc extraction run to see what other problems need to be addressed. For now I have to move on to 0.8 bug hunt so I can finally finish the release.

petrushy commented 4 years ago

Ok, will experiment with it.

Yes, it is still very usable and I am looking forward to the 0.8 release! Thank you for your efforts in this development!

petrushy commented 4 years ago

WIP: Removed my comment about it not working under Linux, as it somehow is working now. Could be user error.

Tried with different versions of openjdk under Windows (8, 11) and the example above does not work in any of them.

Thrameos commented 4 years ago

So any conclusion on how well it is working?

petrushy commented 4 years ago

Now it is working on Windows as well, and in some practical tests it works really well, now using JDK 8. Many thanks for implementing this; it is especially useful for end users of "wrapped" Java libraries.

Some minor personal preferences would be a user-settable line length and the possibility to filter away the meta tags, like :class: and :meth: (I would prefer them just removed). The output looks really nice in a tool that supports rst rendering of javadocs, like Spyder, but looks a bit noisy in some other environments like JupyterLab, which is a common one. I may have a try at this later; it could be user-settable.

Many many thanks for implementing this, and the overall improvement of jpype, lots of work.

BR

Thrameos commented 4 years ago

Hmm. Okay I suppose that we can find a way to check what the environment is and select the appropriate render properties. The rendering properties are not that hard to control though I am hesitant to make them public symbols as they are.

Perhaps we should just make them pull the values from System.getProperty. Then you would be able to just set the property to the desired value, and the implementation stays free to change in the future rather than having people poke at private symbols.

Say something like

org.jpype.javadoc.TextWidth - set the column width for wrapping paragraphs. (Default "120")
org.jpype.javadoc.EnableDomains - use :class: and :meth: when linking. (Default "True")
org.jpype.javadoc.EnableExternal - add links to external document (Default "True")
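
A possible shape for reading such settings, sketched in Python with a plain dict standing in for the Java system properties. The three property names are the ones proposed above and are not confirmed to exist in any release; RenderOptions, render_link and wrap_paragraph are hypothetical helpers.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class RenderOptions:
    text_width: int = 120
    enable_domains: bool = True
    enable_external: bool = True

    @classmethod
    def from_properties(cls, props):
        # props is any mapping of property name to string value; missing
        # keys fall back to the defaults proposed in the thread.
        def flag(key, default):
            return props.get(key, str(default)).strip().lower() == "true"

        return cls(
            text_width=int(props.get("org.jpype.javadoc.TextWidth", "120")),
            enable_domains=flag("org.jpype.javadoc.EnableDomains", True),
            enable_external=flag("org.jpype.javadoc.EnableExternal", True),
        )


def render_link(name, options):
    # Emit a Sphinx role only when domains are enabled.
    return ":class:`%s`" % name if options.enable_domains else name


def wrap_paragraph(text, options):
    # Wrap one paragraph to the configured column width.
    return textwrap.fill(text, width=options.text_width)


opts = RenderOptions.from_properties({
    "org.jpype.javadoc.TextWidth": "72",
    "org.jpype.javadoc.EnableDomains": "False",
})
print(opts)
print(render_link("java.lang.String", opts))
```

Keeping the lookup behind string keys means callers never touch the renderer's internals, which is the point of routing it through properties.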

Do you have any additional properties you would like to see controllable such as sections to include or exclude? If you have preferences I will see if I can squeeze it in prior to the release candidate.

petrushy commented 4 years ago

Hi, yes sounds good - I don't think it is necessary to be widely exposed, this is likely more for people who are tuning python wrappers of java libraries. I don't have any additional, one needs to find the quirky cases I guess to see what more may be needed to tune, but this can be done in future versions.

Thanks!

Thrameos commented 4 years ago

I looked into it further. The module doing the rendering for help is pydoc. Its support for sphinx domains and such is really underwhelming (read non-existent). I am a bit shocked that the integration between these isn't tighter.

Given that, it seems like I should just have a master style switch for sphinx or pydoc rendering as org.jpype.javadoc.Style so that the user doesn't have a bunch of settings to play with.
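
One way a single style switch could behave, sketched in Python: in a plain-text style the Sphinx roles are stripped so pydoc and similar viewers show only the target name. The style values "sphinx" and "pydoc" and the function names are assumptions for illustration, not the actual org.jpype.javadoc.Style behaviour.

```python
import re

# Matches Sphinx roles such as :class:`X`, :meth:`~X.y` or :py:class:`X`.
_ROLE = re.compile(r":(?:[\w-]+:)+`([^`]+)`")


def _plain(match):
    body = match.group(1)
    # "title <target>" form: keep only the human-readable title.
    titled = re.match(r"(.*?)\s*<[^>]+>$", body)
    if titled:
        return titled.group(1)
    # Drop the leading '~' or '!' modifiers; the full dotted name is kept.
    return body.lstrip("~!")


def apply_style(text, style="sphinx"):
    # "pydoc" strips the roles; any other style leaves the rst untouched.
    if style == "pydoc":
        return _ROLE.sub(_plain, text)
    return text


sample = "See :class:`java.lang.String` and :meth:`~java.util.List.add`."
print(apply_style(sample, "pydoc"))
```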

petrushy commented 4 years ago

Yep, I saw some requests for that in Jupyter, but it doesn't seem to be near. A master switch would work well.

jpype-project / jpype

Feature idea: Extraction of docstrings from javadoc #702