Consider using scyjava?

ctrueden commented 2 years ago

Hi @tlambert03! :wave:

I was wondering if you have considered using the scyjava project to resolve Java dependencies, rather than shipping them in a hardcoded way like this project does?

For PyImageJ, we use scyjava (backed by jpype and jgo) and it lets you customize your classpath from remote Maven artifacts in an extensible way. Without something like this, you'll always run into problems with multiple different needed JARs on the Java classpath across multiple Python-wrappings-of-Java-libraries. Whereas if you use scyjava, we can mix and match things like AICSImageIO[bioformats] with PyImageJ and potentially other Java-backed Python components.

Do you see any reason this wouldn't be a better approach? Happy to discuss further if you have questions or comments about it!

CC @hinerm @elevans

tlambert03 commented 2 years ago

Do you see any reason this wouldn't be a better approach? Happy to discuss further if you have questions or comments about it!

nope! I would love to learn more about this. let me look into your pyimagej distribution setup a bit and I'll ping you if I have questions. Sorry for my general lack of java world awareness! 😅

ctrueden commented 2 years ago

Here's some quick working code implementing your get_loci function in a scyjava-compatible way:

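import jpype
import scyjava

# Register the Bio-Formats Maven artifact; it is resolved when the JVM starts.
scyjava.config.endpoints.append('ome:formats-gpl:6.7.0')
# jimport starts the JVM on demand, then returns the Java class.
FormatTools = scyjava.jimport('loci.formats.FormatTools')
print(f'Bio-Formats version = {FormatTools.VERSION}')

def get_loci():
    # The JVM is already running at this point, so the package can be looked up.
    loci = jpype.JPackage("loci")
    loci.common.DebugTools.setRootLevel("ERROR")
    loci.__version__ = loci.formats.FormatTools.VERSION
    return loci

loci = get_loci()
print(loci.__version__)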

It doesn't cover all the bells and whistles you added with the environment variables, but I'm confident we can work those in to meet requirements.

The key thing here is that the above does not need any local JAR files to be there, but rather uses jgo to download them on demand from the remote Maven repository. And if other projects want to append their own endpoints (i.e. JARs) as well, they can go ahead and do that, and it is likely to work in the single JVM that JPype starts. The JVM is also started on demand. Of course there are still edge cases, but it's a way to make classpath harmony more feasible.
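The pattern described above (several libraries appending endpoints to one shared list, with a single JVM started lazily on first use) can be modelled in plain Python. This is only a toy sketch under stated assumptions: `ToyJvm`, its methods, and the `org.example` coordinates are hypothetical names made up for illustration, and nothing here is the scyjava or jgo API.

```python
from typing import List, Optional


class ToyJvm:
    """Toy stand-in for a process-wide JVM. Hypothetical; NOT the scyjava API."""

    def __init__(self) -> None:
        self.endpoints: List[str] = []               # Maven coordinates, appended by anyone
        self.classpath: Optional[List[str]] = None   # stays None until first use

    def start(self) -> List[str]:
        # Start at most once, using every endpoint registered so far.
        if self.classpath is None:
            # A real tool would resolve and download the artifacts here.
            self.classpath = list(dict.fromkeys(self.endpoints))  # de-duplicate, keep order
        return self.classpath

    def jimport(self, class_name: str) -> str:
        # The first import triggers startup; this toy returns the name, not a class.
        self.start()
        return class_name


jvm = ToyJvm()
jvm.endpoints.append("ome:formats-gpl:6.7.0")        # added by one wrapper library
jvm.endpoints.append("org.example:other-lib:1.0.0")  # hypothetical second library
jvm.endpoints.append("ome:formats-gpl:6.7.0")        # a duplicate request is harmless

assert jvm.classpath is None                         # nothing has started yet
jvm.jimport("loci.formats.FormatTools")              # first use starts the toy JVM
print(jvm.classpath)

jvm.endpoints.append("org.example:too-late:1.0.0")   # appended after startup
print(jvm.start())                                   # classpath is unchanged
```

In this toy, an endpoint appended after startup never reaches the classpath, which illustrates why every library would want to register its endpoints before anything triggers the first import.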

tlambert03 commented 2 years ago

Very cool. I really appreciate your input here.

One thing I'd love to keep is the type stubs (and happy to figure out how to add to scyjava). Something to chat

ctrueden commented 2 years ago

I'm not sure how you would add stubgenj to scyjava in an extensible way. Possible, probably, but unfortunate that stubgenj can't dynamically generate these, rather than writing python source code? Or can it?

Type hinting feels like something that JPype should "just do", rather than going through this rigamarole. I say naively without looking at it for more than a minute. :wink:

ctrueden commented 2 years ago

Maybe this is somehow interesting? https://github.com/jpype-project/jpype/issues/297

And from the JPype user guide:

JPype is Jedi aware and attempts to provide whatever type information that is available to Jedi to help with completion tasks.

What are your use cases for having these stubs? What benefit do they provide over what you get from Jedi and JPype out of the box? I am completely clueless and ignorant of Python 3 typing / hinting / etc., so I appreciate any education you can offer here!

tlambert03 commented 2 years ago

before answering, I'll just say that I see type information as very desirable, but its absence is not a dealbreaker. if the "right" way to do this is to just use scyjava, and if that comes with the sacrifice of type hints... I'm still very open to it.

Possible, probably, but unfortunate that stubgenj can't dynamically generate these, rather than writing python source code? Or can it?

stubgenj generates the pyi files you see here. If by "dynamically" you mean at python runtime, then perhaps that's the main point to discuss here, because the primary use of these type hints is in static code analysis (not in tab autocompletion or docstring printing during runtime, in ipython or jupyter for example).

What are your use cases for having these stubs?

The primary use cases for me are 1) tab autocompletion (and typo prevention) in the IDE, and 2) type-aware linting with tools like mypy or other static type checkers. These things just accelerate development so much that I've come to feel lost without them.

for instance, with the stubs provided in bioformats_jar:

without them... one has to continually refer back to online documentation:

What benefit do they provide over what you get from Jedi and JPype out of the box?

I'm really not an expert here, so can only share my experience: I've never been able to figure out what the Jedi support means for Jpype. While I can get their example here to "work" in the sense that __annotations__ returns what they say it should, I can never actually get my IDE to give me autocompletions, or mypy to have any idea about the attributes of loci.formats, etc.

Also, at the very least, this requires one change the language server settings in vscode away from the default pylance over to jedi or something. Stub files on the other hand are the de-facto way to declare namespaces and type hints in python, particularly for things that are hidden behind a C (or, in this case, Java) extension. They will work in any environment, jedi or not. Yes it's a shame that those stubs need to be actually generated and put into source somewhere, but it only takes 10 seconds or so with stubgenj, and the benefit is big.

I'm not sure how you would add stubgenj to scyjava in an extensible way.

Ultimately, you could do this in a progressive way... adding stubs for certain packages that somebody "cared enough" to add. Or, people could create their own stub-only-packages to extend typing for scyjava.

I notice that scyjava jimport doesn't seem to wrap "jpype.JPackage" ... only "jpype.JClass"... but if it did, you could eventually have something like this:

import jpype
from typing import TYPE_CHECKING, overload, Literal

if TYPE_CHECKING:
    import _stubs

@overload
def jpackage(package_name: Literal['loci']) -> '_stubs.loci': ...
@overload
def jpackage(package_name: Literal['ome']) -> '_stubs.ome': ...

def jpackage(package_name: str) -> jpype.JPackage:
    ...

and only those packages that have stubs would have @overloads added... With that, someone would then get all of the in-IDE autocompletion and linting support that are currently provided here.
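A runnable variant of the idea above, using only the standard library, might look like the following. It is a sketch under assumptions: `LociStub` and `OmeStub` are hypothetical placeholders for whatever a stub generator would emit, and the function returns a `SimpleNamespace` instead of a real `jpype.JPackage` so that it runs without a JVM.

```python
import types
from typing import Any, Literal, Protocol, overload


class LociStub(Protocol):
    # Hypothetical placeholder for a generated stub of the `loci` package.
    __version__: str


class OmeStub(Protocol):
    # Hypothetical placeholder for a generated stub of the `ome` package.
    units: object


@overload
def jpackage(package_name: Literal["loci"]) -> LociStub: ...
@overload
def jpackage(package_name: Literal["ome"]) -> OmeStub: ...
@overload
def jpackage(package_name: str) -> Any: ...
def jpackage(package_name: str):
    # Runtime behaviour is the same for every name. Only the declared return
    # type differs, and only a static checker ever looks at that.
    return types.SimpleNamespace(name=package_name)


loci = jpackage("loci")    # a static checker infers LociStub here
other = jpackage("other")  # no stub registered, so the checker falls back to Any
print(type(loci).__name__, other.name)
```

The final `str` overload is what keeps packages without stubs usable: they still import, they just get no completion or checking.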

In any case, I'll still look into deprecating this package and using scyjava. but that would be my "dream" solution ultimately for using the scyjava ecosystem while developing in python

ctrueden commented 2 years ago

@Thrameos Do you have any thoughts and/or insight here on what a good path forward would be for us to achieve good autocompletion in IDEs for Python-wrapped Java classes? Is this something we "should" already be getting with JPype out of the box but are just doing wrong? Or is this an area of potential improvement for JPype? Is it something we could help to advance somehow, and if so, how?

Thrameos commented 2 years ago

As far as I know JPype works with autocomplete (using jedi) but it is limited to the first level. Chaining of returns obj.getField().getNext().callMethod() will not autocomplete because Python does not know the return type after the first getField(). The work around is to assign each to a variable and then autocomplete from there. There is a 3rd party library mentioned in the issues list that attempts to generate stubs though I haven't used it myself.
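The workaround can be illustrated with plain Python objects standing in for Java ones. This is a toy sketch: `Obj`, `Field`, and `Node` are hypothetical classes, not JPype proxies. The point is only that a runtime completer can introspect an object that already exists, while it cannot know what an un-annotated call will return before the call has run.

```python
class Node:
    def callMethod(self):
        return "done"


class Field:
    def getNext(self):
        return Node()


class Obj:
    def getField(self):
        return Field()


obj = Obj()

# Chained form: nothing here tells a completer what getField() returns.
result = obj.getField().getNext().callMethod()

# Step-by-step form: each intermediate value is a live object that an
# introspection-based completer can examine.
field = obj.getField()
node = field.getNext()
public_names = [name for name in dir(node) if not name.startswith("_")]
print(public_names)  # ['callMethod']
```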

The issue is covariant returns, Java method may return different types based on the incoming arguments. Each dispatch holds all the methods for the different overloads. There are methods in JPype which allow you to get all the return types (used for documentation generation). We are properly filling in the return field where the dispatch is unambiguous, which should make autocompletion happy. But I haven't looked into the issue beyond that.

tlambert03 commented 2 years ago

thanks for your input @Thrameos!

There is a 3rd party library mentioned in the issues list that attempts to generate stubs though I haven't used it myself.

I believe that's stubgenj correct? (it's used here in this package, which precipitated this conversation). @ctrueden, you can read more about the motivation for stubgenj in https://github.com/jpype-project/jpype/issues/714 also pinging @michi42, the author of stubgenj ... who may be able to shed additional clarity on the limitations/advantages of jedi support vs type stubs.

The issue is covariant returns, Java method may return different types based on the incoming arguments.

when you say "the issue is"... do you mean the issue with jpype and jedi? or the issue with type stubs? this is handled with type stubs using typing.overload, correct?

Thrameos commented 2 years ago

By issue I mean the general difficulty with making Java return specification in Python. Python does not support overloading. Thus you can have only one specification for each method. As Java may have a different return type for each different set of input arguments, you end up with specifications that are either very generic or overly specific.

tlambert03 commented 2 years ago

Python does not support overloading.

apologies if I'm still missing your point, but while it's true that python doesn't have true static overloading, python typing lets you express this nicely with the @overload decorator. Perhaps there's something in the specifics of Java reflection that I'm not aware of that make this specific case difficult. but for the generic case of different return types based on different inputs, python type stubs are ready for that case

Thrameos commented 2 years ago

I looked at the @typing.overload and I doubt that it would work for JPype. The PEP that describes the overloading feature doesn't really describe anything that I can duck type or a mechanism that I can implement. To provide something for a reflected method you have to be able to have some field (like __annotations__) that an object can provide that will return back a description that I can fill out.

The method described in the PEP is for source reader type overloading. Those are for Jedi or other completion engines to read the source code in Python and then produce the type hints they need. Unfortunately the Java dispatches don't have Python source code sitting behind them. You could create a stub file which would produce the desired output like stubgen. I am not sure there is anything more behind the mechanisms than that, but I may be mistaken.

The solution that JPype is using is to provide methods in the form of __annotations__ which is something that I can provide using type reflection. But that mechanism is restricted to returning only one return type and one set of argument lists. Hence not very useful for general overloading.
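The single-signature limit is easy to see in plain Python. In the sketch below (`scale` is a hypothetical function chosen for illustration), the two `@overload` variants are visible to a tool that reads the source, but the function object that exists at runtime carries only the annotations of the final implementation.

```python
from typing import Union, overload


@overload
def scale(value: int) -> int: ...
@overload
def scale(value: str) -> str: ...
def scale(value: Union[int, str]) -> Union[int, str]:
    # Only this last definition is bound to the name `scale` at runtime.
    return value * 2


# A single mapping with one entry per parameter plus one "return" entry.
# There is no slot for a second or third signature.
print(scale.__annotations__)
```

A static checker reading this file resolves `scale(2)` to `int` and `scale("ab")` to `str`, whereas anything relying on `__annotations__` alone sees only the broad `Union` signature.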

I am guessing that the only way to get full stubs automatically would be to fool the type system by filling out the __file__ field on a Java dispatch and create a file that contains the required stubs. Thus when jedi looks at the object it can be directed to the location where something will write the stubs. I use this sort of code generation mechanism for providing the Python documentation, but there it is driven through the __doc__ annotation which again is something that I can generate on the fly. Then a tool like Jedi would be able to get the completion patterns. Unfortunately unless those files can be virtual and generated only upon request, the creation of them would certainly be a huge bottle neck for general usage.

Thus my original statement "Python does not support overloading". It does allow a file to be annotated with overloaded signatures so that a code completion engine can get that info, but there is no mechanism behind it that one can simply replicate to provide the same functionality. Does that make sense?

tlambert03 commented 2 years ago

thanks again for taking the time @Thrameos! very helpful.

I believe it's solidifying my understanding that pyi stubs are the most direct (only?) way to do what I'm after here for the purposes of static type checkers in the IDE @ctrueden. And something that would be very hard for JPype as a library on the whole to "just do" for all downstream packages that use it (like scyjava). I continue to think that the burden should be on users of JPype, not on JPype itself.

It kinda sounds like this all comes down to runtime annotations (via reflection and populating __annotations__) vs static annotations (via pre-generated type stubs ala stubgenj or something). Both useful, no doubt, with perhaps slightly different target use cases?

I am guessing that the only way to get full stubs automatically would be to fool the type system by filling out the __file__ field on a Java dispatch and create a file that contains the required stubs.

the way that stubgenj goes about it (and the way I'm doing it here) is to decorate the JPackage/JClass style methods with an overload that returns a module stub:

@typing.overload
def JPackage(__package_name: typing.Literal['loci']) -> loci.__module_protocol__: ...
def JPackage(__package_name) -> types.ModuleType: ...

That works just fine for my use case, and provides all the stubs as shown above in https://github.com/tlambert03/bioformats_jar/issues/2#issuecomment-932776940 when I use JPackage('loci')... but obviously required specific knowledge of the package I was going to load with JPype

Unfortunately unless those files can be virtual and generated only upon request, the creation of them would certainly be a huge bottle neck for general usage.

I agree, dynamically generating these files on request would be prohibitively slow. and since JPype can't possibly store pre-generated stubs for all of its dependents, it makes sense for those libraries that really want them to generate them.

do you agree we're all hovering around the same understanding?

Thrameos commented 2 years ago

Yes I believe you are correct in your understanding.

I want to elaborate on the dynamic method a bit more. The speed issue is that we don't want to generate the stubs just because a Jar is loaded which would add a huge overhead. Instead the ideal goal would be to produce the stub when requested, which is just a matter of calling the Java reflection on each method in each dispatch. The downside here being you can't just do one dispatch, but you must do every dispatch that is available in the class and every class in the hierarchy that is above that.

As for dynamically generating stubs, this is mostly a technical issue: I don't know when to generate the file, nor how to make the file (or preferably a virtual file) appear. The inspect module is used to get the file to be associated with a function or class. But some tools like jedi go out of their way not to call any code in the library that is targeted. That indirect method makes it very challenging to catch the attempt to access the contents and redirect it to the stub generator. In other words the system is built to prevent execution of code such as we would need to dynamically create a stub. This also means that the stubs must be a physical file on disk and not some structure in memory (at least as far as I can tell).

We have worked around that on __doc__ by making it so that when you call for the property (which is a C++ implementation so the code checkers can't see to bypass it) that it gets routed to a hook where the documentation gets produced. If we knew the same type of trick to slide a call procedure in we could call the stub generator. As far as JPype it just needs to have the hook to direct the creation of the stub. The actual implementation of a stub generator can be default (like JPype doc) or it can point to a user implemented stub generator.
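A pure-Python analogue of that kind of hook is a `__doc__` property that builds its text only when something asks for it. This is a sketch of the general idea, not JPype's implementation (which, as noted above, lives in C++): `ReflectedMethod` and its fields are hypothetical names, and the signature strings are supplied by hand here where a real hook would obtain them by reflection.

```python
class ReflectedMethod:
    # Toy stand-in for a reflected Java method. Hypothetical, not a JPype class.

    def __init__(self, name, signatures):
        self._name = name
        self._signatures = signatures
        self.doc_requests = 0  # counts how often documentation was generated

    @property
    def __doc__(self):
        # Runs only when something actually reads the attribute, so objects
        # whose documentation is never requested cost nothing.
        self.doc_requests += 1
        return "\n".join(f"{self._name}{sig}" for sig in self._signatures)


method = ReflectedMethod("getSizeX", ["() -> int"])
print(method.doc_requests)  # 0
print(method.__doc__)       # getSizeX() -> int
print(method.doc_requests)  # 1
```

A stub generator would need an equivalent trigger point, and the difficulty described above is that static tools read files on disk without touching any attribute, so there is nothing comparable to intercept.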

Hope this helps if you are interested in trying that approach.

tlambert03 commented 2 years ago

thanks again. and, while you're here, thanks for JPype! it's made my life as a python dev who knows very little about java a lot easier 😄

michi42 commented 2 years ago

also pinging @michi42, the author of stubgenj ... who may be able to shed additional clarity on the limitations/advantages of jedi support vs type stubs.

The main difference (which can be seen both as an advantage or as a drawback, depending on the situation) is that type stubs are fully static and pre-generated, they allow type checking without actually executing any code or doing any runtime introspection at use time (stubgenj uses introspection and Java Reflection to generate the stubs, but that's a different story).

Static pre-generated stubs allow auto-completion in IDEs which are based on static analysis rather than introspection (e.g. PyCharm) as well as the use of static type checkers (e.g. mypy). Chaining of method calls is not an issue as all information on arguments and return types is available statically, without actually running the calls. Also, the generated type stubs allow full usage of the python typing system (e.g. the mentioned @typing.overload) which to my knowledge is currently not fully possible at runtime by setting __annotations__.

However, the limit is obviously that the stubs need to be pre-generated. Downloading arbitrary JARs at runtime is not really compatible with pre-generating type stubs, and certain types of JPype customizers which rewrite python-java calls at run time will hardly ever be supported (the default ones are, as they expose some information on what they are doing). Also, the stubs rely on the usage of the JPype import system to establish the link between Java classes/packages and stubs. Instantiating Java objects through Strings won't work, as there is no way for a static code analyzer to know what a call like jimport("java.util.List") is actually doing - unless you manually annotate the return type.

This however does not mean that you need to ship the JARs along with your code to be able to use stubgenj - e.g. at CERN we don't, instead we have a dependency management system (cmmnbuild_dep_manager) that downloads the required JARs on first execution. As long as the JARs are available via the JPype import system, you should be able to run stubgenj on them - you just need to write your own little piece of code that fetches the JARs and fires up a JVM with them on the classpath. Then call into stubgenj. See https://gitlab.cern.ch/scripting-tools/cmmnbuild-dep-manager/-/blob/master/cmmnbuild_dep_manager/_manager.py#L976 for how we do this in our cmmnbuild_dep_manager. From there, you have two options - either you pre-generate the stubs and ship them with your package (or as a separate stub-only distribution). Or you ship the script to generate the stubs, and add stubgenj as an optional dependency, so users interested in the stubs can easily have them generated themselves.

tlambert03 commented 2 years ago

thank you for also weighing in @michi42! (and thank you for stubgenj!)

there is no way for a static code analyzer to know what a call like jimport("java.util.List") is actually doing - unless you manually annotate the return type.

yeah, I realized this.. though ultimately went with a slightly different strategy than the jpype-stubs folder that stubgenj creates, because I was confused about one thing: can multiple packages contribute stubs to the same environment using stubgenj? Or does it assume there's only one user in the environment?

For instance, if I run stubgenj for the jar I'm using, I get:

# jpype-stubs/__init__.py
import types
import typing
import loci

@typing.overload
def JPackage(__package_name: typing.Literal['loci']) -> loci.__module_protocol__: ...

def JPackage(__package_name) -> types.ModuleType: ...

it seems that anyone else wanting to provide stubs for a jpype jar would need to "collaborate" on this one file, is that correct? That's why I ultimately moved the stubs into my own package, and provided my own get_loci() function. It's an ugly workaround... but it seemed a bit weird to "claim" that jpype-stubs namespace for a single jar.

This however does not mean that you need to ship the JARs along with your code to be able to use stubgenj

yeah, this is exactly where this conversation started. @ctrueden has a much better approach than what I did here, with jgo acting as the dependency management system (and scyjava acting as the client/importer wrapping jpype). So really, the question became how can I deprecate this silly package and use scyjava, while retaining the stubs that I liked from stubgenj. Ideally, this would be extensible ... I'll take a closer look at your Manager.stubgen method to see if it would work for scyjava. thank you for that

If you have a moment, could you just comment on that question above about multiple stubgen users contributing stub-only packages clobbering the other jpype-stub entries with the critical Literal['jarname']?

Thrameos commented 2 years ago

For my own personal code I have never been a big fan of using the lookup by name facilities (JPackage(str), JClass(str)) which is why my first contribution was to hook up to the Python import tool. The use of the jpype.imports generally does much better in that everything is structurally more like Python. Unfortunately, I do not know how this plays with something like stubgenj.
