
Plugin dependency management #1001

Open sofroniewn opened 4 years ago

sofroniewn commented 4 years ago

❓ Questions and Help

I'm pulling this question from @VolkerH out of #939 so we can have a more focused discussion on it:

From @VolkerH

Regarding plugins in general, if they add new functionality they will typically bring new dependencies. I believe the napari developers are trying to keep unnecessary dependencies to a minimum (at least that is my impression from interactions with @jni). However, I assume that the plugins have to live in the same python environment as the viewer. I expect some of the highly desirable plugins to come with dependencies such as tensorflow (or tensorflow-gpu), pandas, etc.

As the plugins probably can't live in separate python environments, are there any ideas on how to address this? I fear dependency hell when trying to create an environment in which napari and multiple plugins from different groups have to coexist. That's not trivial for intermediate python users to manage and almost impossible for end-users.

To some degree, this is the problem that CellProfiler faces with their plugins as well. There are some high-value plugins (segmentation using U-Net), but they don't work with the standard binary distribution of CellProfiler. One has to create an environment in which both CellProfiler and the plugin run.

Feel free to separate this thought from the thread where plugin ideas are collected - maybe it should have gone elsewhere, I just don't want to create too many new issues.

From @tlambert03

Super good point 👆. Dependency hell will be a very real issue with multiple plugins that have heavy dependencies. Thanks @VolkerH. It's possible that conda could help slightly here, but some users will likely still run into unsolvable dependencies between plugins (and the plugins would also need to have conda installs, of course).

From @VolkerH

Just thought of ImJoy. The plugin architecture there is based on plugins running in their own process and exchanging data with the main program via RPCs. That has some limitations when it comes to tight integration with the viewer (I'm thinking e.g. of GUI elements or direct access to GPU functionality via vispy) but avoids the dependency hell.
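To make the process-isolation idea concrete, here is a minimal sketch (not ImJoy's actual protocol; the function name and port are made up) of a plugin running as its own process and exposing one function over RPC, using only the standard library:

```python
# plugin_server.py -- run inside the plugin's own environment/process.
# A hypothetical "segment" function exposed over XML-RPC; the viewer
# process calls it remotely instead of importing the plugin.
from xmlrpc.server import SimpleXMLRPCServer

def segment(pixels, threshold):
    # Toy stand-in for a heavy model (e.g. tensorflow) that lives in
    # the plugin's environment, not napari's.
    return [[1 if p > threshold else 0 for p in row] for row in pixels]

server = SimpleXMLRPCServer(("localhost", 8123), allow_none=True)
server.register_function(segment, "segment")
server.serve_forever()
```

The viewer side would then do something like `xmlrpc.client.ServerProxy("http://localhost:8123").segment(pixels, 0.5)`; the cost is serializing data across the process boundary, which is exactly the tight-integration limitation mentioned above.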

From @tlambert03

just thinking out loud... if we did eventually want to go that direction (RPCs and messages) we could, it would just require moving away from pluggy and rolling our own plugin registration system (since pluggy is pretty much built on the assumption that you can import the plugin module at registration time). That might not be that big of a job though, and we can definitely take what we learn from the pluggy hooks/spec/implementation pattern when doing so... (if it comes to that).
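For reference, the pluggy hookspec/hookimpl pattern being discussed looks roughly like this (the hook and class names are illustrative, not napari's actual spec). Note how `register` needs the plugin object imported into the current process, which is the assumption an RPC design would break:

```python
import pluggy

hookspec = pluggy.HookspecMarker("napari")
hookimpl = pluggy.HookimplMarker("napari")

class NapariHookSpec:
    @hookspec
    def napari_get_reader(self, path):
        """Return a callable that reads ``path``, or None to decline."""

class MyPlugin:
    @hookimpl
    def napari_get_reader(self, path):
        if path.endswith(".npy"):
            import numpy as np
            return np.load
        return None

pm = pluggy.PluginManager("napari")
pm.add_hookspecs(NapariHookSpec)
pm.register(MyPlugin())  # <-- requires importing the plugin module
readers = pm.hook.napari_get_reader(path="image.npy")
```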

sofroniewn commented 4 years ago

@VolkerH, thanks for surfacing this. @tlambert03, this was discussed at quite some length during team meetings in the early phase of the project, and we haven't revisited it much lately. The strong feeling then was to start with a single python environment for performance reasons. Maybe @jni wants to weigh in with more thoughts.

I'll also add that pip is actively working on improving its dependency resolution, so that (if I understand correctly) it would be possible to have packages X and Y that depend on different versions of Z run together. This work is funded by a CZI Essential Open Source Software grant; for more detail see https://wiki.python.org/psf/Fundable%20Packaging%20Improvements#Finish_dependency_resolver_for_pip

[EDIT: the work pip is doing on the dependency resolver won't actually fix that conflict, but will cause pip to fail earlier and in a more informative way]

neuromusic commented 4 years ago

This is a hard decision to make without clear prioritization of 3rd party plugins. If there are 5 "must have" plugins that have dependency conflicts in their native implementation, then you are left with either building the plugin architecture to handle that OR forcing plugin devs to refactor their code to work in the napari environment.

jni commented 4 years ago

I'm inclined to go towards standard distributions, e.g. Anaconda. Different versions of napari will make available certain standard versions of packages, and that is what plugins must depend on for specific versions of napari. Things like the NEP29 deprecation policy will make this easier to do. imho this is the only way to manage large interdependency chains. I have a significant bias against RPC things, but I guess individual plugins can do what they like... Pretty sure you can create new conda envs from Python, so we can't stop a plugin from doing that if it wants to decouple from napari at the cost of performance.
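As a hedged illustration of that last point, a plugin could shell out to conda to build and use its own environment (this assumes `conda` is on the PATH; the env and module names are hypothetical):

```python
import subprocess

# Create a dedicated environment for the plugin's heavy dependencies.
subprocess.run(
    ["conda", "create", "--yes", "--name", "my-plugin-env",
     "python=3.8", "tensorflow"],
    check=True,
)

# Run the plugin's worker inside that environment; data then has to
# cross a process boundary, hence the performance cost noted above.
subprocess.run(
    ["conda", "run", "--name", "my-plugin-env",
     "python", "-m", "my_plugin.worker"],
    check=True,
)
```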

neuromusic commented 4 years ago

Different versions of napari will make available certain standard versions of packages, and that is what plugins must depend on for specific versions of napari.

Do you have any expectation of the cadence of these version changes? Quarterly? Annually? Decadely?

sofroniewn commented 4 years ago

Do you have any expectation of the cadence of these version changes? Quarterly? Annually? Decadely?

I think this will likely change with time and the size, maturity, and needs of the plugin ecosystem. At the beginning I imagine we will try to be flexible with respect to the development practices of the community and try to give them the best experience possible.

carsen-stringer commented 4 years ago

forcing plugin devs to refactor their code to work in the napari environment.

I don't see this as a huge hurdle personally, but for me it would mostly require making a separate pip package with fewer potentially conflicting dependencies (ideally there is a better solution than making an entirely new package that I don't know about; see my issue here about this).

On a related note, will napari choose a specific deep learning framework to maintain compatibility with (e.g. for 'napari-segmentation'), or will you stay hands off? If a specific framework is chosen I am happy to use it, and maybe I'm optimistic but hopefully other plugin developers will agree to modify their code accordingly to minimize issues for users.

Thanks all for your great work!

sofroniewn commented 4 years ago

I don't see this as a huge hurdle personally, but for me it would mostly require making a separate pip package with fewer potentially conflicting dependencies (ideally there is a better solution than making an entirely new package that I don't know about; see my issue here about this).

One definitely wants to avoid the entirely-new-package scenario. I just weighed in at https://github.com/MouseLand/cellpose/issues/28#issuecomment-596153650 with one approach, but I basically see two options.

Those are just some rough thoughts, capturing my state of mind right now, maybe @jni or @tlambert03 has some more nuanced ideas here.

On a related note, will napari choose a specific deep learning framework to maintain compatibility with (e.g. for 'napari-segmentation'), or will you stay hands off? If a specific framework is chosen I am happy to use it, and maybe I'm optimistic but hopefully other plugin developers will agree to modify their code accordingly to minimize issues for users.

I think napari, even if we were to promote slightly more analysis-focused but still generic plugins like a napari-segmentation, would try to stay framework-agnostic for as long and as much as possible, and we'd have versions of all the major frameworks available in our standard distributions.

I could imagine, though, that others in the community would want to make more focused plugins, say around model sharing etc., where a choice of framework, or even of a particular model architecture, would allow for really cool things to be done (say in a federated learning case).

jni commented 4 years ago

will napari choose a specific deep learning framework to maintain compatibility with (e.g. for 'napari-segmentation'), or will you stay hands off?

Just FYI, the current Anaconda distribution includes PyTorch, TensorFlow, and Chainer, but not MXNet. You can see the full package list here:

https://docs.anaconda.com/anaconda/packages/py3.7_osx-64/

It would be kind of cool to outsource our package management to Anaconda and thus support only the packages there, but I expect that will come apart at the seams pretty quickly. We could come up with our own distribution based on conda-forge, or we could do Anaconda++: start with Anaconda and add some extra packages requested by the community.

Anyway, the real answer is that this is all in flux and we are still very much open to feedback from the community about this. I also expect we will have a couple of beta releases, see what plugin developers do with our packaging choices, see where it breaks, and iterate from there.

Those are just some rough thoughts, capturing my state of mind right now, maybe @jni or @tlambert03 has some more nuanced ideas here.

Nope. =P But I will say that I agree 100% that it is absolutely a primary design goal of our eventual plugin framework to avoid forcing people to create entirely new packages. If someone has a Python package to do X, it should take extremely minimal effort to make that package a napari plugin. In some cases (see e.g. #263), adding plugins could be done by napari itself, without modification of the source package, though some functionality, such as auto-selecting input and output layers, would be missing. But it could be added, e.g., by the community buying into our typing hierarchy. Then a package that (a) has annotated its functions with napari types, and (b) has `Framework :: napari` declared in its Trove classifiers (that classifier doesn't exist yet, but I'm optimistic) would automatically be a napari plugin.
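As a sketch of what "buying into the typing hierarchy" could look like, a plugin author would only annotate an existing function with napari-flavoured types (the type names below are illustrative, since no such hierarchy exists yet):

```python
import numpy as np

# Hypothetical napari layer types; the idea is that napari maps these
# annotations to layer types so it can auto-select inputs and outputs.
def threshold(image: "napari.types.ImageData",
              value: float = 0.5) -> "napari.types.LabelsData":
    """Turn an image layer into a labels layer by thresholding."""
    return (image > value).astype(int)
```

The package would additionally declare `Framework :: napari` under `classifiers` in its setup metadata so napari (or a package index) could discover it.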

jaimergp commented 2 years ago

I'd like to bring attention to this issue again... two years later!

We have moved over to conda packaging for the bundle, and many plugins have been migrated to conda-forge. This should alleviate a lot of issues for dependency management and ABI compatibility. If the metadata is correct (and we have tried our best), then conda will complain loudly about any conflicts (with enhanced performance and error reporting thanks to mamba).

This doesn't fix the problem where two different plugins with incompatible dependencies can't be co-installed in the same napari installation (a conda environment behind the scenes). There are two scenarios where this can happen:

This raises the question: which dependencies (and their versions) would we support? I see mentions of NEP29 and Python support, but in the eyes of the solver, python is just one more node in the dependency tree. I'd like to open a debate about this specific topic, covering these questions:

- Which packages should be part of our curated list? Think of numpy, scikit-image, pytorch, qt, tensorflow...
- Which policy should we use to define and update these pinnings?
- How will we communicate this to plugin developers? Guidelines? Strict CI checks? Recommendations in the cookiecutter template?

Speaking of CI checks, we need a way to test the "installability" of our plugin ecosystem across platforms. This can be approximated with conda dry-runs for all napari-* packages.
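A rough sketch of such a check (the plugin list and channel are placeholders): solve each plugin against napari with a conda dry-run and report whether the solver succeeds:

```python
import subprocess

PLUGINS = ["napari-svg", "napari-console"]  # placeholder sample

for plugin in PLUGINS:
    # --dry-run asks the solver to resolve the env without creating it.
    result = subprocess.run(
        ["conda", "create", "--dry-run", "--name", "solve-test",
         "--channel", "conda-forge", "napari", plugin],
        capture_output=True, text=True,
    )
    status = "solves" if result.returncode == 0 else "conflict"
    print(f"{plugin}: {status}")
```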

This list of questions is not exhaustive but it should get us started!

sofroniewn commented 2 years ago

Which packages should be part of our curated list? Think of numpy, scikit-image, pytorch, qt, tensorflow...

I like a curated list like this, and I think that vision aligns with how @jni sees things. As much as possible we should align with the community and follow some standard process.

Which policy should we use to define and update these pinnings? We are effectively inheriting conda-forge's pinnings (defined here, for ABI compatibility purposes at build time), but we can further restrict run time pinnings if needed.

I don't really have more to add here.

How will we communicate this to plugin developers? Guidelines? Strict CI checks? Recommendations in the cookiecutter template?

I could imagine some combination of all of these, including maybe surfacing compatibility on the hub (cc @neuromusic).

This list of questions is not exhaustive but it should get us started!

Yes, thanks!!

neuromusic commented 2 years ago

How will we communicate this to plugin developers?

A GitHub action that does the checks included in the cookiecutter template would go a long way.

We might be able to incorporate the outputs into the Plugin Preview Page. Eventually, we could highlight on the napari hub if plugins aren't compatible with the Bundled App (for this or any other reason).