reanahub / reana-commons

REANA common utilities and schemas
https://reana-commons.readthedocs.io/
MIT License
2 stars 39 forks source link

Set lower bounds only instead of pinning 'yadage' extra dependencies #322

Closed matthewfeickert closed 2 years ago

matthewfeickert commented 2 years ago

Hi. :wave: For reana-client v0.8.0 the install_requires requires reana-commons[yadage,snakemake]>=0.8.0

install_requires = [
    "click>=7",
    "cwltool==3.1.20210628163208",
    "jsonpointer>=2.0",
    "reana-commons[yadage,snakemake]>=0.8.0,<0.9.0",
    "tablib>=0.12.1,<0.13",
    "werkzeug>=0.14.1",
]

and reana-commons v0.8.0 has all the dependencies for the 'yadage' extra pinned.

https://github.com/reanahub/reana-commons/blob/352562345a68c2c391b786ec6ebae973fe64f65d/setup.py#L36

So there is no way to install reana-client>=0.8.0 at the moment without also requiring these pinned versions.

This is problematic as adage v0.10.2, and yadage-schemas v0.10.7 were made as part of a collection of patch releases so that a patch release v0.1.9 of recast-atlas could be made. However, these pinned versions means that there is no compatible way for pip to solve an install that asks for both a new recast-atlas and reana-client — c.f. https://github.com/recast-hep/recast-atlas/pull/86.

Can the dependencies for the yadage extra please be changed to lower bounds only?

$ git diff
diff --git a/setup.py b/setup.py
index 2198edb..77900b0 100755
--- a/setup.py
+++ b/setup.py
@@ -33,7 +33,7 @@ extras_require = {
     "docs": ["Sphinx>=1.4.4", "sphinx-rtd-theme>=0.1.9",],
     "tests": tests_require,
     "kubernetes": ["kubernetes>=11.0.0,<12.0.0",],
-    "yadage": ["adage==0.10.1", "yadage==0.20.1", "yadage-schemas==0.10.6",],
+    "yadage": ["adage>=0.10.1", "yadage>=0.20.1", "yadage-schemas>=0.10.6",],
     "snakemake": [get_snakemake_pkg()],
     "snakemake_reports": [get_snakemake_pkg("[reports]")],
 }

(cc @lukasheinrich)

matthewfeickert commented 2 years ago

@audrium @mvidalgarcia @tiborsimko, as you were all involved in the initial pinning of these dependencies in PR #264 and PR #270 and https://github.com/reanahub/reana-workflow-engine-yadage/issues/192, do you have any objections to the lower bound proposal? If not, I have a branch on my fork with the above change that is passing CI and is ready to PR.

tiborsimko commented 2 years ago

@matthewfeickert The reason for pinning was to have the very same Yadage version on the REANA client side and on REANA the cluster side. The reana-workflow-engine-yadage component obviously works with a precise yadage/adage version combination only, and we wanted to be sure to have the very same versions on the client side as well, e.g. for workflow validation.

That said, there are several possible solutions:

  1. If adage/yadage packages use semantic versioning strictly, so that we can rely on a fact that 0.20.N will have the same API for any N, then we can perhaps pin in r-commons in a manner like ">=0.20,<0.21" and then have a strict pin "==0.20.1" in r-w-e-yadage only. Will this do?

  2. Otherwise, using two different Python environments, one for RECAST and one for REANA, should solve this nicely as well, albeit in a manner that is not very convenient to users.

  3. Otherwise another solution would be to release reana-client as an AppImage (see https://github.com/reanahub/reana-client/issues/390) so that there would be no Python dependency clash either. This would obviously work if users aim to use reana-client CLI API and not Python API.

matthewfeickert commented 2 years ago

The reason for pinning was to have the very same Yadage version on the REANA client side and on REANA the cluster side. The reana-workflow-engine-yadage component obviously works with a precise yadage/adage version combination only, and we wanted to be sure to have the very same versions on the client side as well, e.g. for workflow validation.

Thanks for the super fast reply, @tiborsimko! (Though we can of course follow up more when it isn't the weekend for you — I wasn't expecting a reply now, just getting thoughts out of my head.)

Maybe we should back up a bit first so that I understand where the REANA team is coming from. Are reana-client and reana-commons libraries or applications? I was under the impression they were libraries, but the above makes it sound like you're not deploying them in a custom environment with a lock file or something so I'm now unsure. It is probably best to understand how the REANA team sees them so that I don't make wrong assumptions.

That being said, with regards to the proposed solutions:

  1. If adage/yadage packages use semantic versioning strictly, so that we can rely on a fact that 0.20.N will have the same API for any N, then we can perhaps pin in r-commons in a manner like ">=0.20,<0.21" and then have a strict pin "==0.20.1" in r-w-e-yadage only.

I can pretty confidently say that the API won't be changing in any patch releases. No one is perfect with SemVer, but @lukasheinrich and I take it pretty seriously. However, this discussion of placing upper bounds that are effectively pins and then actually pinning is again confusing if these are supposed to be libraries. So I would like to avoid any solution that imposes these sorts of restrictions in setup.py or setup.cfg.

  1. Otherwise, using two different Python environments, one for RECAST and one for REANA, should solve this nicely as well, albeit in a manner that is not very convenient to users.

This would mean that recast-atlas would have to drop support for the reana-client backend/integration as reana-client would be uninstallable in the same environment (as it is now). I'd like to avoid this as much as possible.

  1. Otherwise another solution would be to release reana-client as an AppImage (see installation: non-python-venv based installation way reana-client#390) so that there would be no Python dependency clash either. This would obviously work if users aim to use reana-client CLI API and not Python API.

I think this again comes down to library vs. application (seeming to imply application). I don't have super strong feelings here, and I also admittedly have very limited use with AppImage so some of the following questions might not make sense (apologies if so), but:

Failing example: ```console [feickert@lxplus8s14 ~]$ cd /tmp/ [feickert@lxplus8s14 tmp]$ curl -sLO https://github.com/probonopd/Emacs.AppImage/releases/download/continuous/Emacs-27.1.glibc2.16-x86_64.AppImage [feickert@lxplus8s14 tmp]$ chmod a+x Emacs-27.1.glibc2.16-x86_64.AppImage [feickert@lxplus8s14 tmp]$ ./Emacs-27.1.glibc2.16-x86_64.AppImage /tmp/feickert/.mount_Emacs-AglMxT/usr/bin/emacs: /tmp/feickert/.mount_Emacs-AglMxT/usr/lib/x86_64-linux-gnu/libgnutls.so.30: version `GNUTLS_3_6_9' not found (required by /lib64/libglib-2.0.so.0) ```

In general, I apologize if I'm rehashing any discussions that have been had already in the past or if I'm missing anything super obvious. My main goal is just to make sure that recast-atlas can work with REANA, so anything that moves that forward is great. :+1:

tiborsimko commented 2 years ago

Are reana-client and reana-commons libraries or applications?

I'd say more the latter, as far as workflow engine versions such as CWL or Snakemake or Yadage are concerned. This is because we don't support more than one Yadage version on the cluster side right now. Example: if REANA supports Yadage 0.20, but there is a lot of development in Yadage and say Yadage 0.21, 0.22 and 0.23 are born and use different standard of feature set or API each, then surely we won't be able to run such workflows on the REANA cluster. Hence it wouldn't make sense to allow these "incompatible" Yadage versions to be used in the client environment either. We cannot "promise" that such a combination would work. Hence more an app than a library.

Ideally, if we would like to turn reana-client into a full-scale library, then the REANA cluster would have to have a way of running Yadage 0.20 workflows, Yadage 0.21 workflows, Yadage 0.22 workflows, Yadage 0.23 workflows, e.g. via different engine images loaded on the cluster side for the execution -- and then we could allow users to use any Yadage version in the client they wish and reana-client detect which Yadage version users would like to use and instruct the cluster to "load" the appropriate version accordingly. But this is something that would require quite some development. Right now, the REANA cluster support only one CWL version, only one Snakemake version, and only one Yadage version, that are all pinned in reana-commons.

I can pretty confidently say that the API won't be changing in any patch releases.

Great, then I think if we go for ">=0.20,<0.21" then this might be the fastest solution to the problem at hand!

As for your concern:

[...] placing upper bounds that are effectively pins

Yes, but I see it as a good thing. E.g. we did not pin click dependency in REANA 0.7.x release series in the main repository that is having Helm charts in themaint-0.7 branches; we only used "click>=7" so the pin was without upper boundary. Now that there is Click version 8 out, and is not compatible with version 7, our CI actually does not pass anymore in maint-0.7 branches out-of-the-box. Granted, this is easy to fix should we need to do some more serious REANA 0.7.x release series work still, but for the best "automatic reproducibility of builds" in the future, having upper pins helps to keep things green. This can be important with current REANA 0.8, upcoming REANA 0.9, etc. We might have several "release series" to work with, and having them all "buildable" out of the box in the future simplifies the maintenance as well. (Alongside the "promise" reasons mentioned above which workflow versions are supported in each REANA release series.)

In my eyes, by declaring that REANA 0.8.x release series depends on Yadage 0.20.y release series, we promise we won't break anything mutually, and our builds will "just work" even when launched many years later. Then if Yadage 0.21 comes and is compatible with REANA 0.8, we can bump the upper pin as well. And if 0.22 comes and is not compatible anymore, we can release REANA 0.9. In this way the matrix which REANA version and which Yadage version are compatible may be nicely evolving together, for the benefit of both users (see the "promise" story above) and developers (see the "CI" story above).

So if we opt for semantic versioning of pins and we liberate the current ==0.20.1 hard pin into a looser >=0.20,0.21 pin, I would rather see it as a glass-half-full rather than glass-half-empty kind of situation.

AppImage [...] Is this something that the REANA team was already planning to do?

Yes, we were thinking about it, because some users had a wish to run reana-client is some pretty old pre-existing Python environments, such as CMS open data old VM, or other users were confused when mixing with Rucio virtual environment... Having a statically linked binary would help to use reana-client in any Python environment.

I have actually experimented with creating an AppImage for reana-client in the past, and it was working well and gave a usable image. However I have not polished it for the public release. But we could revive this plan now... It would solve nicely the reana-client CLI API issue, but not the Python API issue though.