scipp / ess

European Spallation Source facility bespoke, neutron scattering tools based on scipp.
https://scipp.github.io/ess/
BSD 3-Clause "New" or "Revised" License
3 stars 3 forks source link

Splitting up the `ess` package #133

Closed SimonHeybrock closed 8 months ago

SimonHeybrock commented 2 years ago

Motivation

There are a number of reasons for (strongly) considering splitting up the ess package:

Problems/questions

arm61 commented 2 years ago

I guess that ess could be a "meta" package that contains just dependencies?

jl-wynen commented 2 years ago

Could some of these problems be solved by using a live-at-head type of model? That is, release often but do not use semantic versioning. There could be a single version number that gets incremented every time we publish. This would mostly remove the communication channel that was mentioned, though. And I don't know how well PyPI and Conda can deal with a large number of packages. (But that becomes relevant anyway.)

jl-wynen commented 2 years ago

Here is a simplified overview of the dependencies this would introduce.

util is a package with miscellaneous stuff like logging. Would this be its own package that is useless outside of ess?

Note that the dependency graph is not a tree! That is, it does not neatly decompose into a hierarchy where packages on the same level (like powder and sans) are independent. But they have users in common which require those packages to have the same dependencies. E.g. heimdal can only work if both powder and sans use identical versions of scipp, etc.

Here are some scenarios that can happen:

Dependencies -2

SimonHeybrock commented 2 years ago

If we have a ess-sans package, should we also have neutron-sans, instead of a monolithic scippneutron?

SimonHeybrock commented 2 years ago

change can not be encapsulated: (almost) all downstream packages must update wfm dep when updating scipp. There is no indication in the old wfm package or scipp, maintainers must know this or fix failing builds.

We have been recommending updating the entire env (instead of individual packages) for years. Therefore I would expect this to be an issue? Or are you referring to sth. else?

SimonHeybrock commented 2 years ago

@jl-wynen wrote:

Do we test downstream packages for an upstream update? E.g. wfm has an update, do we test all instruments?

  • Yes: We require an update of all packages in case the API breaks, i.e. nothing gained vs a single repo.

Comments:

jl-wynen commented 2 years ago

change can not be encapsulated: (almost) all downstream packages must update wfm dep when updating scipp. There is no indication in the old wfm package or scipp, maintainers must know this or fix failing builds.

We have been recommending updating the entire env (instead of individual packages) for years. Therefore I would expect this to be an issue? Or are you referring to sth. else?

I am referring to changing the downstream package's code and releasing them.

* Let us be clear about single package vs. single repo. The testing of downstream packages you point out are a good reason for a single _repo_ (maybe treating `ess` as a sort of monorepo) since it may make such a test setup easier. But this does not imply that it should feed into a single _package_.

How would this work in practice? If we test instrument packages with the dev version of wfm, we cannot release them until wfm has been released. At which point we have to remember to bump the minimum version in the instrument package dependencies. So in effect, we would be handling this like a single big package.

SimonHeybrock commented 2 years ago

How would this work in practice? If we test instrument packages with the dev version of wfm, we cannot release them until wfm has been released. At which point we have to remember to bump the minimum version in the instrument package dependencies. So in effect, we would be handling this like a single big package.

I am not sure in detail, but had in mind to automatically run downstream tests when making a change in, say, WFM. These tests would maybe not be marked as "required". Instead we can:

jl-wynen commented 2 years ago

If it is an intentional break we can go and update the downstream packages.

Which can then not be tested until wfm is released. I think we have to either always test with a released version of always with a dev version. We cannot mix the two, because only one of them will pass.

SimonHeybrock commented 2 years ago

Which can then not be tested until wfm is released.

Maybe we need to publish pre-releases, etc.?

SimonHeybrock commented 1 year ago

Coming back to the example of updates to a wfm package that will require changes in downstream packages. I feel that this will be manageable somehow and not worse than the situation we are in now.

Sure, we can conjure up scenarios where the wfm package break things that downstream packages depend on (and that cannot be replaced). But is this situation really different from what we are doing now? If we make a breaking change in Scipp or ScippNexus we are arguably in an even worse place, and still it works out in practice.

In addition, we might consider instrument packages as "applications" (as opposed to libraries). This implies that pinning versions (or upper bounds) may be acceptable. That is, if the wfm version is pinned we have control and flexibility over when we perform a refactor/update for compatibility with a new version.

SimonHeybrock commented 1 year ago

In my eyes the remaining problem to be solved is how to handle dependencies and releases. Whether mono-repo or not, we want to test downstream packages when an upstream package (such as wfm) changes. Setting up testing and CI for this will take some work, but (without having tried) it seems relatively obvious how to to this. What is not obvious is how to handle releases:

Say we make a change to wfm (and all CI passed and we can merge into main). There are two cases:

  1. No downstream changes. We want to release only wfm. This is relatively simple (even though we cannot really use git tags for version numbers).
  2. Downstream changes were required in the PR. We want to release wfm and potentially multiple downstream packages (such as reflectometry). How can we automate this, such that:
    • We do not have tedious/repetitive work of doing this manually for multiple packages.
    • We automatically and reliably detect which packages need a release.
    • We do not mangle those changes with other changes in downstream packages. Having multi-purpose releases seems odd.

Is "live at head" the answer?

SimonHeybrock commented 1 year ago

Would it be an solution to not have the instrument/technique packages depend on common packages, but simply include them? That is, when we release a technique package we simply ship it with everything (from the ess-monorepo) that it depends on?

What we have to keep in mind is that someone may need to install multiple instrument/technique packages in the same environment. That is, we either need to keep globally compatible versions, or of "copies" of libraries such as WFM in each technique package, with potential version skew. As we have learned, version skew is bad so this does not seem like the way to go.

jl-wynen commented 1 year ago

Downstream changes were required in the PR. We want to release wfm and potentially multiple downstream packages (such as reflectometry). How can we automate this, such that:

How would this work? Say the wfm workflow finds a breaking change in odin. Do we then fix that on the same branch as the wfm change until everything passes and we can merge into main? At this point, the odin CI would depend on an unreleased version of wfm. So it would have to install that from main instead of PyPI. And this puts us into the same sitatuion we are in now, where all packages have to be compatible at all times. If we don't require a passing odin test, then we have a broken wfm workflow and can't release it until odin is fixed.

And when it comes to an actual release, suppose we have fully compatible packages on main. We first need to release wfm. But then the release of odin takes some time and might even fail for unrelated reasons. Then we have a period when users would have to pin the version of wfm if they want to install odin. This period might be short and not matter, but many users will have no experience with software development...

SimonHeybrock commented 1 year ago

How would this work? Say the wfm workflow finds a breaking change in odin. Do we then fix that on the same branch as the wfm change until everything passes and we can merge into main?

Yes, we would require fixing this. Otherwise there would probably little utility in running a monorepo?

And this puts us into the same sitatuion we are in now, where all packages have to be compatible at all times.

Interesting observation. Indeed, what do we gain from a monorepo over a single big package, apart from (1) running tests more selectively (could be done for big package) and (2) shipping smaller components?

nvaytet commented 1 year ago

I have the feeling that if we want everything to always work together, we should either

  1. have a single repo where all tests are run for every change (basically what we have now), but this might become very difficult to maintain in the future (as per all the comments above) (maybe 'live at main' would help slightly?)
  2. pin all versions of the software we maintain (scipp, scippneutron, scippnexus, plopp, ess-common) in the instrument packages (this is one of the cases where apparently it could be ok to pin, as per the articles that Simon shared about avoiding upper bounds on versions)

About 2., as stated above, many users will not know much about software development and version pinning, but if everything is already pinned for them, they should not have to worry. Basically, what we want is that a user types mamba install ess-odin and that just works, always.

jl-wynen commented 1 year ago

About 2., as stated above, many users will not know much about software development and version pinning, but if everything is already pinned for them, they should not have to worry. Basically, what we want is that a user types mamba install ess-odin and that just works, always.

This could be done via an installer that people can use. (There were requests for that at some point.) So most users would just download the installer which creates an env with all versions pinned and an executable that opens Jupyter.

This doesn't solve the problem for more expert users like, hopefully, ISs or at least IDSs.

SimonHeybrock commented 11 months ago

Has our positive experience with the https://github.com/scipp/copier_template brought us any closer to taking a decision here?

nvaytet commented 11 months ago

Has our positive experience with the https://github.com/scipp/copier_template brought us any closer to taking a decision here?

I asked the same on Slack. I think we should think some more about this. Especially when I tried to recently update ess to the latest scipp, I had to touch diffraction, sans, reflectometry etc...

I was also wondering if we should make a new organisation, so that we get more gh builders for the CI? I have a feeling that we will soon have a bottleneck where we find ourselves waiting for resources, as we make more and more repos. The alternative is of course to get ESS to pay for more builders.

Regarding the scope of our jobs, it makes sense to have everything under the same organisation, because that's where all the code we work on goes. However, if you look at the projects themselves, maybe grouping the non-neutron stuff together might make sense?

But I am now thinking that a package called scippnexus should probably be part of the Scipp organisation.

So maybe the simplest thing to do is indeed to ask ESS to pay for more builders? There is something called a "Team" account, I don't know if it's possible to upgrade Scipp to that? ($4 per user per month, it won't break the bank)

SimonHeybrock commented 11 months ago

I think builders is an independent discussion, and should not have any influence on the technical decision on splitting up the package or not.

nvaytet commented 11 months ago

I don't think I was saying we should maybe not split ess because we may run out of builders.

I was saying that I think we should split, and then was wondering how we could deal with the potential increase in builder use.

SimonHeybrock commented 11 months ago

I think the need for builders scales more directly with the number of contributors, not so much with the number of repos (except for releases, maybe).

SimonHeybrock commented 8 months ago

We have started doing this. Still ongoing but this issue is no longer useful.