scientific-python / upload-nightly-action

This action is used to upload nightly builds of your package.
https://anaconda.org/scientific-python-nightly-wheels
BSD 3-Clause "New" or "Revised" License

Establish guideline for packages that can upload to the scientific-python nightly channel #30

Open matthewfeickert opened 1 year ago

matthewfeickert commented 1 year ago

We probably want to add some more guidance about what we can include, but we can sort things out as we go.

I would really prefer that the guidelines are figured out first, rather than starting to add random packages (i.e. non-core packages and core dependencies) and then needing to say no to others later when we run into technical limitations, e.g. not enough space. That guideline could be very generic, e.g. specific things from the domain stack could share the same space, but I definitely see the value of figuring it out first rather than once a handful of packages are already there.

Originally posted by @bsipocz (referencing @jarrodmillman) in https://github.com/scientific-python/upload-nightly-action/issues/29#issuecomment-1666269259

matthewfeickert commented 1 year ago

That guideline could be very generic, e.g. specific things from the domain stack could share the same space, but I definitely see the value of figuring it out first rather than once a handful of packages are already there.

What level of formality should the guideline be? Is this at the level of a SPEC (either a new one or additional information added to SPEC 4)? Or is it something less formal that is just an amendment to the

https://github.com/scientific-python/upload-nightly-action/blob/a2ba178f650e68bd6f9d8e2b7d59d16ffd257c29/README.md?plain=1#L39-L45

that we have in the README now?

Establishing criteria that aren't very exclusive will probably be difficult unless there is some (not great) metric on usage numbers (if people have great ideas here that prove me wrong, I would be very happy!). I think "core" is already quite a fuzzy term with regard to the libraries that are already on the channel (and I mean that with no disrespect to any of the projects there; just that even with a small group of projects, the idea of "core" becomes highly subjective based on your use cases and experience).

bsipocz commented 1 year ago

What counts as a core project for SPEC purposes has been discussed quite a lot, both by the SPEC committee and at the summits (both the dev and the domain one), and as far as I recall the domain stacks were also discussed. So, what ended up in the channel is not really fuzzy or arbitrary. The channel also has technical limits (e.g. size), which were also discussed when we started the migration, so those limits should be explored before the channel is opened up to many more packages that currently have domain-stack usage rather than usage across the ecosystem.

matthewfeickert commented 1 year ago

What counts as a core project for SPEC purposes has been discussed quite a lot, both by the SPEC committee and at the summits (both the dev and the domain one), and as far as I recall the domain stacks were also discussed. So, what ended up in the channel is not really fuzzy or arbitrary.

Is this written down anywhere publicly? If not, it should be, in a way that is easy to find. If it has been discussed at length, then it will hopefully be straightforward for the people who took part to summarize what was decided. Also, there is a mismatch between what was on the old channel and this one:

[image: old-channel]

So was it intentional that some of those projects aren't on this channel?

so those limits should be explored before the channel is opened up to many more packages

At the moment we're using 6.6 of 20.0 GB

[image: storage]

That amount shouldn't change significantly for the packages shown, and it can be brought down by simply lowering the retention window in

https://github.com/scientific-python/upload-nightly-action/blob/a2ba178f650e68bd6f9d8e2b7d59d16ffd257c29/.github/workflows/remove-wheels.yml#L18-L19

(though some of the packages upload only a single wheel that just overwrites the previous one).
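The retention idea behind that workflow can be sketched as a tiny filter. Everything below (the function name, the filenames, the 14-day window) is illustrative only, not the actual workflow code:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of a retention policy: uploads older than the window
# get removed; lowering RETENTION_DAYS frees up channel storage.
RETENTION_DAYS = 14

def stale_uploads(uploads, now, retention_days=RETENTION_DAYS):
    """Return names of uploads older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [name for name, uploaded_at in uploads if uploaded_at < cutoff]

now = datetime(2023, 8, 10)
uploads = [
    ("pkg-nightly-a.whl", datetime(2023, 8, 9)),  # 1 day old: kept
    ("pkg-nightly-b.whl", datetime(2023, 7, 1)),  # ~6 weeks old: removed
]
print(stale_uploads(uploads, now))  # ['pkg-nightly-b.whl']
```

Note that this filter does nothing for packages that upload a single perpetually overwritten wheel; their footprint is already constant.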

If you round that total up to, say, 10 GB reserved for core (we can also ask Anaconda Cloud for more storage), then you have the remaining half to distribute to other packages.

I think it is reasonable to say that additional packages that want to have nightlies distributed now could do so if they can show that their wheels total under 1 GB. I also think it is reasonable to ask Anaconda Cloud for more storage (though whoever set up the Anaconda Cloud org would need to do that).
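As back-of-the-envelope math: only the 20 GB quota and ~6.6 GB core usage are facts from this thread; the 10 GB core reservation and the 1 GB per-package cap are the proposal being floated, not policy:

```python
# Storage-budget sketch for the nightly channel (values per the thread;
# the reservation and per-package cap are proposed, not enforced).
TOTAL_GB = 20.0
CORE_RESERVED_GB = 10.0   # ~6.6 GB of core usage, rounded up with headroom
PER_PACKAGE_CAP_GB = 1.0  # proposed limit for each additional package

available_gb = TOTAL_GB - CORE_RESERVED_GB
max_new_packages = int(available_gb // PER_PACKAGE_CAP_GB)
print(available_gb, max_new_packages)  # 10.0 10
```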

matthewfeickert commented 1 year ago

Is this written down anywhere publicly?

Yes, yes it is very public if I could read: https://scientific-python.org/specs/core-projects/

bsipocz commented 1 year ago

So was it intentional that some of those projects aren't on this channel?

Yes. dipy is very much a domain package, and I can't exactly recall h5py, but it's also quite specific. The only one that got migrated but shouldn't have been is statsmodels; as I recall, it got moved before the summit and before the rest of the packages.

matthewfeickert commented 1 year ago

Following PR #33 the storage for core packages has dropped by 1.6 GB (so good call on suggesting that @bsipocz):

[image: core-storage-use]

mattip commented 1 year ago

At the moment we're using 6.6 of 20.0 GB

The previous site https://anaconda.org/multibuild-wheels-staging has a limit of 50 GB. Is there a hard limit of 20 GB these days? I guess part of the onboarding should also be to discuss the storage limits and how projects will allocate them between themselves.

matthewfeickert commented 1 year ago

Is there a hard limit of 20GB these days?

I think not, but a request needs to be made to Anaconda Cloud for more storage (this is based on my thoughts in https://github.com/scientific-python/upload-nightly-action/issues/30#issuecomment-1671959022). @jarrodmillman, as I think you(?) made the Anaconda Cloud organization https://anaconda.org/scientific-python-nightly-wheels/, can you make a request to Anaconda for more storage?

matthewfeickert commented 1 year ago

@jarrodmillman A much delayed (sorry, late Summer got too busy) ping on the Anaconda Cloud organization storage limits check.

matthewfeickert commented 1 year ago

@jarrodmillman as I'm coming back to this Issue given Issue #45, were you able to check on the Anaconda Cloud organization storage limits?

jarrodmillman commented 1 year ago

I asked someone to ask, but never heard back. I am not sure who to contact at Anaconda, but I will ask around.

larsoner commented 10 months ago

Separate from which packages like dipy should be included, there is the issue of the extent to which optional dependencies of scientific python core packages themselves should be included. Continuing from https://github.com/scientific-python/upload-nightly-action/issues/51#issuecomment-1906784016 :

One immediate line we could draw is that this library is needed by a core library for full testing.

If that will be the policy, then we'll need to bring in a lot of libraries indeed (looking at the extras dependencies of a few libraries, e.g. xarray, mpl, etc., the list to include in this channel would be very long). This is not to say that h5py should not be here (I'm +/-0 on it), just that having this as a policy may not be as easy/clear-cut as it seems.

FWIW I don't think SPNW would need to supply all libraries that are optional deps -- just (ideally) all of those optional deps that prevent the SPNW-scoped modules (NumPy, SciPy, pandas, matplotlib, ...) from being fully tested and used by downstream libraries.

h5py falls in that camp because you must compile it against NumPy 2.0.0dev0 for it to import at all. pytables is also used by pandas[hdf5] and compiles against NumPy; I think it would also be good to include here, as it falls into the "needs to be added here for full pandas functionality" category too. However, many other optional dependencies (e.g., most pure-Python libraries) won't be like that. To me, if a primary goal for SPNW is to allow bleeding-edge code of SPNW-accepted modules to be tested, then the more of that code you actually allow to be tested, the better. And to maximize that, some of these libraries need to be supplied somehow, at least until they all release NumPy 2.0-compliant wheels. Then maybe this consideration becomes a bit moot.
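A cheap first-pass triage for the "compiles against NumPy" camp is the wheel filename itself: under the standard wheel naming scheme, pure-Python wheels carry `none-any` ABI/platform tags, while compiled wheels embed a concrete ABI and platform tag. A small sketch (filenames here are illustrative):

```python
def is_pure_python_wheel(filename: str) -> bool:
    """Pure-Python wheels end in '-none-any.whl'; wheels with compiled
    extensions (like h5py's) carry concrete ABI and platform tags instead."""
    stem = filename[: -len(".whl")]
    # Wheel filename layout: name-version(-build)?-pythontag-abitag-platformtag
    *_, abi_tag, platform_tag = stem.split("-")
    return abi_tag == "none" and platform_tag == "any"

print(is_pure_python_wheel("xarray-2024.1.0-py3-none-any.whl"))                   # True
print(is_pure_python_wheel("h5py-3.10.0-cp312-cp312-manylinux_2_17_x86_64.whl"))  # False
```

This doesn't capture every case (a package can be mostly pure Python yet still ABI-coupled to NumPy through a dependency), but it separates the obvious candidates quickly.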

stefanv commented 10 months ago

I concur; inclusion of wheels should be pragmatic.

tupui commented 10 months ago

I agree with the pure-Python argument. Some rules could be:

  1. Needed to fully test core packages, and/or widely used in the community for testing purposes
  2. Need to be compiled
  3. Are difficult to build: need more than a simple apt install and pip install
  4. Take a long time to build: more than 5 minutes?

And if it's a yes, we could add limits such as:

  1. x GB at most
  2. x platforms at most
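For concreteness, the rules and limits above could be encoded as a checklist. The field names, the thresholds, and the way rules 3 and 4 are combined are all assumptions here, not agreed policy:

```python
from dataclasses import dataclass

# Hypothetical encoding of the proposed inclusion rules.
@dataclass
class Candidate:
    needed_for_core_testing: bool  # rule 1
    must_be_compiled: bool         # rule 2
    hard_to_build: bool            # rule 3: more than apt install + pip install
    build_minutes: float           # rule 4
    wheel_set_gb: float            # proposed storage limit per package
    n_platforms: int               # proposed platform limit

MAX_GB = 1.0          # per-package cap suggested earlier in the thread
MAX_PLATFORMS = 5     # hypothetical value
SLOW_BUILD_MINUTES = 5

def qualifies(c: Candidate) -> bool:
    eligible = (
        c.needed_for_core_testing
        and c.must_be_compiled
        # assumption: "hard to build" OR "slow to build" is enough
        and (c.hard_to_build or c.build_minutes > SLOW_BUILD_MINUTES)
    )
    within_limits = c.wheel_set_gb <= MAX_GB and c.n_platforms <= MAX_PLATFORMS
    return eligible and within_limits

h5py_like = Candidate(True, True, True, 20, 0.3, 4)
pure_python = Candidate(True, False, False, 1, 0.01, 1)
print(qualifies(h5py_like), qualifies(pure_python))  # True False
```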
bsipocz commented 10 months ago

h5py falls in that camp because you must compile it against NumPy 2.0.0dev0 for it to import at all. pytables is also used by pandas[hdf5] and compiles against NumPy

The need to compile against NumPy is a very strong argument and a good rule of thumb here, thanks. It indeed addresses my initial fear in that comment that we would pull in a lot of upstream dependencies that are either pure Python or largely outside of the SP community's involvement.

matthewfeickert commented 9 months ago

Poking here, as it seems there is strong favor to include h5py. In terms of policy-revision material, there's currently this discussion and the heuristics that @tupui laid out in https://github.com/scientific-python/upload-nightly-action/issues/30#issuecomment-1906846126. Can the @scientific-python/spec-steering-committee advise on the next step forward here? Would this be a discussion the Steering Committee needs to have in a meeting? A PR to update the SPEC Core Projects page? Something else?

matthewfeickert commented 9 months ago

I'm not sure when this happened, but the available storage that we have on https://anaconda.org/scientific-python-nightly-wheels/ got doubled from 20 GB to 40 GB:

[image: storage quota]

So we're only using about 1/4th of our total storage at the moment. :+1:

stefanv commented 9 months ago

I'm not sure when this happened, but the available storage that we have on https://anaconda.org/scientific-python-nightly-wheels/ got doubled from 20 GB to 40 GB:

I was corresponding with Anaconda about our project needs, and they generously doubled our storage while we conclude that conversation.

jarrodmillman commented 9 months ago

I would love to see more projects included, but I know there is some concern about adding more before we get more space. Given that they increased our storage to 40 GB, can we safely accommodate more projects? @matthewfeickert Could we add h5py, pytables, awkward, awkward-cpp, uproot, and shapely for now? (There may be others. This is just the list that immediately came to mind while quickly scanning this discussion. Feel free to suggest other packages that I may have inadvertently overlooked.)

It may be easier to justify increasing our space allocation if we are using more of the 40GB to demonstrate there is an actual need for more space. If we don't get more space, we can always explain to new projects that the space constraints are the limiting factor for us going forward. But I am hopeful Anaconda will agree to vastly increasing our storage quota.

Regardless, the steering committee will discuss this during our March 5th meeting.

tupui commented 9 months ago

+1 to move forward with this list now.

matthewfeickert commented 9 months ago

Could we add h5py, pytables, awkward, awkward-cpp, uproot, and shapely for now? (There may be others. This is just the list that immediately came to mind while quickly scanning this discussion. Feel free to suggest other packages that I may have inadvertently overlooked.)

Yes! :rocket: I think that I have all the admin privileges necessary to do this (?), but if not I'll ping you, @jarrodmillman. I think I also understand the workflow needed, as I set up https://anaconda.org/scikit-hep-nightly-wheels and got awkward-cpp and awkward working up there with @jpivarski.

I'll try to get to all of these issues before tomorrow.

matthewfeickert commented 9 months ago

Could we add ... pytables

@jarrodmillman pytables doesn't have a request Issue open at the moment. If they would like to upload, can you have them open up an Issue so that we can track the setup process?

Feel free to suggest other packages that I may have inadvertently overlooked.

Should we also add:

matthewfeickert commented 9 months ago

Running list for me to track the status of groups getting onboarded:

(check off once they have successfully uploaded wheels)

larsoner commented 9 months ago

h5py is up :+1:

jarrodmillman commented 9 months ago

@larsoner It sounds like it would make sense to include pytables. Do you want to work with them? I am also happy to open an issue / PR if you prefer. Any other pandas dependencies we should consider adding (e.g., pyarrow)?

jarrodmillman commented 9 months ago

@matthewfeickert Let's invite dipy and sunpy since they asked. So far we are still looking good for storage and we should take advantage of the extra space (especially since we have requested more).

larsoner commented 9 months ago

Yeah, pyarrow was the other one that came to mind for me. Feel free to open an issue for tables and ping me; I haven't looked at their infrastructure much.

jarrodmillman commented 9 months ago

How about https://github.com/contourpy/contourpy? We ran into issues with it when testing scikit-image with numpy 2 nightly wheels: https://github.com/scikit-image/scikit-image/pull/7288

Any other matplotlib dependencies that we should be considering at this point?

Even with the recent additions of awkward, awkward-cpp, shapely, h5py, and dipy, we are currently still in good shape (using 10.2 GB of our 40.0 GB quota).

jarrodmillman commented 9 months ago

See https://github.com/PyTables/PyTables/issues/1115

jarrodmillman commented 9 months ago

See https://github.com/apache/arrow/issues/40216

jarrodmillman commented 9 months ago

See https://github.com/contourpy/contourpy/issues/362

bsipocz commented 1 month ago

@matthewfeickert - can we lift your table into a super-short README or other file in the repo and close this issue? Right now the policy, I feel, can be summarized as: community Python libraries and their key dependencies in the scientific Python ecosystem.

matthewfeickert commented 1 month ago

SGTM. I'm at a conference atm but happy to have this closed and can follow up with anything else later.