Open matthewfeickert opened 1 year ago
That guideline could be very generic, that e.g. specific things from the domain stack could use the same space, etc, but I definitely see the value of figuring it out first rather than when a handful of packages are already there.
What level of formality should the guideline be? Is this at the level of a SPEC (either a new one or additional information added to SPEC 4)? Or is it something less formal that is just an amendment to what we have in the README now?
Establishing criteria that aren't very exclusive will probably be difficult unless there is some (not great) metric on usage numbers (if people have great ideas here that prove me wrong, I would be very happy!). I think "core" is already quite a fuzzy term with regards to the libraries that are already on the channel, and I mean that with no disrespect to any of the projects there; even with a small group of projects, the idea of "core" becomes highly subjective based on your use cases and experience.
What being a core project means for SPEC purposes has been discussed quite a lot, both by the SPEC committee and at the summits (both the dev and the domain one), and as far as I recall the domain stacks were also discussed. So it's not really fuzzy or arbitrary what ended up in the channel. The channel also has technical (e.g. size) limits, which were also discussed when we started the migration, so those limits should be explored, etc., before it's opened up to a lot more packages that currently have domain-stack usage rather than usage across the ecosystem.
What being a core project means for SPEC purposes has been discussed quite a lot, both by the SPEC committee and at the summits (both the dev and the domain one), and as far as I recall the domain stacks were also discussed. So it's not really fuzzy or arbitrary what ended up in the channel.
Is this written down anywhere publicly? If not, it should be written down somewhere that is easy to find. If it has been discussed at length, then it will hopefully be straightforward for the people who took part in those discussions to summarize what was decided. Also, there is a mismatch between what was on the old channel and this one, so was it intentional that some of those projects aren't on this channel?
so those limits should be explored, etc., before it's opened up to a lot more packages
At the moment we're using 6.6 of 20.0 GB
That amount shouldn't significantly change for the packages shown, and can be brought down by simply changing
(though some of the packages only upload 1 wheel that just overwrites the previous).
If you round that total up to, say, 10 GB reserved for core (we can also ask Anaconda Cloud for more storage), then you have the remaining half to distribute to other packages.
I think it is reasonable to say that additional packages that want to have nightlies distributed now could do so if they are able to show that their wheels are under 1 GB. I also think it is reasonable to ask Anaconda Cloud for more storage (though whoever set up the Anaconda Cloud org would need to do that).
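As a rough sketch of how a project could keep itself under such a budget with the anaconda-client CLI (the package name and version below are hypothetical, and the exact removal spec is worth double-checking against the anaconda-client docs):

```bash
# List what a project currently hosts on the channel
# ("SOMEPACKAGE" is a hypothetical package name).
anaconda show scientific-python-nightly-wheels/SOMEPACKAGE

# Drop a superseded nightly release to free up space
# (the version string is a made-up example).
anaconda remove scientific-python-nightly-wheels/SOMEPACKAGE/2.1.0.dev123
```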
Is this written down anywhere publicly?
Yes, yes it is very public if I could read: https://scientific-python.org/specs/core-projects/
so was it intentional that some of those projects aren't on this channel?
Yes. dipy is very much a domain package, and I can't exactly recall h5py, but it's also somewhat very specific. The only one that got migrated, but shouldn't have, is statsmodels, but as I recall it got moved before the summit and the rest of the packages.
At the moment we're using 6.6 of 20.0 GB
The previous site https://anaconda.org/multibuild-wheels-staging has a limit of 50 GB. Is there a hard limit of 20 GB these days? I guess part of the onboarding should also be to discuss the storage limits and how projects will allocate these between them.
Is there a hard limit of 20GB these days?
I think no, but a request needs to be made to Anaconda Cloud for more storage (this is based on my thoughts from https://github.com/scientific-python/upload-nightly-action/issues/30#issuecomment-1671959022). @jarrodmillman as I think you(?) made the Anaconda Cloud organization https://anaconda.org/scientific-python-nightly-wheels/ can you make a request to Anaconda for more storage?
@jarrodmillman A much delayed (sorry, late Summer got too busy) ping on the Anaconda Cloud organization storage limits check.
@jarrodmillman as I'm coming back to this Issue given Issue #45, were you able to check on the Anaconda Cloud organization storage limits?
I asked someone to ask, but never heard back. I am not sure who to contact at Anaconda, but will ask around.
Separate from which packages like dipy should be included, there is the issue of the extent to which optional dependencies of scientific python core packages themselves should be included. Continuing from https://github.com/scientific-python/upload-nightly-action/issues/51#issuecomment-1906784016:
One immediate line we could draw is that this library is needed by a core library for full testing.
If that will be the policy, then we'll need to bring in a lot of libraries indeed. (looking at the extras dependencies of a few libraries e.g. xarray, mpl, etc, the list to include in this channel will be very long). This is not to say that h5py should not be here (I'm +/-0 on it), just that having this as a policy may not be as easy/clear-cut as it seems.
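To make the scale of that concrete, one quick way to count a single project's declared extras from its installed metadata (xarray is only an example here and has to be installed already for this to work):

```bash
# Print the optional-dependency ("extras") group names that xarray declares;
# each group can pull in several additional packages.
python -c "from importlib.metadata import metadata; print(metadata('xarray').get_all('Provides-Extra'))"
```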
FWIW I don't think SPNW would need to supply all libraries that are optional deps -- just (ideally) all of those optional deps that prevent the SPNW-scoped modules (NumPy, SciPy, pandas, matplotlib, ...) from being fully tested and used by downstream libraries.
h5py falls in that camp because you must compile it against NumPy 2.0.0dev0 for it to import at all. pytables is also used by pandas[hdf5] and compiles against NumPy, so I think it would also be good to include it here as it falls into the "needs to be added here for full pandas functionality" category, too. However, many other optional dependencies (e.g., most pure-Python libraries) won't be like that. To me, if a primary goal for SPNW is to allow bleeding edge code of SPNW-accepted modules to be tested, the more of that code you actually allow to be tested the better. And to maximize that, some of these libraries need to be supplied somehow, at least until they all release NumPy 2.0-compliant wheels. Then maybe this consideration becomes a bit moot.
I concur; inclusion of wheels should be pragmatic.
I agree with the pure Python argument. Some rules could be: apt install and pip install. And if it's a yes, we could add limits such as:
h5py falls in that camp because you must compile it against NumPy 2.0.0dev0 for it to import at all. pytables is also used by pandas[hdf5] and compiles against NumPy
The need to compile against NumPy is a very strong argument and a good rule of thumb here, thanks. It indeed addresses my initial fear in that comment that we would pull in a lot of upstream dependencies that are either pure Python or very much outside of SP community involvement.
Poking here: as it seems there is strong favor to include h5py, in terms of policy revision material there's currently this discussion and the heuristics that @tupui laid out in https://github.com/scientific-python/upload-nightly-action/issues/30#issuecomment-1906846126. Can the @scientific-python/spec-steering-committee advise on the next step forward here? Would this be a discussion the Steering Committee needs to have in a meeting? A PR to update the SPEC Core Projects page? Something else?
I'm not sure when this happened, but the available storage that we have on https://anaconda.org/scientific-python-nightly-wheels/ got doubled from 20 GB to 40 GB:
So we're only using about 1/4th of our total storage at the moment. :+1:
I'm not sure when this happened, but the available storage that we have on https://anaconda.org/scientific-python-nightly-wheels/ got doubled from 20 GB to 40 GB:
I was corresponding with Anaconda about our project needs, and they generously doubled our storage while we conclude that conversation.
I would love to see more projects included, but I know there is some concern for adding more before we get more space. Given that they increased our storage to 40 GB, can we safely accommodate more projects? @matthewfeickert Could we add h5py, pytables, awkward, awkward-cpp, uproot, and shapely for now? (There may be others. This is just the list that immediately came to mind quickly scanning this discussion. Feel free to suggest other packages that I may have inadvertently overlooked.)
It may be easier to justify increasing our space allocation if we are using more of the 40GB to demonstrate there is an actual need for more space. If we don't get more space, we can always explain to new projects that the space constraints are the limiting factor for us going forward. But I am hopeful Anaconda will agree to vastly increasing our storage quota.
Regardless, the steering committee will discuss this during our March 5th meeting.
+1 to move forward with this list now.
Could we add h5py, pytables, awkward, awkward-cpp, uproot, and shapely for now? (There may be others. This is just the list that immediately came to mind quickly scanning this discussion. Feel free to suggest other packages that I may have inadvertently overlooked.)
Yes! :rocket: I think that I have all the admin privileges necessary to be able to do this (?) but if not I'll ping you @jarrodmillman. I think I also understand the workflow needed, as I set up https://anaconda.org/scikit-hep-nightly-wheels and got awkward-cpp and awkward working up there with @jpivarski.
I'll try to get to all of these issues before tomorrow.
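For reference, a minimal sketch of what that upload workflow boils down to when done by hand with the anaconda-client CLI (the reusable GitHub Action in this repository wraps the same step; the secret name and output directory below are placeholders):

```bash
# Build the project's nightly wheels, e.g. into dist/ (project-specific).
python -m build --wheel --outdir dist/

# Upload to the shared channel. --force replaces an existing file of the
# same name, which is how a project that reuses a single filename keeps
# only its latest build. ANACONDA_ORG_UPLOAD_TOKEN is a placeholder name.
anaconda --token "$ANACONDA_ORG_UPLOAD_TOKEN" upload \
  --user scientific-python-nightly-wheels \
  --force \
  dist/*.whl
```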
Could we add ... pytables
@jarrodmillman pytables doesn't have a request Issue open at the moment. If they would like to upload, can you have them open up an Issue so that we can track the setup process?
Feel free to suggest other packages that I may have inadvertently overlooked.
Should we also add:
Running list for me to track the status of groups getting onboarded:
(check off once they have successfully uploaded wheels)
- h5py: https://github.com/scientific-python/upload-nightly-action/issues/51#issuecomment-1960373356
- awkward, awkward-cpp, uproot: https://github.com/scientific-python/upload-nightly-action/issues/29#issuecomment-1960383256
  - awkward-cpp
  - awkward
  - uproot
- shapely: https://github.com/scientific-python/upload-nightly-action/issues/63#issuecomment-1960393895
- dipy: https://github.com/scientific-python/upload-nightly-action/issues/45#issuecomment-1961736482
- sunpy: https://github.com/scientific-python/upload-nightly-action/issues/50#issuecomment-1961748520
- contourpy: https://github.com/contourpy/contourpy/issues/362#issuecomment-1962771358
- PyTables: https://github.com/PyTables/PyTables/issues/1115#issuecomment-1962825660
- PyWavelets: https://github.com/scientific-python/upload-nightly-action/issues/75#issuecomment-1986144872
- pyarrow: https://github.com/apache/arrow/issues/40216#issuecomment-2018437734
- cython: https://github.com/scientific-python/upload-nightly-action/issues/80#issuecomment-2145636316
- pydata-sphinx-theme: https://github.com/scientific-python/upload-nightly-action/issues/82
- pillow: https://github.com/scientific-python/upload-nightly-action/issues/84
- pyproj: https://github.com/scientific-python/upload-nightly-action/issues/87#issuecomment-2254354375
- ipykernel: https://github.com/scientific-python/upload-nightly-action/issues/110
- sympy: https://github.com/scientific-python/upload-nightly-action/issues/111
- python-flint: https://github.com/scientific-python/upload-nightly-action/issues/111

h5py is up :+1:
@larsoner It sounds like it would make sense to include pytables. Do you want to work with them? I am also happy to open an issue / PR if you prefer. Any other pandas dependencies we should consider adding (e.g., pyarrow)?
@matthewfeickert Let's invite dipy and sunpy since they asked. So far we are still looking good for storage and we should take advantage of the extra space (especially since we have requested more).
Yeah, pyarrow was the other one that came to mind for me. Feel free to open an issue for tables and ping me; I haven't looked at their infrastructure much.
How about https://github.com/contourpy/contourpy? We ran into issues with it when testing scikit-image with numpy 2 nightly wheels: https://github.com/scikit-image/scikit-image/pull/7288
Any other matplotlib dependencies that we should be considering at this point?
Even with the recent additions of awkward, awkward-cpp, shapely, h5py, and dipy, we are currently still in good shape (using 10.2 GB of our 40.0 GB quota).
@matthewfeickert - can we lift your table into a super short README or other file in the repo and close this issue? Right now the policy, I feel, can be summarized as: community Python libraries and their key dependencies in the scientific Python ecosystem.
SGTM. I'm at a conference atm but happy to have this closed and can follow up with anything else later.
Originally posted by @bsipocz (referencing @jarrodmillman) in https://github.com/scientific-python/upload-nightly-action/issues/29#issuecomment-1666269259