rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai

Switch Conda packages to `.conda` (instead of `.tar.bz2`) #98

Open jakirkham opened 2 weeks ago

jakirkham commented 2 weeks ago

Currently RAPIDS builds and publishes packages using .tar.bz2. However, this format was revamped in the newer .conda packages. They make a few important changes:

  1. Use a top-level uncompressed .zip file
  2. Means all .conda packages can be renamed to .zip and extracted
  3. As .zip files allow random access, specific items can be retrieved
  4. Metadata is placed in the top-level
    1. Contains info about the .conda metadata itself
    2. Package metadata (how it was built, dependencies needed, etc.)
    3. Compressed package contents (currently using Zstd)
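
As a rough illustration of the points above, here is a minimal Python sketch that treats a .conda file as an ordinary zip archive and pulls out a single member. It does not touch a real package: `make_fake_conda` is a hypothetical helper that builds an in-memory stand-in, and the member names (`metadata.json`, `info-*.tar.zst`, `pkg-*.tar.zst`) and placeholder contents are assumptions chosen to mirror the layout described in the list.

```python
import io
import zipfile


def list_conda_members(fileobj):
    """Return the member names of a .conda archive (an ordinary zip file)."""
    with zipfile.ZipFile(fileobj) as zf:
        return zf.namelist()


def read_member(fileobj, name):
    """Read one member by name, relying on zip random access."""
    with zipfile.ZipFile(fileobj) as zf:
        return zf.read(name)


def make_fake_conda(stem="example-1.0-0"):
    """Build an in-memory stand-in for a .conda file (hypothetical contents)."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the outer container uncompressed, as described above.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("metadata.json", '{"conda_pkg_format_version": 2}')
        zf.writestr(f"info-{stem}.tar.zst", b"placeholder for package metadata")
        zf.writestr(f"pkg-{stem}.tar.zst", b"placeholder for package contents")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    fake = make_fake_conda()
    print(list_conda_members(fake))
    print(read_member(fake, "metadata.json"))
```

The same two functions should also accept a path to a real .conda file on disk, since `zipfile.ZipFile` takes either a path or a file object.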

This can help with solve times (no need to decompress the .tar.bz2 to find metadata first). It can also help with download sizes & times (we noticed a ~30% reduction in the size of the legacy cudatoolkit when transitioning conda-forge).


To make the change, we would simply need to update our condarc at build time to include

conda_build:
  pkg_format: '2'

This can also be done with the following command:

conda config --set conda_build.pkg_format '2'

Also, as James pointed out in a review comment ( https://github.com/rapidsai/ci-imgs/pull/176#pullrequestreview-2266384030 ), there are a few places where .tar.bz2 shows up in the RAPIDS org. Of these, noted that some were:

Decided to skip the above. In the event we want to revitalize one of these projects, likely this is one of many changes that will be needed.

Of the remainder saw:

Made a best effort to update these. Some one-off projects don't run CI yet, so we can likely move ahead without those.

jakirkham commented 2 weeks ago

It is worth noting that XGBoost Conda packages in RAPIDS are already built in the .conda format, so we are already making some use of these today.

jakirkham commented 2 weeks ago

Think this is all we need: https://github.com/rapidsai/ci-imgs/pull/176

Would be good if others can confirm though

jameslamb commented 2 weeks ago

> think this is all we need

We'll also need to update any places where *.tar.bz2 or similar is being used to list conda packages. See https://github.com/rapidsai/ci-imgs/pull/176#pullrequestreview-2266384030
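
To make the kind of change meant here concrete, a minimal Python sketch of listing packages in a way that matches both formats rather than only `*.tar.bz2`. The function name `find_conda_packages` and the file names in the demo are hypothetical, not taken from any RAPIDS script.

```python
import tempfile
from pathlib import Path

# Both the legacy and the newer conda package extensions.
CONDA_PKG_SUFFIXES = (".tar.bz2", ".conda")


def find_conda_packages(directory):
    """Return sorted package files in `directory` in either conda format."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(CONDA_PKG_SUFFIXES)
    )


if __name__ == "__main__":
    # Demo with made-up file names in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a-1.0-0.tar.bz2", "b-2.0-0.conda", "repodata.json"):
            (Path(tmp) / name).touch()
        for pkg in find_conda_packages(tmp):
            print(pkg.name)
```

Matching on both suffixes means the listing keeps working during a transition where a channel or build directory holds a mix of old and new packages.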

jakirkham commented 2 weeks ago

Have made a best effort to submit PRs where appropriate. Included notes on this above. Would be good if someone else can recheck whether we got everything we deem relevant

jameslamb commented 2 weeks ago

I think you got everything, and I've reviewed them all: https://github.com/rapidsai/ci-imgs/pull/176#issuecomment-2318892498

jakirkham commented 2 weeks ago

Think we have what we need in this PR: https://github.com/rapidsai/ci-imgs/pull/176

AIUI gpuci-tools should no longer be used. Though James mentioned offline there might still be a few spots where it is used

If we do want to update gpuci-tools, we have PR: https://github.com/rapidsai/gpuci-tools/pull/37

Will double check next week (after the long weekend) if we want to include that PR too

We concluded that the other PRs still open don't need to be merged, as they target projects planned for archival.