jameslamb opened 3 months ago
After merging https://github.com/rapidsai/legate-boost/pull/176, I just did the following to check the code paths for stable releases.
Pushed a new release tag like this:
git checkout main
git pull upstream main
git tag -a v24.08.00 -m 'v24.08.00'
git push upstream 'v24.08.00'
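The same four commands are repeated for every tag pushed in this thread, so they can be wrapped in a small helper. This is a minimal sketch that assumes the vYY.MM.NN tag format used here and a git remote named upstream. is_release_tag and push_release_tag are hypothetical helpers and are not part of the legate-boost repo.

```shell
# Hypothetical helpers (not part of legate-boost) wrapping the tag-and-push steps.

# Succeed only for tags shaped like v24.08.00 (assumed vYY.MM.NN format).
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^v[0-9]{2}\.[0-9]{2}\.[0-9]{2}$'
}

# Tag the latest upstream main with an annotated tag and push only that tag.
# Assumes a remote named "upstream" pointing at rapidsai/legate-boost.
push_release_tag() {
    if ! is_release_tag "$1"; then
        echo "refusing to push '$1': expected a tag like v24.08.00" >&2
        return 1
    fi
    git checkout main &&
        git pull upstream main &&
        git tag -a "$1" -m "$1" &&
        git push upstream "$1"
}

# Usage: push_release_tag v24.08.00
```

Checking the format first is a cheap guard against typos, because each pushed tag triggers a full build and package upload.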
That triggered this build: https://github.com/rapidsai/legate-boost/actions/runs/11616162673
The build succeeded, and it uploaded v24.08.00 to the main label on the legate channel 🎉
ref: https://anaconda.org/legate/legate-boost/files
Created an environment and ran the tests.
docker run \
--rm \
--gpus "2,3" \
-v $(pwd):/opt/work \
-w /opt/work \
-it rapidsai/ci-conda \
bash
# --override-channels just to be sure we're not cheating with channels
# configured globally in the env where I ran this
conda create \
--name test-legate-boost \
--yes \
--override-channels \
-c legate \
-c legate/label/experimental \
-c conda-forge \
legate-boost=24.08 \
python=3.11
(NOTICE: I'm on a system with CUDA, and it pulled in the _gpu builds for everything without me having to specify that!)
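You can confirm which variants the solver picked by reading the build strings from conda list. This is a minimal sketch. It assumes that GPU builds of the legate packages have build strings ending in _gpu. For comparison, a CPU build shows up later in this thread as cuda12_py312_0_cpu. non_gpu_legate_builds is a hypothetical helper.

```shell
# Hypothetical helper: read `conda list` output on stdin and print every
# legate* package whose build string (3rd column) does not end in _gpu.
# Header lines start with "#", so they never match ^legate.
non_gpu_legate_builds() {
    awk '$1 ~ /^legate/ && $3 !~ /_gpu$/ { print $1, $3 }'
}

# Usage:
#   conda list -n test-legate-boost | non_gpu_legate_builds
```

Empty output means no legate* package in the environment was installed with a non-_gpu build.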
Then installed test dependencies and ran legate-boost's tests.
source activate test-legate-boost
conda install \
-c conda-forge \
--yes \
'hypothesis>=6' \
'matplotlib>=3.9' \
'nbconvert>=7.16' \
'notebook>=7' \
'pytest>=7,<8' \
'seaborn>=0.13' \
'xgboost>=2.0'
./ci/run_pytests_gpu.sh
Those failed with fatal errors 😭
plugins: hypothesis-6.115.6, anyio-4.6.2.post1
collecting ... [0 - 7f1166dbb740] 0.000000 {5}{numa}: mems_allowed: ret=-1 errno=1 mask= count=64
[0 - 7f1166dbb740] 0.000000 {6}{gpu}: Failed to allocate GPU memory of size 29360128000
Fatal Python error: Aborted
Current thread 0x00007f1166dbb740 (most recent call first):
File "/opt/conda/envs/test-legate-boost/lib/python3.1/site-packages/legate/core/__init__.py", line 109 in <module>
Will look into it.
I missed the "Failed to allocate GPU memory" message before! Looked at nvidia-smi and saw that other processes were using the same GPUs as me (I ran this on a shared machine).
Re-ran with smaller memory requirements and saw the tests pass. This is working 😎
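The byte count in the "Failed to allocate GPU memory" line is easier to read in MiB. 29360128000 bytes is exactly 28000 MiB, requested as a single allocation, which other processes on a shared GPU can easily make impossible.

This is a minimal sketch of the checks, assuming nvidia-smi is on the PATH. The function names are hypothetical. Two points are assumptions that this thread does not confirm:
- legate reads extra flags from LEGATE_CONFIG, and --fbmem sets the per-GPU framebuffer size in MiB.
- ./ci/run_pytests_gpu.sh honors LEGATE_CONFIG.

```shell
# Convert the byte count from legate's error message into MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# List processes already holding memory on the GPUs, so busy GPUs on a
# shared machine can be avoided when choosing --gpus for docker run.
show_gpu_users() {
    nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name,used_memory --format=csv
}

# Assumption: LEGATE_CONFIG is read by legate and --fbmem is in MiB.
run_gpu_tests_with_fbmem() {
    LEGATE_CONFIG="--fbmem $1" ./ci/run_pytests_gpu.sh
}

# Usage:
#   bytes_to_mib 29360128000        # 28000
#   show_gpu_users
#   run_gpu_tests_with_fbmem 8000   # 8000 is an arbitrary example value
```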
The docs deployment failed on that tag build: https://github.com/rapidsai/legate-boost/actions/runs/11616162673/job/32349185474
Like this:
Tag "v24.08.00" is not allowed to deploy to github-pages due to environment protection rules. The deployment was rejected or didn't satisfy other protection rules.
I've asked ops to help take a look.
Alright, ops fixed the docs builds... the issue was just that deploy-github-pags-on-new-tags was not turned on in the repo settings.
I pushed another tag (v24.08.01) and saw docs published successfully!!
https://github.com/rapidsai/legate-boost/actions/runs/11619624304/job/32360219823
... but with the wrong version
Put up #177 to fix that.
Alright, trying this again 😂
Merged #177 and saw it successfully build packages and deploy docs: https://github.com/rapidsai/legate-boost/actions/runs/11632125999
I just pushed another tag, like this:
git checkout main
git pull upstream main
git tag -a v24.08.02 -m 'v24.08.02'
git push upstream 'v24.08.02'
That triggered this build: https://github.com/rapidsai/legate-boost/actions/runs/11632477453
Hopefully, we'll see all the CI jobs succeed and docs published with the correct version (24.08.02).
grrrr, why did that not work?
The deployment says it succeeded: https://github.com/rapidsai/legate-boost/actions/runs/11632477453/job/32396301167
And I can see the docs-building job installed the version we wanted.
legate-boost 24.08.02 cuda12_py312_0_cpu file:///tmp/local-conda-packages
But the docs still have the wrong version in them (yes I cleared my browser cache):
My next theory was "ok, maybe the wrong artifact is being pulled". Checked the logs from the build job:
With the provided path, there will be 1 file uploaded
Artifact name is valid!
Root directory input is valid!
Beginning upload of artifact content to blob storage
Uploaded bytes 4334606
Finished uploading artifact content to blob storage!
SHA256 hash of uploaded artifact zip is 9ed2d0645717aaeb74a7eebed1549b8ae80977098020d3220697fcf9d5be6551
Finalizing artifact upload
Artifact github-pages.zip successfully finalized. Artifact ID 2133838991
Artifact github-pages has been successfully uploaded! Final size is 4334606 bytes. Artifact ID is 2133838991
Artifact download URL: https://github.com/rapidsai/legate-boost/actions/runs/11632477453/artifacts/2133838991
Compared that to the deploy job:
Fetching artifact metadata for "github-pages" in this workflow run
Found 4 artifact(s)
Creating Pages deployment with payload:
{
"artifact_id": 2133838991,
"pages_build_version": "f0f5ac033092d849efb362d8bf66dad9243ec331",
"oidc_token": "***"
}
Created deployment for f0f5ac033092d849efb362d8bf66dad9243ec331, ID: f0f5ac033092d849efb362d8bf66dad9243ec331
Getting Pages deployment status...
Reported success!
Those IDs exactly match.
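That comparison can be scripted once the job logs are saved as text, for example with `gh run view <run-id> --log > run.log`. This is a minimal sketch whose sed patterns match the exact log lines quoted above. The function names are hypothetical.

```shell
# Artifact ID from the build job's upload step, e.g.
# "... Final size is 4334606 bytes. Artifact ID is 2133838991".
uploaded_artifact_id() {
    sed -n 's/.*Artifact ID is \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Artifact ID from the Pages deployment payload, e.g.
# '"artifact_id": 2133838991,'.
deployed_artifact_id() {
    sed -n 's/.*"artifact_id": *\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Exit 0 only if both logs name the same (non-empty) artifact ID.
same_artifact() {
    up="$(uploaded_artifact_id "$1")"
    dep="$(deployed_artifact_id "$2")"
    [ -n "${up}" ] && [ "${up}" = "${dep}" ]
}

# Usage: same_artifact build.log deploy.log && echo "same artifact"
```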
I clicked on the github-pages artifact in the summary from the run that was triggered by the tag: https://github.com/rapidsai/legate-boost/actions/runs/11632477453
https://github.com/rapidsai/legate-boost/actions/runs/11632477453/artifacts/2133838991
And opened its index.html locally... it has the correct version (24.08.02)! So it's not like the version is wrong in the HTML we're producing.
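You can also run the artifact check from the command line, then point the same version grep at the live site to compare. This is a minimal sketch with several assumptions:
- gh is authenticated.
- The github-pages artifact contains the artifact.tar that actions/upload-pages-artifact produces.
- The published docs live at https://rapidsai.github.io/legate-boost/. This URL is assumed and is not given in this thread.

The function names are hypothetical.

```shell
# Print the distinct 24.08.x version strings found in an HTML file.
docs_versions() {
    grep -oE '24\.08\.[0-9]+' "$1" | sort -u
}

# Download the github-pages artifact from a run and unpack it into $2/site.
# Assumption: the artifact holds a single artifact.tar with the site contents.
fetch_pages_artifact() {
    gh run download "$1" -R rapidsai/legate-boost -n github-pages -D "$2" &&
        mkdir -p "$2/site" &&
        tar -xf "$2/artifact.tar" -C "$2/site"
}

# Usage:
#   fetch_pages_artifact 11632477453 ./pages
#   docs_versions ./pages/site/index.html
#   curl -fsSL https://rapidsai.github.io/legate-boost/ -o live.html   # assumed URL
#   docs_versions live.html
```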
I just merged https://github.com/rapidsai/legate-boost/pull/178, which triggered this build: https://github.com/rapidsai/legate-boost/actions/runs/11673382407/workflow
... it failed again with workflow syntax errors 🙃
The workflow is not valid. .github/workflows/build.yaml (Line: 42, Col: 15): Unexpected symbol: '"tag"'. Located at position 22 within expression: github.event_name == "tag" || inputs.deploy_docs == true
I'll put up another PR fixing that, and testing the syntax directly on the PR. Sorry for all the noise getting this last part working 😭
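The error comes from the string literal. GitHub Actions expressions only accept single-quoted strings, so "tag" has to be written 'tag'.

Separately, a tag push is reported as a push event. That means github.event_name is 'push' on a tag build, and a comparison with 'tag' can never be true there. The usual check is github.ref_type == 'tag', or startsWith(github.ref, 'refs/tags/').

The sketch below is a rough, hypothetical grep that flags double-quoted literals in comparisons before a PR is pushed. It is not a full expression parser like actionlint.

```shell
# Hypothetical pre-push check: print "line:text" for comparisons against
# double-quoted literals (invalid in GitHub Actions expressions).
# Rough heuristic: shell `run:` steps using == "..." will also be flagged.
find_double_quoted_literals() {
    grep -nE '(==|!=)[[:space:]]*"[^"]*"' "$@"
}

# Usage (grep exits 0 when something suspicious is found):
#   find_double_quoted_literals .github/workflows/build.yaml
```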
Alright, it does look like the changes from #180 successfully led to the docs NOT being redeployed on a merge to main.
Pushed a new tag:
git checkout main
git pull upstream main
git tag -a v24.08.06 -m 'v24.08.06'
git push upstream 'v24.08.06'
That triggered this build: https://github.com/rapidsai/legate-boost/actions/runs/11710469244
which... ALSO skipped the deployment 😫 😫 😫 😫
Tried triggering with workflow dispatch, checking the "deploy docs?" box... that also did not deploy: https://github.com/rapidsai/legate-boost/actions/runs/11710994183/job/32619034518
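The gh CLI can repeat the workflow-dispatch attempt and show which jobs were skipped, without clicking through the UI. This is a minimal sketch with two assumptions taken from the error message above:
- The workflow file is build.yaml, from .github/workflows/build.yaml.
- The input is called deploy_docs, from the inputs.deploy_docs expression.

DRY_RUN and the function names are hypothetical.

```shell
# Trigger build.yaml via workflow_dispatch with deploy_docs=true.
# With DRY_RUN=1, print the command instead of running it.
dispatch_docs_build() {
    ref="${1:-main}"
    set -- gh workflow run build.yaml -R rapidsai/legate-boost --ref "${ref}" -f deploy_docs=true
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print "<job name>: <conclusion>" for each job in a run; a deploy job
# that was not run is listed as "skipped".
job_conclusions() {
    gh run view "$1" -R rapidsai/legate-boost --json jobs \
        --jq '.jobs[] | "\(.name): \(.conclusion)"'
}

# Usage:
#   dispatch_docs_build main
#   job_conclusions 11710994183
```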
Description
For #101, we want to publish legate-boost conda packages to the legate channel (https://anaconda.org/legate/repo). This captures the work to do that.
Benefits of this work
legate-boost
Acceptance Criteria
legate-boost (with the-) legate conda channel
Approach
Patterns that might be borrowed from RAPIDS libraries, including:
pyproject.toml instead of setup.py
scikit-build-core as a build backend instead of scikit-build
rapids-cmake to manage dependencies
rapids-dependency-file-generator to keep different lists of dependencies consistent
For an example of this, see how cuvs conda packages are built:
Notes
legate-core source, for reference: https://github.com/nv-legate/legate.core