Closed by akshaysubr 3 weeks ago
The GitHub GPU CI would probably be simplest, no? But it doesn't seem to be available yet: https://resources.github.com/devops/accelerate-your-cicd-with-arm-and-gpu-runners-in-github-actions/
We would be happy to pay the costs, but we would want to think carefully about which GPU tests to run as part of CI to avoid blowing up the bill.
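For perspective, one way to keep the bill bounded is to run only an explicitly marked subset of tests on the GPU machine. A minimal sketch (the `gpu` pytest marker, the `gpu-runner` label, and the manual trigger are illustrative assumptions, not anything zarr-python has today):

```yaml
# Hypothetical workflow: the GPU runner only ever executes tests marked
# `gpu`, and only when triggered manually, so the expensive machine never
# runs the full CPU test suite or fires on every push.
name: gpu-tests
on:
  workflow_dispatch:  # manual trigger keeps accidental runs off the bill
jobs:
  gpu-tests:
    runs-on: gpu-runner  # placeholder label for whichever GPU runner is chosen
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -e ".[test]" cupy-cuda12x
      - run: pytest -m gpu  # only the GPU-marked subset
```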
Hello, This is Burak from Ubicloud team. I'd be happy to answer any questions you might have regarding our runners. You can find pricing info here.
@aterrel do you know a good contact at GitHub, who could provide more details on their offering?
@aktech would you be able to share more about the cirun / Quansight approach?
Thanks for the ping @jakirkham. If you have an account with one of the supported clouds, you can spin up pretty cheap GPU runners with cirun. The service itself is free for open source, so you only pay the per-second runner cost to the chosen cloud (AWS is usually the best in my experience).
You can also use spot instances on AWS to reduce the cost further. The scverse folks were able to get down to about 1 cent per run with AWS spot instances: https://github.com/scverse/anndata/issues/1067#issuecomment-1709802780 (it depends, of course, on the time taken by each run); this is just for perspective.
My opinion might be biased towards cirun (being the founder), so feel free to explore and choose what works best for you. I am happy to help make it work (I just need cloud account access) if you do choose cirun.
Sample run of this repo on a GPU runner via cirun.io: https://github.com/aktech/zarr-python/actions/runs/10304821277/job/28524202276#step:4:1
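For reference, cirun is configured through a `.cirun.yml` file in the repository; a minimal sketch along these lines (the instance type, AMI, and runner label are illustrative placeholders, and the exact schema should be checked against the cirun documentation):

```yaml
# .cirun.yml -- illustrative sketch only; consult the cirun docs for the
# authoritative schema before use.
runners:
  - name: aws-gpu-runner
    cloud: aws
    instance_type: g4dn.xlarge   # an NVIDIA T4 instance class on AWS
    machine_image: ami-0123456789abcdef0  # placeholder: a CUDA-enabled AMI
    preemptible: true            # request a spot instance to cut the price
    labels:
      - cirun-aws-gpu
```

A workflow then targets the runner with `runs-on: cirun-aws-gpu--${{ github.run_id }}` style labels; the `preemptible` flag is what enables the spot-instance savings mentioned above.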
Hello all
I recently set up GPU CI for scikit-learn using the GitHub GPU runners. I don't think we had to do anything special to use them, such as joining a beta program. We went this route because we had spent a few months working on and off to get this going with cirun, but somehow never got to the finish line. Getting the GitHub GPU CI going was quick enough that we managed to "get it done".
The workflow is defined in https://github.com/scikit-learn/scikit-learn/blob/main/.github/workflows/cuda-ci.yml. It is triggered by applying a "CUDA CI" label to a PR; the label is then removed by a separate workflow, https://github.com/scikit-learn/scikit-learn/blob/main/.github/workflows/cuda-label-remover.yml. The reason for splitting this into two workflows is permissions.
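The label-trigger pattern looks roughly like this (a sketch, not the actual scikit-learn file; the runner label and test command are placeholders):

```yaml
# Sketch of a label-triggered GPU job: it only runs when a maintainer
# applies the "CUDA CI" label to a pull request.
name: CUDA CI
on:
  pull_request:
    types: [labeled]
jobs:
  gpu-tests:
    if: github.event.label.name == 'CUDA CI'
    runs-on: gpu-runner  # placeholder for the GitHub-hosted GPU runner label
    steps:
      - uses: actions/checkout@v4
      - run: pytest -m gpu  # placeholder test command
```

Because a second workflow (with the permissions needed to edit labels) removes the label after each run, every new run requires a maintainer to deliberately re-apply it, which keeps GPU usage under human control.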
I also wrote up a blog post describing what we did, what we got wrong, etc. It isn't quite done yet, but I'll link it anyway: https://hackmd.io/f68r4NjHSvO0tvchb61Z0g?edit - I think the part about where to click in the UI on github.com is particularly useful. The rest is better learned from the workflow files in the repo.
Thanks folks for the input here.
This is set up now using a GitHub Actions runner for the time being. Over the next few months, we'll measure usage and decide on a long-term arrangement.
Thanks Joe!
Opening this issue to try and figure out how to set up GPU CI. The new `Buffer`, `NDBuffer`, and `BufferProtocol` abstractions allow adding GPU support, but the current blocker for https://github.com/zarr-developers/zarr-python/pull/1967 is the lack of GPU CI. A few options are available:
A very short, non-exhaustive survey of how other libraries handle GPU CI:
The main question is which option is best suited to zarr-python, and, if we need to pay for cloud cycles, how that can be done as an organization.
cc @jakirkham @rabernat @jhamman