billbrod closed this issue 2 years ago
I posted on the US-RSE slack about this, and got the following advice:
It looks like lots of people have access to their own machines-with-GPUs and then link them up with a CI system in some way, so we could try to get that working. We shouldn't use our dedicated GPU on the HPC, but could ask them about setting up some low-priority jobs for it. Alternatively, we could try to get some GPUs through Simons.
Hi @billbrod I am the creator of Cirun.io, "GPU" and "CI" caught my eye.
FWIW I'll share my two cents. I created a service for problems like these, which is basically running custom machines (including GPUs) in GitHub Actions: https://cirun.io/
It is used in multiple open source projects needing GPU support like the following:
- https://github.com/pystatgen/sgkit/
- https://github.com/qutip/qutip-cupy
It is fairly simple to set up: all you need is a cloud account (AWS or GCP) and a simple YAML file describing what kind of machines you need, and Cirun will spin up ephemeral machines on your cloud for GitHub Actions to run on. It's native to the GitHub ecosystem, which means you can see logs and triggers in GitHub's interface itself, just like any GitHub Actions run.
Also, note that Cirun is free for open-source projects (you only pay your cloud provider for machine usage).
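For reference, Cirun is configured via a `.cirun.yml` file at the repository root. A minimal sketch of what that might look like (the instance type, image ID, and label here are illustrative assumptions; check Cirun's docs for the exact schema):

```yaml
# .cirun.yml -- hypothetical GPU runner config (all values are assumptions)
runners:
  - name: gpu-runner
    # cloud provider Cirun will spin ephemeral machines up on
    cloud: aws
    # example AWS GPU instance type
    instance_type: g4dn.xlarge
    # example CUDA-enabled machine image ID (placeholder)
    machine_image: ami-0123456789abcdef0
    # workflow jobs would target this runner via their runs-on field
    labels:
      - cirun-gpu-runner
```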
closed by #139
We want to make sure that our code runs on GPUs with very little overhead.
Currently, there are two steps for that:
1. Make sure everything runs on the GPU in the same manner. See `metamer.py`, `steerable_pyramid_freq.py`, `pooling.py`, and `ventral_stream.py` for my preferred way, but basically: none of our synthesis methods nor models should set the device anywhere. Instead, each should have a `.to` method, which moves all tensor attributes over to the given device/dtype, and then all of its methods should work regardless of which device they're on. This can be done by using things like `torch.ones_like`; if a new tensor needs to be created (and you can't use `torch.ones_like` or something like it), its device should be explicitly set to that of the method's input. If the method has no input, check one of the tensor attributes.
2. Figure out how to make Travis CI work with CUDA. There's an open issue on this, so it might not be trivial, but they link an existing project which has a `.travis.yml` file we could try modifying.
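The device-agnostic pattern in step 1 might look something like the following minimal sketch (the `GaussianModel` class and its attributes are hypothetical, not from the codebase):

```python
import torch


class GaussianModel:
    """Toy model illustrating the device-agnostic pattern: no method
    hard-codes a device; a ``.to`` method moves all tensor attributes.
    (Hypothetical class, for illustration only.)
    """

    def __init__(self, std=1.0):
        # tensors are created without specifying a device (defaults to CPU)
        self.std = torch.as_tensor(std)

    def to(self, *args, **kwargs):
        # move every tensor attribute to the given device/dtype,
        # mirroring torch.nn.Module.to's calling convention
        self.std = self.std.to(*args, **kwargs)
        return self

    def forward(self, x):
        # torch.ones_like inherits x's device and dtype, so nothing
        # here depends on where x lives
        noise_scale = torch.ones_like(x) * self.std
        # when a new tensor can't be derived from an existing one,
        # set its device/dtype explicitly from the method's input
        offset = torch.zeros(x.shape[-1], device=x.device, dtype=x.dtype)
        return x * noise_scale + offset
```

With this in place, `model.to("cuda")` followed by a call on a CUDA input works the same as the all-CPU case, with no device logic inside the methods themselves.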