runs-on / runner-images-for-aws

GitHub Action Runner images for AWS

Minimal Nvidia GPU image #5

Open ruffsl opened 2 months ago

ruffsl commented 2 months ago

Opening this for visibility and collaboration. It would be nice to include a minimal NVIDIA GPU AMI for use with RunsOn. This PR currently modifies the default templates reused by the gpu RELEASE_DIST example to slim down the resulting AMI, and changes the source AMI to the NVIDIA GPU-Optimized AMI.

Because of the source AMI's constraints, building the custom GPU AMI requires launching the packer process on an NVIDIA GPU instance_type. This could perhaps be sidestepped by manually installing the NVIDIA drivers and the NVIDIA container runtime, but that is something I haven't yet bothered to reverse engineer.
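Since the build only works on a GPU host, a pre-flight check on the build instance can fail fast when packer is launched on the wrong instance_type. The sketch below is a hypothetical helper, not part of this PR; it assumes the NVIDIA driver utilities (`nvidia-smi`) are on the PATH, and relies on `nvidia-smi -L` printing one `GPU <n>: ...` line per device.

```python
import shutil
import subprocess


def parse_gpu_list(output: str) -> list[str]:
    """Return the per-device lines from `nvidia-smi -L` output."""
    return [line.strip() for line in output.splitlines()
            if line.strip().startswith("GPU ")]


def list_nvidia_gpus() -> list[str]:
    """List NVIDIA GPUs visible on this host, or [] if none are usable."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver utilities are not installed on this host
    result = subprocess.run(["nvidia-smi", "-L"],
                            capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []  # tool is present, but no working GPU/driver was found
    return parse_gpu_list(result.stdout)
```

A build script could call `list_nvidia_gpus()` before starting packer and abort with a message pointing at the instance_type when it returns an empty list.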

See the commit log for a few subtle but notable patches. One works around apt lock blocking caused by the NVIDIA source AMI using bashrc to bootstrap the drivers on first boot; another disables the AWS CLI installation, since it conflicts with the version preinstalled on the NVIDIA source AMI. The NVIDIA source AMI is also built from a larger drive (128GB), so our child AMI unfortunately requires a larger minimum disk size than the 80GB of the default "large" disk size option in RunsOn. As a result, some of the RunsOn CloudFormation settings also had to be edited. This may be another reason to install the NVIDIA drivers manually rather than rely on the source AMI.
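The PR's own fixes for the apt lock contention live in its commit log. As an alternative, hedged sketch, a build step could instead wait until the first-boot bootstrap releases the apt lock before running apt. This assumes apt and dpkg take a POSIX `fcntl` write lock on `/var/lib/dpkg/lock-frontend` (as on Ubuntu 22.04) and that the step runs as root; the function name and its defaults are hypothetical.

```python
import errno
import fcntl
import os
import time


def wait_for_lock_release(path: str = "/var/lib/dpkg/lock-frontend",
                          timeout: float = 300.0,
                          poll_interval: float = 2.0) -> bool:
    """Poll until no other process holds a POSIX write lock on `path`.

    Returns True once the lock could be taken (it is released again
    immediately), or False if `timeout` seconds pass first. Opening the
    file read-write usually requires root for the dpkg lock.
    """
    deadline = time.monotonic() + timeout
    fd = os.open(path, os.O_RDWR)
    try:
        while True:
            try:
                # Non-blocking exclusive lock; fails while apt/dpkg holds it.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                if exc.errno not in (errno.EACCES, errno.EAGAIN):
                    raise
                if time.monotonic() >= deadline:
                    return False
                time.sleep(poll_interval)
            else:
                # We only wanted to know the lock was free, so drop it.
                fcntl.lockf(fd, fcntl.LOCK_UN)
                return True
    finally:
        os.close(fd)
```

Because the lock is released before apt runs, another process could still grab it in between; recent apt versions also accept a `DPkg::Lock::Timeout` option that makes apt itself wait, which may be the simpler choice.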

Context:

crohr commented 2 months ago

Thanks @ruffsl! I think you forgot to push configure-apt-mock.sh?

I will experiment with this and also try @samayala22's approach, since it would be nice to be able to simply extend the base RunsOn images with the additional drivers.

ruffsl commented 2 months ago

> I think you forgot to push configure-apt-mock.sh?

I forget how/where the templates get populated from, but it's already included in the source tree here:

https://github.com/runs-on/runner-images-for-aws/blob/432c20fe44c72369c27f87c6c93e68ca9c64966c/releases/ubuntu22/x64/images/ubuntu/scripts/build/configure-apt-mock.sh#L1-L6

> I will experiment with this and also try @samayala22's approach, since it would be nice to be able to simply extend the base RunsOn images with the additional drivers.

That could be a more efficient and customizable approach; I've also updated the original post to note the disk usage.