googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Enable CUDA Heterogeneous Memory Management (HMM) on GPU instances #4748

Open gonzalobg opened 1 month ago

gonzalobg commented 1 month ago

Is your feature request related to a problem? Please describe. CUDA Heterogeneous Memory Management (HMM) enables GPU code to access all memory allocated by a process. That is, users no longer have to explicitly allocate GPU-only memory and copy data back and forth in order to use the GPU.

This makes it much easier for programmers to use GPUs, because they don't need to learn anything about memory management before they can launch GPU work. It also makes it much easier to use the GPU from Python, because the GPU can directly access Python objects, chase pointers, etc.

To learn more, see, e.g., the NVIDIA blog post "Simplifying GPU Application Development with Heterogeneous Memory Management".
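To make the difference concrete, here is a minimal sketch of the programming model HMM enables: a kernel operating directly on a buffer from plain `malloc`, with no `cudaMalloc` or `cudaMemcpy`. This assumes an HMM-capable setup (roughly CUDA 12.2+, the open-source NVIDIA kernel modules, and a supported GPU); on systems without HMM this exact pattern would fault.

```
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel that reads and writes ordinary host memory. Under HMM the GPU
// can access any address in the process, so `data` can come from malloc.
__global__ void scale(int *data, int n, int factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  const int n = 1 << 20;

  // Plain system allocation: no cudaMalloc, no explicit host<->device copies.
  int *data = static_cast<int *>(malloc(n * sizeof(int)));
  for (int i = 0; i < n; ++i) data[i] = i;

  // Launch directly on the malloc'd buffer; HMM migrates/maps pages on demand.
  scale<<<(n + 255) / 256, 256>>>(data, n, 2);
  cudaDeviceSynchronize();

  printf("data[42] = %d\n", data[42]);  // 84 when HMM is active
  free(data);
  return 0;
}
```

Without HMM, the same program needs a `cudaMalloc`'d device buffer plus two `cudaMemcpy` calls (or `cudaMallocManaged`), which is exactly the boilerplate this request would let beginners skip.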

Describe the solution you'd like I'd like HMM to be enabled for GPU instances.

Describe alternatives you've considered We currently run our notebooks on AWS and Azure, because those platforms have HMM enabled. However, Colab is much easier for beginners to access and use, so I'd like to migrate our GPU teaching content to it and help as many users as possible learn how to use GPUs.

Additional context

cperry-goog commented 1 month ago

Thanks for the request. This is blocked by an Ubuntu upgrade, but I've filed an internal feature request at b/358375546.