system76 / cuda

Packaging for NVIDIA's CUDA Toolkit

CUDA 11.3? #23

Open · Linux-cpp-lisp opened this issue 2 years ago

Linux-cpp-lisp commented 2 years ago

Hi all,

Thanks for your work packaging CUDA in an easy way for system76 machines!

PyTorch has moved up to CUDA 11.3 (see https://pytorch.org/get-started/locally/); does system76 plan to keep these packages up to date with NVIDIA's releases, or should I install directly from NVIDIA if I need a newer CUDA?
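
(A quick way to check which CUDA runtime an installed PyTorch wheel was built against, versus the toolkit on the system, is a sketch like the following; it assumes torch and nvcc are installed, and the version strings in the comments are examples only.)

```python
# Sketch: compare the CUDA runtime bundled with the installed PyTorch wheel
# against the system toolkit. Version strings in comments are examples only.
import subprocess
import torch

print(torch.__version__)          # e.g. "1.11.0+cu113" for a CUDA 11.3 wheel
print(torch.version.cuda)         # CUDA runtime the wheel was compiled against (None for CPU-only builds)
print(torch.cuda.is_available())  # True only if the driver can actually run that runtime

# The toolkit installed system-wide (e.g. by the system76-cuda packages) is what nvcc reports:
nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
print(next((line for line in nvcc.splitlines() if "release" in line), nvcc))
```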

Thanks!

mraxilus commented 2 years ago

Better yet, why not 11.6 since that's what is included with system76-driver-nvidia by default anyway?

mmstick commented 2 years ago

It's recommended to build CUDA software in a devcontainer with Docker or Podman.

Linux-cpp-lisp commented 2 years ago

> Better yet, why not 11.6 since that's what is included with system76-driver-nvidia by default anyway?

@mraxilus is that true? Maybe that's just on the latest 22.04... definitely wasn't the case for me before on 21.10.

mraxilus commented 2 years ago

> @mraxilus is that true? Maybe that's just on the latest 22.04... definitely wasn't the case for me before on 21.10.

I was mistakenly reading the CUDA version reported by nvidia-smi instead of the one reported by nvcc --version. The latest available from System76's packages is still 11.2.
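
(In other words, nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc reports the toolkit that is actually installed. A minimal sketch to surface the two numbers side by side, assuming both tools are on PATH:)

```python
# Sketch: print the two different "CUDA version" numbers next to each other.
import subprocess

# nvidia-smi's header shows the *maximum* CUDA version the installed driver supports.
smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print(next(line for line in smi.splitlines() if "CUDA Version" in line))

# nvcc reports the toolkit that is actually installed (what the System76 packages provide).
nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
print(next(line for line in nvcc.splitlines() if "release" in line))
```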

mraxilus commented 2 years ago

> It's recommended to build CUDA software in a devcontainer with Docker or Podman.

If that's so, then why provide the system76-cu* packages at all? I don't want to have to spin up Docker containers just to access my GPU from a script, or to test out features from a CUDA-capable library.

gully commented 2 years ago

:wave: Thanks for supporting these convenient CUDA installs! A question:

I'm encountering this same friction point. I go to install PyTorch, but the prebuilt binaries are only offered for CUDA 10.2 or 11.3. I can get 11.1 or 11.2 from System76, but not 11.3. I tried building PyTorch from source, but that's a whole other issue.

I'd be open to a Docker or Podman route, but it's currently at odds with my development workflow and would add more mental overhead to navigate. A CUDA 11.3 package would slot right into my existing workflow.

If anyone finds this and has a worked solution for setting up CUDA 11.3 manually on Pop!_OS, could you share it? I may try it myself and post a workaround if I find one...

mmstick commented 2 years ago

Dev containers are the way to go
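
(For reference, a minimal smoke test of that route might look like the sketch below. It assumes Docker and the NVIDIA Container Toolkit are already installed; the image tag is just one example of a CUDA 11.3 devel image, not a specific recommendation.)

```python
# Sketch: run nvcc inside a CUDA 11.3 devel container instead of installing
# the toolkit on the host. Assumes Docker and the NVIDIA Container Toolkit are
# set up; the image tag below is an example, not a specific recommendation.
import subprocess

result = subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all",
     "nvidia/cuda:11.3.1-devel-ubuntu20.04",
     "nvcc", "--version"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)
```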

gully commented 2 years ago

OK, my workaround is to fall back to CUDA 10.2. Both System76 and PyTorch ship prebuilt binaries for 10.2, so it just works out of the box. I tried it with my particular PyTorch application and it appears to work.
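
(For anyone following the same fallback, a quick sanity check that the cu102 build actually exercises the GPU could look like this sketch, assuming a CUDA 10.2 PyTorch wheel and a visible GPU:)

```python
# Sketch: confirm the CUDA 10.2 fallback actually exercises the GPU.
# Assumes a cu102 PyTorch wheel is installed and a CUDA-capable GPU is present.
import torch

print(torch.version.cuda)            # expected to start with "10.2" for a cu102 wheel
assert torch.cuda.is_available(), "no GPU visible or driver/runtime mismatch"

x = torch.rand(1024, 1024, device="cuda")
print((x @ x).sum().item())          # a matmul on the GPU as a minimal end-to-end check
```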

I suspect you're right that, in the long term, dev containers make portable and reproducible environments easier. For some reason dev containers still haven't taken off in scientific computing, or at least not in my sub-community of it. Is there a migration guide available or planned? I found this NVIDIA website that seems streamlined; is that the dev container workflow y'all would recommend?

If I get around to trying it out, I'd be open to writing one of those "support" guides that you have in your documentation. I adore that your docs are all open source! So cool.

NickleDave commented 1 year ago

@gully (and anyone else this helps): to run PyTorch in a dev container, I followed this tutorial:
https://blog.roboflow.com/nvidia-docker-vscode-pytorch/
but ended up also needing to install nvidia-docker, following a comment on this gist:
https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e (specifically https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e?permalink_comment_id=4186634#gistcomment-4186634)