iterative / terraform-provider-iterative

☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
https://registry.terraform.io/providers/iterative/iterative/latest/docs
Apache License 2.0

re-fix `task` NVIDIA drivers #606

Closed DavidGOrtega closed 1 year ago

DavidGOrtega commented 2 years ago

Coming from Discord:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I have confirmed it too.

DavidGOrtega commented 2 years ago

Duplicate of https://github.com/iterative/cml/issues/1065#issuecomment-1157607462

casperdcl commented 2 years ago

Actually... won't TPI task need this?

casperdcl commented 1 year ago

@dacbd might want to add a basic test for new GPU models to https://github.com/iterative/cml-playground :)

dacbd commented 1 year ago

Going to close this. I have now tested with the Ampere, Turing, Volta, and Pascal architectures with no issues. The CML-created instances were able to perform work and run nvidia-smi with no additional or special setup or considerations.
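For anyone wanting to repeat this kind of check on a new GPU model, here is a minimal sketch of a smoke test. It is not the test used in this thread or in cml-playground. The helper names `driver_ok` and `check_gpu` are hypothetical, and the failure marker is simply taken from the error message quoted at the top of this issue.

```python
import shutil
import subprocess

# Substring of the error reported in this issue; assumed to be stable across driver versions.
FAILURE_MARKER = "couldn't communicate with the NVIDIA driver"


def driver_ok(returncode: int, output: str) -> bool:
    """Return True when nvidia-smi exited cleanly and did not print the driver error."""
    return returncode == 0 and FAILURE_MARKER not in output


def check_gpu() -> bool:
    """Run nvidia-smi if it is on PATH and report whether the driver responded."""
    if shutil.which("nvidia-smi") is None:
        # No nvidia-smi binary at all: treat as a failed check.
        return False
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return driver_ok(result.returncode, result.stdout + result.stderr)


if __name__ == "__main__":
    print("GPU driver OK" if check_gpu() else "GPU driver check failed")
```

Run on the provisioned instance, this only confirms that the driver answers nvidia-smi. It does not verify that a workload can actually use the GPU.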