nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev

[MAINT] - Test GPU configurations for AWS and update relevant documentation #2400

Open viniciusdc opened 3 months ago

viniciusdc commented 3 months ago

A recent deployment of Nebari 2024.03.03 on AWS with a g4dn.xlarge GPU profile has led to an issue where, despite the CUDA-related packages appearing correctly configured, torch.cuda.is_available() still returns False. This indicates that the GPU's CUDA drivers are not being recognized. Additionally, the nvidia-smi command is not found, which suggests a problem with the NVIDIA driver installation or integration on the node (something that should be handled automatically when gpu: true is set in the configuration settings).
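
For triage, it helps to separate a CPU-only PyTorch build from a missing host driver by checking both torch and nvidia-smi from inside the spawned pod. A minimal sketch, assuming only that PyTorch is installed in the active environment:

```python
# Run from a terminal or notebook inside the GPU profile's pod.
import shutil
import subprocess

import torch

print("torch version:        ", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)        # None => CPU-only build
print("cuda available:       ", torch.cuda.is_available()) # False is the failure reported here

# nvidia-smi comes from the host driver; if it is missing inside the container,
# the NVIDIA driver / device plugin on the node is the likely culprit.
nvidia_smi = shutil.which("nvidia-smi")
print("nvidia-smi path:      ", nvidia_smi)
if nvidia_smi:
    print(subprocess.run([nvidia_smi], capture_output=True, text=True).stdout)
```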

Steps to resolve this issue:

Current configuration profile:

  - display_name: G4 GPU Instance 1x
    description: 4 cpu / 16GB RAM / 1 Nvidia T4 GPU (16 GB GPU RAM)
    kubespawner_override:
      image: quay.io/nebari/nebari-jupyterlab-gpu:2024.3.3
      cpu_limit: 4
      cpu_guarantee: 3
      mem_limit: 16G
      mem_guarantee: 10G
      extra_pod_config:
        volumes:
        - name: "dshm"
          emptyDir:
            medium: "Memory"
            sizeLimit: "2Gi"
      extra_container_config:
        volumeMounts:
        - name: "dshm"
          mountPath: "/dev/shm"
      node_selector:
        "dedicated": "gpu-1x-t4"

Additional details

Relevant issue #2392

viniciusdc commented 3 months ago

More details are in the original thread, but the main problem was that our fix for the scale-to-zero issue introduced a new tagging mechanism using the dedicated attribute in each profile. This was not documented anywhere that I could find, and our GPU docs not only were never migrated (or were removed) but also did not follow the new schema.