weaviate / t2v-transformers-models

This is the repo for the container that holds the models for the text2vec-transformers module
BSD 3-Clause "New" or "Revised" License
40 stars 27 forks

arm64 not able to detect CUDA #77

Closed yuliyantsvetkov closed 7 months ago

yuliyantsvetkov commented 7 months ago

While trying to run the weaviate helm chart with the text2vec-transformers on Jetson Xavier NX with the latest JetPack I got this from the nvidia-container engine:

k logs -f transformers-inference-845fd6bf68-cznmv
INFO:     Started server process [19]
INFO:     Waiting for application startup.
INFO:     CUDA_PER_PROCESS_MEMORY_FRACTION set to 1.0
INFO:     CUDA_CORE set to cuda:0
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 734, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 610, in __aenter__
    await self._router.startup()
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 713, in startup
    handler()
  File "/app/app.py", line 75, in startup_event
    vec = Vectorizer(model_dir, cuda_support, cuda_core, cuda_per_process_memory_fraction,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/vectorizer.py", line 49, in __init__
    self.vectorizer = HuggingFaceVectorizer(model_path, cuda_support, cuda_core, cuda_per_process_memory_fraction, model_type, architecture, direct_tokenize)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/vectorizer.py", line 121, in __init__
    self.model.to(self.cuda_core)
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2556, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

ERROR:    Application startup failed. Exiting.
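
The AssertionError is raised by torch itself: a CPU-only torch wheel reports `torch.version.cuda` as None regardless of what the node or device plugin exposes, which is exactly the "Torch not compiled with CUDA enabled" situation. A small diagnostic sketch (the helper function name is mine, not part of the module) separates the two possible failure modes:

```python
def cuda_build_status(cuda_version, cuda_available):
    """Classify a torch install from torch.version.cuda and
    torch.cuda.is_available()."""
    if cuda_version is None:
        # Wheel was built without CUDA; no runtime config can fix this.
        return "cpu-only build: reinstall a CUDA-enabled torch wheel"
    if not cuda_available:
        # Wheel supports CUDA, but driver/GPU is not usable at runtime.
        return f"built for CUDA {cuda_version}, but no usable GPU/driver at runtime"
    return f"CUDA {cuda_version} ready"

if __name__ == "__main__":
    try:
        import torch
        print(cuda_build_status(torch.version.cuda, torch.cuda.is_available()))
    except ImportError:
        print("torch not installed")
```

Run inside the failing container, this tells you whether to fix the image (CPU-only wheel) or the node/runtime configuration.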

That k3s node is otherwise working fine; I suspect the torch build in the image simply cannot use the Tegra libraries.

I tried a couple of LD library path configurations, but without luck:

    envconfig:
      # enable for CUDA support. Your K8s cluster needs to be configured
      # accordingly and you need to explicitly set GPU requests & limits below
      enable_cuda: true

      # only used when CUDA is enabled
      nvidia_visible_devices: all
      nvidia_driver_capabilities: compute,utility

      # only used when CUDA is enabled
      #ld_library_path: /usr/local/nvidia/lib64
      #ld_library_path: /usr/local/cuda/lib64
      ld_library_path: /usr/lib/aarch64-linux-gnu/tegra
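
Independent of the torch build question, you can sanity-check whether any of these ld_library_path values actually makes the CUDA libraries visible inside the pod. A quick sketch (run it via kubectl exec in the transformers container; library names are the standard CUDA runtime and NVML ones):

```python
import ctypes.util

# See whether the CUDA runtime and NVML are reachable via the library
# search path (which includes LD_LIBRARY_PATH). Note: even when they
# are found, a CPU-only torch wheel will still fail with "Torch not
# compiled with CUDA enabled" -- the two problems are independent.
for lib in ("cudart", "nvidia-ml"):
    path = ctypes.util.find_library(lib)
    print(f"lib{lib}: {path or 'not found on library search path'}")
```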

nvdp-nvidia-device-plugin is running without issues:

I0326 05:19:27.039676       1 main.go:154] Starting FS watcher.
I0326 05:19:27.040261       1 main.go:161] Starting OS watcher.
I0326 05:19:27.042011       1 main.go:176] Starting Plugins.
I0326 05:19:27.042112       1 main.go:234] Loading configuration.
I0326 05:19:27.042985       1 main.go:242] Updating config with default resource matching patterns.
I0326 05:19:27.044077       1 main.go:253]
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": true,
    "nvidiaDriverRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": [
        "envvar"
      ],
      "deviceIDStrategy": "uuid",
      "cdiAnnotationPrefix": "cdi.k8s.io/",
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
      "containerDriverRoot": "/driver-root"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {}
  }
}
I0326 05:19:27.044153       1 main.go:256] Retreiving plugins.
W0326 05:19:27.046699       1 factory.go:31] No valid resources detected, creating a null CDI handler
I0326 05:19:27.047010       1 factory.go:107] Detected non-NVML platform: could not load NVML library: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0326 05:19:27.048690       1 factory.go:107] Detected Tegra platform: /sys/devices/soc0/family has 'tegra' prefix
I0326 05:19:27.050143       1 server.go:165] Starting GRPC server for 'nvidia.com/gpu'
I0326 05:19:27.057261       1 server.go:117] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
I0326 05:19:27.072596       1 server.go:125] Registered device plugin for 'nvidia.com/gpu' with Kubelet

Shall I build a new image with JetPack included so that torch can detect CUDA?

yuliyantsvetkov commented 7 months ago

Fixed by building a new arm64 image based on NVIDIA's L4T PyTorch image, nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3:

INFO:     Started server process [20]
INFO:     Waiting for application startup.
INFO:     CUDA_PER_PROCESS_MEMORY_FRACTION set to 1.0
INFO:     CUDA_CORE set to cuda:0
/usr/local/lib/python3.8/dist-packages/torch/storage.py:315: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
  warnings.warn(message, UserWarning)
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
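
The fix above can be sketched as a Dockerfile. This is a sketch under assumptions, not the repo's actual build: file names, dependencies, and the entrypoint may differ in t2v-transformers-models. The essential point is the L4T base image, which ships a torch build compiled against Jetson's CUDA/Tegra libraries.

```dockerfile
# Assumption-laden sketch of an arm64/Jetson build of the inference container.
FROM nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3

WORKDIR /app

# Install the app's remaining dependencies. torch must NOT appear in
# requirements.txt, or pip may replace the preinstalled Jetson build
# with a CPU-only wheel and reintroduce the original error.
COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .

ENTRYPOINT ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
```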

I can create a branch if anyone is interested in GPU inference for weaviate on Jetson.