weaviate / semantic-search-through-wikipedia-with-weaviate

Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine

fail with status 500: CUDA error: no kernel image is available for execution on the device #8

Open · Matthieu-Tinycoaching opened this issue 1 year ago

Matthieu-Tinycoaching commented 1 year ago

Hi,

I am trying to reproduce step 2 of the semantic search through Wikipedia demo on my local computer with an RTX 3090, and while importing data with the nohup python3 -u import.py & command I got the following error message:

2022-10-10 16:20:59.925 | INFO     | __main__:import_data_without_crefs:195 - Imported (1490 / 215) – Islamic world with # of paragraphs 7
2022-10-10 16:20:59.925 | INFO     | __main__:import_data_without_crefs:195 - Imported (1491 / 215) – Islamic world with # of paragraphs 7
2022-10-10 16:20:59.925 | INFO     | __main__:import_data_without_crefs:195 - Imported (1492 / 215) – Islamic world with # of paragraphs 7
2022-10-10 16:20:59.925 | INFO     | __main__:import_data_without_crefs:195 - Imported (1493 / 215) – Islamic world with # of paragraphs 7
2022-10-10 16:20:59.925 | INFO     | __main__:import_data_without_crefs:195 - Imported (1494 / 215) – Islamic world with # of paragraphs 7
2022-10-10 16:20:59.925 | INFO     | __main__:import_data_without_crefs:195 - Imported (1495 / 215) – Islamic world with # of paragraphs 7
2022-10-10 16:20:59.926 | INFO     | __main__:import_data_without_crefs:195 - Imported (1497 / 216) – Multiverse with # of paragraphs 4
2022-10-10 16:20:59.926 | INFO     | __main__:import_data_without_crefs:195 - Imported (1498 / 216) – Multiverse with # of paragraphs 4
2022-10-10 16:20:59.926 | INFO     | __main__:import_data_without_crefs:195 - Imported (1499 / 216) – Multiverse with # of paragraphs 4
2022-10-10 16:21:00.482 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-10-10 16:21:00.482 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-10-10 16:21:00.482 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-10-10 16:21:00.482 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-10-10 16:21:00.482 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-10-10 16:21:00.482 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-10-10 16:21:00.482 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-10-10 16:21:00.482 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
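As a side note on the error itself: "no kernel image is available for execution on the device" usually means the PyTorch build inside the inference container was not compiled with kernels for the GPU's compute capability (an RTX 3090 is compute capability 8.6, i.e. sm_86), rather than the GPU running out of memory. A minimal check, assuming Python and PyTorch are reachable inside one of the t2v-transformers containers (the transformers-inference images run models through PyTorch), would be something like:

# Run inside one of the inference containers, e.g. by starting an
# interactive interpreter with:
#   docker-compose exec t2v-transformers-01-001 python3
import torch

print("torch version:    ", torch.__version__)
print("built for CUDA:   ", torch.version.cuda)
print("CUDA available:   ", torch.cuda.is_available())

if torch.cuda.is_available():
    # (8, 6) is expected for an RTX 3090
    print("device capability:", torch.cuda.get_device_capability(0))
    # Architectures the installed torch wheel was compiled for;
    # sm_86 (or a forward-compatible PTX entry) must be covered here
    print("compiled archs:   ", torch.cuda.get_arch_list())

If sm_86 is not covered by that list, the 500 errors would be expected no matter how many replicas are running or how much GPU memory is free.
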

I followed the guidelines provided at https://weaviate.io/developers/weaviate/current/tutorials/semantic-search-through-wikipedia.html#step-2-import-the-dataset-and-vectorize-the-content and set up the following files to work on my local computer:

1 / docker-compose.yml:

---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:1.15.3
    ports:
    - 8080:8080
    restart: on-failure:0
    volumes:
      - /media/matthieu/HDD_4T01/weaviate:/var/lib/weaviate
      # - /home/matthieu/weaviate:/var/lib/weaviate
    depends_on:
      - loadbalancer
    environment:
      TRANSFORMERS_INFERENCE_API: 'http://loadbalancer:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      ENABLE_MODULES: 'text2vec-transformers'
      # ENABLE_MODULES: ''
      CLUSTER_HOSTNAME: 'node1'
  ##
  # Load in all GPUs
  ##
  loadbalancer:
    image: nginx
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - t2v-transformers-01-001
      - t2v-transformers-01-002
      - t2v-transformers-01-003
      - t2v-transformers-01-004
      - t2v-transformers-01-005
      - t2v-transformers-01-006
      - t2v-transformers-01-007
      - t2v-transformers-01-008
      - t2v-transformers-01-009
      - t2v-transformers-01-010
      - t2v-transformers-01-011
      - t2v-transformers-01-012

  t2v-transformers-01-001:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0' # <== set the GPU to use. 0 = 1st GPU, 1 = 2nd, etc
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-002:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-003:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-004:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-005:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-006:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-007:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-008:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-009:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-010:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-011:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
  t2v-transformers-01-012:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: 'all'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
...

2 / nginx.conf:

user  nginx;
worker_processes  1;
error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    upstream modules {
        least_conn;
        server t2v-transformers-01-001:8080;
        server t2v-transformers-01-002:8080;
        server t2v-transformers-01-003:8080;
        server t2v-transformers-01-004:8080;
        server t2v-transformers-01-005:8080;
        server t2v-transformers-01-006:8080;
        server t2v-transformers-01-007:8080;
        server t2v-transformers-01-008:8080;
        server t2v-transformers-01-009:8080;
        server t2v-transformers-01-010:8080;
        server t2v-transformers-01-011:8080;
        server t2v-transformers-01-012:8080;
    }

    include                 /etc/nginx/mime.types;
    default_type            application/octet-stream;
    keepalive_timeout       65;
    client_body_buffer_size 128M;
    client_max_body_size    128M;

    server {
        listen 8080 default_server;
        listen [::]:8080 default_server;
        location / {
            proxy_set_header                    Host $http_host;
            proxy_set_header                    X-Url-Scheme $scheme;
            proxy_set_header                    X-Real-IP $remote_addr;
            proxy_set_header                    X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass                          http://modules;
            proxy_buffering                     off;
            proxy_read_timeout                  3600;
            proxy_redirect                      off;
        }
    }
}
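
One way to narrow this down, independent of Weaviate and import.py, is to call the inference API directly through the load balancer. The compose file above does not publish the load balancer's port on the host, so this assumes you temporarily add a ports mapping such as 8081:8080 to the loadbalancer service; the /vectors endpoint is the same one that appears in the import errors. A rough sketch:

# Hypothetical smoke test of the vectorization path through nginx.
# Assumes the loadbalancer service temporarily publishes host port 8081
# (ports: - 8081:8080) so it is reachable from outside the Docker network.
import requests

resp = requests.post(
    "http://localhost:8081/vectors",
    json={"text": "The Eiffel Tower is located in Paris."},
    timeout=60,
)
print("status:", resp.status_code)

if resp.ok:
    vector = resp.json().get("vector", [])
    print("vector length:", len(vector))  # multi-qa-MiniLM-L6-cos-v1 produces 384 dimensions
else:
    # A 500 here reproduces the CUDA error outside of the import script
    print("body:", resp.text)

If this fails with the same CUDA message, the problem sits entirely in the inference containers and has nothing to do with the import batching or Weaviate itself.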

My GPU is properly configured with CUDA 11.6 and cuDNN, and nvidia-smi returns:

Mon Oct 10 16:39:30 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:21:00.0  On |                  N/A |
|  0%   55C    P8    54W / 390W |   1272MiB / 24576MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2389      G   /usr/lib/xorg/Xorg                 38MiB |
|    0   N/A  N/A      2935      G   /usr/bin/gnome-shell              118MiB |
|    0   N/A  N/A      9316      G   /usr/lib/xorg/Xorg                423MiB |
|    0   N/A  N/A      9462      G   /usr/bin/gnome-shell              116MiB |
|    0   N/A  N/A     10657      G   ...veSuggestionsOnlyOnDemand       71MiB |
|    0   N/A  N/A     12204      G   ...rbird/259/thunderbird-bin      178MiB |
|    0   N/A  N/A     18483      G   ...AAAAAAAAA= --shared-files       47MiB |
|    0   N/A  N/A     22141      G   ...142126277284136387,131072      203MiB |
|    0   N/A  N/A     68713      G   ...RendererForSitePerProcess       68MiB |
+-----------------------------------------------------------------------------+
bobvanluijt commented 1 year ago

Hi @Matthieu-Tinycoaching – I think this is because the memory on the GPU is full. You can lower the # of models loaded onto the GPU to make this work.

Matthieu-Tinycoaching commented 1 year ago

Hi @bobvanluijt thanks for your feedback!

Even when I reduce the number of models loaded onto the GPU to 2, I still regularly get both of the following errors while importing:

2022-10-17 14:03:59.222 | DEBUG    | __main__:handle_results:169 - fail with status 500: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-10-17 14:03:59.222 | DEBUG    | __main__:handle_results:169 - send POST request: Post "http://loadbalancer:8080/vectors": read tcp 192.168.16.5:39354->192.168.16.4:8080: read: connection reset by peer
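The second message (connection reset by peer) looks like the inference container dropping or restarting mid-request, which docker-compose ps and docker-compose logs on the t2v services should confirm. To tell the two failure modes apart outside of import.py, a small probe loop against the inference endpoint can distinguish HTTP 500 responses (the CUDA error) from dropped connections (a crashing or restarting container). A rough sketch, again assuming the inference port is temporarily published on the host as 8081:

# Hypothetical probe: separates persistent 500s (CUDA error) from
# connection resets (inference container crashing or restarting).
# Assumes the inference service temporarily publishes host port 8081.
import time
import requests

URL = "http://localhost:8081/vectors"

for i in range(20):
    try:
        resp = requests.post(URL, json={"text": "probe"}, timeout=30)
        if resp.status_code == 500:
            print(f"{i:02d}: HTTP 500 -> {resp.text[:80]}")
        else:
            print(f"{i:02d}: OK, vector dim {len(resp.json().get('vector', []))}")
    except requests.exceptions.ConnectionError as exc:
        # Mirrors the "read: connection reset by peer" seen above
        print(f"{i:02d}: connection error -> {exc}")
    time.sleep(1)
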

Lastly, I even removed the nginx load balancer and still regularly ran into these issues with the following docker-compose.yml file:

---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:1.13.2
    ports:
    - 8080:8080
    restart: on-failure:0
    volumes:
      - /media/matthieu/HDD_4T01/weaviate:/var/lib/weaviate
    depends_on:
      - t2v-transformers-01-001
    environment:
      TRANSFORMERS_INFERENCE_API: 'http://t2v-transformers-01-001:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      ENABLE_MODULES: 'text2vec-transformers'
      CLUSTER_HOSTNAME: 'node1'

  t2v-transformers-01-001:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    restart: always
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: '0' # <== set the GPU to use. 0 = 1st GPU, 1 = 2nd, etc
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
...
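
With this single-container setup, an end-to-end check that skips the Wikipedia import entirely can confirm whether vectorization works at all. A minimal sketch using the v3 Weaviate Python client; the class name SmokeTest is just an illustrative throwaway:

# Minimal end-to-end vectorization check against the simplified setup.
# The "SmokeTest" class name is arbitrary and only used for this test.
import weaviate

client = weaviate.Client("http://localhost:8080")

# A tiny class that uses the text2vec-transformers vectorizer
client.schema.create_class({
    "class": "SmokeTest",
    "vectorizer": "text2vec-transformers",
    "properties": [{"name": "content", "dataType": ["text"]}],
})

# Creating a single object forces exactly one call to the inference
# container; if the CUDA error persists, it should surface here too.
uuid = client.data_object.create(
    {"content": "Paris is the capital of France."}, "SmokeTest"
)
print("created object:", uuid)

# Remove the throwaway class afterwards
client.schema.delete_class("SmokeTest")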
bobvanluijt commented 1 year ago

Hmm - interesting. @antas-marcin does this issue ring a bell?