jina-ai / clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
https://clip-as-service.jina.ai

Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same #873

Closed TopTea1 closed 1 year ago

TopTea1 commented 1 year ago

Hi, I'm using the Docker image clip-server:master on a CUDA GPU. When I try to execute the basic example:

from clip_client import Client

c = Client('grpc://0.0.0.0:51000')
r = c.encode(['data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7'])

print(r.shape) 

With this config:

jtype: Flow
version: '1'
with:
  port: 51000
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      metas:
        py_modules:
          - clip_server.executors.clip_torch
      with:
        name: ViT-L-14-336::openai

I get this error:

jina.excepts.BadServer: request_id: "d1a940f074264c97b10891b885e4c8a8"
status {
  code: ERROR
  description: "RuntimeError(\'Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should 
be the same\')"
  exception {
    name: "RuntimeError"
    args: "Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same"
    stacks: "Traceback (most recent call last):\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py\", line 222, in
process_data\n    result = await self._request_handler.handle(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/request_handling.py\", line
291, in handle\n    return_data = await self._executor.__acall__(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 354, in 
__acall__\n    return await self.__acall_endpoint__(__default_endpoint__, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 401, in 
__acall_endpoint__\n    return await exec_func(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 366, in 
exec_func\n    return await func(self, tracing_context=tracing_context, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/decorators.py\", line 173, in 
arg_wrapper\n    return await fn(executor_instance, *args, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/executors/clip_torch.py\", line 180, in 
encode\n    self._model.encode_image(**batch_data)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/openclip_model.py\", line 64, in 
encode_image\n    return self._model.encode_image(pixel_values)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/open_clip/model.py\", line 182, in encode_image\n    
features = self.visual(image)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/open_clip/transformer.py\", line 304, in forward\n    
x = self.conv1(x)  # shape = [*, width, grid, grid]\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py\", line 463, in forward\n    
return self._conv_forward(input, self.weight, self.bias)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py\", line 459, in 
_conv_forward\n    return F.conv2d(input, weight, bias, self.stride,\n"
    stacks: "RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be 
the same\n"
    executor: "CLIPEncoder"
  }
}
exec_endpoint: "/encode"
target_executor: ""

Do you have any ideas on how to solve this issue?

Thanks for your help

ZiniuYu commented 1 year ago

Hi @TopTea1, thanks for reporting this!

This is a known issue and we are fixing it.
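
For anyone hitting the same message: the traceback boils down to a dtype mismatch. The CLIP visual tower is loaded in half precision (torch.cuda.HalfTensor) while the preprocessed image batch arrives in full precision (torch.cuda.FloatTensor), and F.conv2d refuses to mix the two. A minimal reproduction of the same mismatch outside clip-server (illustrative only; assumes a CUDA device is available):

import torch
import torch.nn as nn

# Weights in half precision, like the CLIP visual tower on GPU
conv = nn.Conv2d(3, 8, kernel_size=3).cuda().half()

# Input left in full precision triggers the error
x = torch.randn(1, 3, 32, 32, device='cuda')
try:
    conv(x)
except RuntimeError as e:
    print(e)  # Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

# Casting the input to the weight dtype resolves the mismatch
y = conv(x.half())
print(y.dtype)  # torch.float16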

TopTea1 commented 1 year ago

Thanks for your feedback

ZiniuYu commented 1 year ago

@TopTea1 You can use our pre-built Docker image to work around the error like this:

jtype: Flow
with:
  port: 51000
executors:
  - name: clip_t
    uses: jinahub+docker://CLIPTorchEncoder/0.8.1
    uses_with:
      name: ViT-L-14-336::openai
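
For example, saving this config as flow.yml and starting the Flow with the command below should work (assuming clip-server is installed locally and Docker is available, since the executor itself runs in a container):

python -m clip_server flow.yml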

TopTea1 commented 1 year ago

Thanks @ZiniuYu, but when I try version 0.8.1 from Docker Hub I get this error:

jina.excepts.BadServer: request_id: "3732bde0aa46429da0be6f4638c50b08"
status {
  code: ERROR
  description: "RuntimeError(\'CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmStridedBatchedExFix(
handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, 
(void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`\')"
  exception {
    name: "RuntimeError"
    args: "CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmStridedBatchedExFix( handle, opa, opb, 
m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, 
CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`"
    stacks: "Traceback (most recent call last):\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py\", line 219, in
process_data\n    result = await self._data_request_handler.handle(\n"
    stacks: "  File 
\"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/request_handlers/data_request_handler.py\", line 228, 
in handle\n    return_data = await self._executor.__acall__(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 329, in 
__acall__\n    return await self.__acall_endpoint__(__default_endpoint__, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 378, in 
__acall_endpoint__\n    return await exec_func(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 339, in 
exec_func\n    return await func(self, tracing_context=tracing_context, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/decorators.py\", line 153, in 
arg_wrapper\n    return await fn(executor_instance, *args, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/executors/clip_torch.py\", line 140, in 
encode\n    self._model.encode_image(**batch_data)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/openclip_model.py\", line 51, in 
encode_image\n    return self._model.encode_image(pixel_values)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/model.py\", line 591, in 
encode_image\n    return self.visual(image.type(self.dtype))\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/model.py\", line 428, in forward\n  
x = self.transformer(x)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/model.py\", line 353, in forward\n  
x = r(x, attn_mask=attn_mask)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/model.py\", line 322, in forward\n  
x = x + self.ln_attn(self.attention(self.ln_1(x), attn_mask=attn_mask))\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/model.py\", line 317, in attention\n
return self.attn(x, x, x, need_weights=False, attn_mask=attn_mask)[0]\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/activation.py\", line 1167, in 
forward\n    attn_output, attn_output_weights = F.multi_head_attention_forward(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py\", line 5160, in 
multi_head_attention_forward\n    attn_output_weights = torch.bmm(q_scaled, k.transpose(-2, -1))\n"
    stacks: "RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmStridedBatchedExFix( 
handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, 
(void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`\n"
    executor: "CLIPEncoder"
  }
}
exec_endpoint: "/encode"
target_executor: ""

ZiniuYu commented 1 year ago

Hi @TopTea1, can you try again with jinahub+docker://CLIPTorchEncoder/0.8.1-gpu? What's the output of nvidia-smi?
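
In the Flow config that would look roughly like this (a sketch; the gpus field passes the host GPUs through to the containerized executor):

jtype: Flow
with:
  port: 51000
executors:
  - name: clip_t
    uses: jinahub+docker://CLIPTorchEncoder/0.8.1-gpu
    uses_with:
      name: ViT-L-14-336::openai
    gpus: all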

TopTea1 commented 1 year ago

Hi @ZiniuYu, thanks, it's working with this version. I was using the wrong version in my last comment.

ZiniuYu commented 1 year ago

Glad to see it works 🍻 You can also give the main branch another try! The problem you encountered should be fixed now.

TopTea1 commented 1 year ago

I have tested with the new master image, and I got this error:

jina.excepts.BadServer: request_id: "98fc908ccd2944fa8221d5fed3f420f6"
status {
  code: ERROR
  description: "RuntimeError(\'CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmStridedBatchedExFix(
handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, 
(void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`\')"
  exception {
    name: "RuntimeError"
    args: "CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmStridedBatchedExFix( handle, opa, opb, 
m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, 
CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`"
    stacks: "Traceback (most recent call last):\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py\", line 222, in
process_data\n    result = await self._request_handler.handle(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/request_handling.py\", line
291, in handle\n    return_data = await self._executor.__acall__(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 354, in 
__acall__\n    return await self.__acall_endpoint__(__default_endpoint__, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 401, in 
__acall_endpoint__\n    return await exec_func(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 366, in 
exec_func\n    return await func(self, tracing_context=tracing_context, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/decorators.py\", line 173, in 
arg_wrapper\n    return await fn(executor_instance, *args, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/executors/clip_torch.py\", line 194, in 
encode\n    self._model.encode_image(**batch_data)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/openclip_model.py\", line 64, in 
encode_image\n    return self._model.encode_image(pixel_values)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/open_clip/model.py\", line 182, in encode_image\n    
features = self.visual(image)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/clip_server/model/model.py\", line 88, in forward\n   
return super().forward(x)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/open_clip/transformer.py\", line 314, in forward\n    
x = self.transformer(x)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/open_clip/transformer.py\", line 230, in forward\n    
x = r(x, attn_mask=attn_mask)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/open_clip/transformer.py\", line 154, in forward\n    
x = x + self.ls_1(self.attention(self.ln_1(x), attn_mask=attn_mask))\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/open_clip/transformer.py\", line 151, in attention\n  
return self.attn(x, x, x, need_weights=False, attn_mask=attn_mask)[0]\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\", line 1190, in 
_call_impl\n    return forward_call(*input, **kwargs)\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/modules/activation.py\", line 1167, in 
forward\n    attn_output, attn_output_weights = F.multi_head_attention_forward(\n"
    stacks: "  File \"/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py\", line 5160, in 
multi_head_attention_forward\n    attn_output_weights = torch.bmm(q_scaled, k.transpose(-2, -1))\n"
    stacks: "RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmStridedBatchedExFix( 
handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, 
(void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`\n"
    executor: "CLIPEncoder"
  }
}
exec_endpoint: "/encode"
target_executor: ""

Here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   35C    P0    37W / 250W |   2125MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   30C    P0    26W / 250W |      4MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   52C    P0    65W / 250W |   3515MiB / 16384MiB |     65%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    683662      C   python                           2121MiB |
+-----------------------------------------------------------------------------+

TopTea1 commented 1 year ago

To be more precise, I tested this Docker image (https://hub.docker.com/r/jinaai/clip-server) in both comments. In the first try (https://github.com/jina-ai/clip-as-service/issues/873#issuecomment-1341230474) I used the image with the tag 0.8.1, and in the second comment (https://github.com/jina-ai/clip-as-service/issues/873#issuecomment-1342808181) I used the tag master. The config and the command I used to start the container are:

cat clip_config.yml | CUDA_VISIBLE_DEVICES=1 docker run -i   -p 51000:51000 -v $HOME/jina/.cache:/home/cas/.cache --gpus all jinaai/clip-server:master -i

And in clip_config.yml:

jtype: Flow
version: '1'
with:
  port: 51000
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      with:
        name: ViT-L-14-336::openai
      metas:
        py_modules:
          - clip_server.executors.clip_torch
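
One note on the command above: CUDA_VISIBLE_DEVICES=1 is set in the host shell, so it does not reach the process inside the container, and with --gpus all the container still sees every GPU. A variant that actually pins the container to GPU 1 (a sketch, not a confirmed fix for the cuBLAS error) would be:

cat clip_config.yml | docker run -i -p 51000:51000 -v $HOME/jina/.cache:/home/cas/.cache --gpus '"device=1"' jinaai/clip-server:master -i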