jina-ai / clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
https://clip-as-service.jina.ai

Converting the model to onnx introduces errors #908

Open geekchen007 opened 1 year ago

geekchen007 commented 1 year ago

We found a discrepancy between the outputs of the ONNX model and the PyTorch (.pth) model: the cosine distance between their embeddings is approximately 5%.

The model was created with open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_e16', cache_dir=cache_dir), following https://github.com/LAION-AI/CLIP_benchmark/blob/main/clip_benchmark/models/open_clip.py

https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32-laion2b_e16/visual.onnx
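For context, below is a minimal sketch (not the reporter's actual script) of how such a comparison can be set up: encode one image with the PyTorch open_clip model and with the exported visual.onnx, then compute the cosine similarity. The file names 'example.jpg' and 'visual.onnx' are placeholders, and the ONNX input name is read from the session rather than assumed.

```python
# Sketch: compare PyTorch vs ONNX visual encoders on a single image.
import numpy as np
import torch
import open_clip
import onnxruntime as ort
from PIL import Image

# PyTorch side (fp32 on CPU)
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion2b_e16')
model.eval()

image = preprocess(Image.open('example.jpg')).unsqueeze(0)  # (1, 3, 224, 224)
with torch.no_grad():
    pt_emb = model.encode_image(image).numpy()[0]

# ONNX side; if the export is fp16, the input may need .astype(np.float16)
sess = ort.InferenceSession('visual.onnx', providers=['CPUExecutionProvider'])
input_name = sess.get_inputs()[0].name
onnx_emb = sess.run(None, {input_name: image.numpy()})[0][0]

# Cosine similarity / distance between the two embeddings
cos = np.dot(pt_emb, onnx_emb) / (np.linalg.norm(pt_emb) * np.linalg.norm(onnx_emb))
print(f'cosine similarity: {cos:.4f}  (distance: {1 - cos:.4f})')
```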

ZiniuYu commented 1 year ago

Hi @geekchen007,

Thank you for bringing this issue to our attention. To better understand the problem, could you let us know what device you are using to run the model? Our PyTorch model uses mixed precision, i.e., both float32 and float16. However, we only use float16 on GPU, which may result in some precision loss.

Looking forward to hearing back from you.
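To illustrate the point about mixed precision, here is a minimal sketch (assuming a CUDA device and a placeholder image 'example.jpg') showing that casting the same PyTorch model to fp16 already shifts the embedding slightly, even before any ONNX export is involved.

```python
# Sketch: measure how much fp16 alone changes the embedding on GPU.
import numpy as np
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion2b_e16')
model = model.eval().cuda()

image = preprocess(Image.open('example.jpg')).unsqueeze(0).cuda()

with torch.no_grad():
    # Full-precision reference, then the same model cast to half precision
    emb_fp32 = model.encode_image(image).float().cpu().numpy()[0]
    emb_fp16 = model.half().encode_image(image.half()).float().cpu().numpy()[0]

cos = np.dot(emb_fp32, emb_fp16) / (np.linalg.norm(emb_fp32) * np.linalg.norm(emb_fp16))
print(f'fp32 vs fp16 cosine similarity: {cos:.4f}')
```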

geekchen007 commented 1 year ago

torch==1.13.1, onnx==1.13.1, onnxruntime==1.11.1

NVIDIA-SMI 440.64.00, Driver Version 440.64.00, CUDA Version 10.2, GPU 0: Tesla V100-SXM2

ZiniuYu commented 1 year ago

Sorry, my previous response was not clear. What I meant is that the ONNX model uses fp16 on GPU (where you may observe the difference) and fp32 on CPU.

Are you running in CPU mode or GPU mode? Could you please share the scripts/data with which you found the discrepancy?
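As a quick way to answer the CPU-vs-GPU question, a sketch like the following (with 'visual.onnx' as a placeholder path) reports which ONNX Runtime execution providers are available and in use, and the input dtype of the exported graph, which indicates whether it is the fp16 (GPU) or fp32 (CPU) variant.

```python
# Sketch: check which mode onnxruntime is actually running in.
import onnxruntime as ort

print('available providers:', ort.get_available_providers())

sess = ort.InferenceSession(
    'visual.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print('providers in use:', sess.get_providers())

# 'tensor(float16)' vs 'tensor(float)' reveals the export precision
print('input dtype:', sess.get_inputs()[0].type)
```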