Open TimPietrusky opened 1 month ago
@michaelfeil do you maybe have another idea on how to get this sorted?
@TimPietrusky The output you posted is not really descriptive of the problem that occurs.
The environment variables that are currently usable are not up to date. Here are all the functions that generate env variables in infinity. They only generate the defaults, though, so I think there is nothing to do here.
https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/env.py
Runpod-infinity currently uses:
infinity-emb[all,onnxruntime-gpu]==0.0.35
(See: https://github.com/runpod-workers/worker-infinity-embedding/blob/main/builder/requirements.txt)
Does your model run with this version?
@michaelfeil thanks for your quick response.
Sorry for the unusable error messages, this is what we get in our UI. Hopefully I can get access to more in-depth logging or find someone who can actually help out here.
infinity-emb[all,onnxruntime-gpu]==0.0.35
Which version do you recommend? Should we try 0.0.53?
Yeah, maybe 0.0.53 fixes this? Can you try running infinity with that version and with 0.0.35 and see if it works? Then try running the image from runpod locally. If both work, check the UI for additional messages, not the other way around. :)
@michaelfeil awesome, thank you! Will do.
@pandyamarut please let us know when you have had time to update this 🙏 Then I can do the testing.
Thanks to @pandyamarut we have updated the version in the worker runpod/worker-infinity-text-embedding:0.0.1-cuda12.1.0, but it still does not produce the desired outcome when using the same request / env vars as before:
2024-08-15T10:07:37.654523432Z INFO 2024-08-15 10:07:37,650 datasets INFO: PyTorch version config.py:59
2024-08-15T10:07:37.654545312Z 2.5.0.dev20240618+cu121 available.
2024-08-15T10:07:38.125792094Z INFO 2024-08-15 10:07:38,124 infinity_emb INFO: select_model.py:57
2024-08-15T10:07:38.125805034Z model=`jinaai/jina-embeddings-v2-base-de` selected,
2024-08-15T10:07:38.125806844Z using engine=`torch` and device=`None`
2024-08-15T10:07:38.384517276Z INFO 2024-08-15 10:07:38,381 SentenceTransformer.py:189
2024-08-15T10:07:38.384544466Z sentence_transformers.SentenceTransformer
2024-08-15T10:07:38.384548407Z INFO: Use pytorch device_name: cuda
2024-08-15T10:07:38.387090089Z INFO 2024-08-15 10:07:38,384 SentenceTransformer.py:197
2024-08-15T10:07:38.387112810Z sentence_transformers.SentenceTransformer
2024-08-15T10:07:38.387116550Z INFO: Load pretrained SentenceTransformer:
2024-08-15T10:07:38.387119610Z jinaai/jina-embeddings-v2-base-de
2024-08-15T10:07:44.040260125Z WARNING 2024-08-15 10:07:44,038 infinity_emb WARNING: acceleration.py:35
2024-08-15T10:07:44.040299926Z DEPRECATED `INFINITY_DISABLE_OPTIMUM` - setting
2024-08-15T10:07:44.040303436Z optimizations via
2024-08-15T10:07:44.040305626Z BetterTransformer,INFINITY_DISABLE_OPTIMUM is no
2024-08-15T10:07:44.040307746Z longer supported, please use the CLI / ENV for that.
2024-08-15T10:07:44.041460234Z INFO 2024-08-15 10:07:44,040 infinity_emb INFO: sentence_transformer.py:81
2024-08-15T10:07:44.041485354Z Switching to half() precision (cuda: fp16).
2024-08-15T10:07:44.059055655Z --- Starting Serverless Worker | Version 1.7.0 ---
2024-08-15T10:07:45.357533272Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:45.360670254Z INFO 2024-08-15 10:07:45,358 infinity_emb INFO: batch_handler.py:321
2024-08-15T10:07:45.360695094Z creating batching engine
2024-08-15T10:07:45.363612029Z INFO 2024-08-15 10:07:45,360 infinity_emb INFO: ready batch_handler.py:384
2024-08-15T10:07:45.363629230Z to batch requests.
2024-08-15T10:07:46.110630606Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/wsdp02o8uipf7h/job-done/4vmqpadg317i92/sync-9ba78228-2771-4781-a1b8-16aac1718974-e1?gpu=NVIDIA+RTX+4000+Ada+Generation&isStream=false'", "level": "ERROR"}
2024-08-15T10:07:46.110678748Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Finished.", "level": "INFO"}
2024-08-15T10:07:46.634466067Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:46.735420699Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/wsdp02o8uipf7h/job-done/4vmqpadg317i92/79345661-90dd-42ef-a3bc-5a2b82f70d66-e1?gpu=NVIDIA+RTX+4000+Ada+Generation&isStream=false'", "level": "ERROR"}
2024-08-15T10:07:46.735438659Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Finished.", "level": "INFO"}
I guess this also doesn't help with finding anything. What could be the next steps here to debug what is going on @pandyamarut? Or maybe you have another idea @michaelfeil?
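One way to narrow this down is to pull only the failing entries out of the worker log. The following is a minimal sketch, assuming the log keeps the shape shown above (a timestamp, a space, then either plain text or a JSON object with `requestId`, `message`, and `level`). The helper name `failed_requests` and the shortened sample lines are made up for illustration.

```python
import json


def failed_requests(log_text):
    """Return (timestamp, requestId, message) for every ERROR-level JSON log line."""
    failures = []
    for line in log_text.splitlines():
        # Each line is "<timestamp> <payload>"; only JSON payloads carry a level.
        timestamp, _, payload = line.strip().partition(" ")
        if not payload.startswith("{"):
            continue
        try:
            entry = json.loads(payload)
        except json.JSONDecodeError:
            continue  # wrapped or truncated line, skip it
        if entry.get("level") == "ERROR":
            failures.append((timestamp, entry.get("requestId"), entry.get("message")))
    return failures


# Shortened sample in the same shape as the log above (the URL part is omitted).
sample = """
2024-08-15T10:07:45.357533272Z {"requestId": "sync-9ba78228-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:45.360695094Z creating batching engine
2024-08-15T10:07:46.110630606Z {"requestId": "sync-9ba78228-e1", "message": "Failed to return job results. | 400, message='Bad Request'", "level": "ERROR"}
2024-08-15T10:07:46.110678748Z {"requestId": "sync-9ba78228-e1", "message": "Finished.", "level": "INFO"}
"""

for timestamp, request_id, message in failed_requests(sample):
    print(timestamp, request_id, message)
```

In the log above, both failures are reported while the worker posts the result back to the job-done URL, after the batching engine is ready, so the step that returns the result is probably the place to look first.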
Talked with @pandyamarut: We will try to debug what is going on here!
When using the worker with the image runpod/worker-infinity-embedding:stable-cuda12.1.0, with this env var MODEL_NAMES: jinaai/jina-embeddings-v2-base-de, we see this error:
According to https://github.com/michaelfeil/infinity/issues/115#issuecomment-1967237474 we should be able to solve this by setting these env variables:
INFINITY_DISABLE_OPTIMUM: TRUE
INFINITY_DISABLE_COMPILE: TRUE
But this is not working, we still see an error:
Request
Output
So it looks like everything is completed, but there is no expected output (the embeddings).
OpenAI-compatible API
The behavior is the same when using the OpenAI-compatible API: It doesn't work, just provides the same output as above.
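For anyone trying to reproduce this, here is a minimal sketch of how a synchronous request to the endpoint could be built. It is not the request from this report. The `/runsync` route and the `{"input": {"model": ..., "input": ...}}` payload shape are assumptions about the worker, and `build_runsync_request`, the endpoint ID, and the API key are placeholders.

```python
import json
import urllib.request


def build_runsync_request(endpoint_id, api_key, model, text):
    """Build (but do not send) a synchronous embedding request."""
    # Assumed payload shape: the worker reads the model name and text from "input".
    body = {"input": {"model": model, "input": text}}
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_runsync_request(
    endpoint_id="YOUR_ENDPOINT_ID",  # placeholder
    api_key="YOUR_API_KEY",  # placeholder
    model="jinaai/jina-embeddings-v2-base-de",
    text="Hallo Welt",
)
print(request.full_url)
print(request.data.decode("utf-8"))

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Sending the same payload to a locally running copy of the image would help separate a problem in infinity itself from a problem in how the worker returns the result.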