jinaai/jina-embeddings-v2-base-* not working

TimPietrusky commented 1 month ago

When using the worker with the image runpod/worker-infinity-embedding:stable-cuda12.1.0, with this env var MODEL_NAMES: jinaai/jina-embeddings-v2-base-de, we see this error:

The transformation of the model "JinaBertModel" to BetterTransformer failed

According to https://github.com/michaelfeil/infinity/issues/115#issuecomment-1967237474 we should be able to solve this by setting these env variables:

INFINITY_DISABLE_OPTIMUM: TRUE
INFINITY_DISABLE_COMPILE: TRUE

But this is not working, we still see an error:

2024-08-05T13:35:35.781469998Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Started.", "level": "INFO"}
2024-08-05T13:35:36.794152731Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Failed to return job results. | 400, message='Bad Request', url=URL('https://api.runpod.ai/v2/4nv0a16wv8ef1p/job-done/1az42tjeq0sk40/6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1?gpu=NVIDIA+RTX+A4500&isStream=false')", "level": "ERROR"}
2024-08-05T13:35:36.794207183Z {"requestId": "6a91cfa6-db93-4c04-83c2-3ae5e3a8a52f-e1", "message": "Finished.", "level": "INFO"}

Request

{
  "input": {
    "model": "jina-embeddings-v2-base-de",
    "input": "Hello World"
  }
}

Output

{
  "delayTime": 1125,
  "executionTime": 1049,
  "id": "8a3de0e1-6b43-41e4-a4af-5ad1473463a1-e1",
  "status": "COMPLETED"
}

So it looks like everything is completed, but there is no expected output (the embeddings).
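A small client-side check can make this failure mode explicit. This is only a sketch: `has_output` is a hypothetical helper, and it assumes that a successful job would carry its result under an `output` key, which is exactly the field missing from the response above.

```python
import json


def has_output(response: dict) -> bool:
    """Return True only if the job completed AND carries a non-empty result."""
    if response.get("status") != "COMPLETED":
        return False
    # Assumption: a successful job stores its result under "output".
    return bool(response.get("output"))


# The response from this issue: COMPLETED, but without any result.
raw = """
{
  "delayTime": 1125,
  "executionTime": 1049,
  "id": "8a3de0e1-6b43-41e4-a4af-5ad1473463a1-e1",
  "status": "COMPLETED"
}
"""
response = json.loads(raw)
print(has_output(response))
```

Checking for the result itself, rather than relying on `status` alone, avoids treating an empty COMPLETED response as a success.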

OpenAI-compatible API

The behavior is the same when using the OpenAI-compatible API: it doesn't work and just returns the same output as above.

TimPietrusky commented 1 month ago

@michaelfeil do you maybe have another idea on how to get this sorted?

michaelfeil commented 1 month ago

@TimPietrusky The output you posted is not really descriptive of the problem that occurs.

The environment variables that are currently usable are not up to date. Here are all the functions that generate env variables in infinity. They only generate the defaults, however, so I think there is nothing to do here.

https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/env.py

Runpod-infinity currently uses: infinity-emb[all,onnxruntime-gpu]==0.0.35 (see: https://github.com/runpod-workers/worker-infinity-embedding/blob/main/builder/requirements.txt). Does your model run with this version?

TimPietrusky commented 1 month ago

@michaelfeil thanks for your quick response.

Sorry for the unhelpful error messages; this is what we get in our UI. Hopefully I can get access to more in-depth logging or find someone who can actually help out here.

infinity-emb[all,onnxruntime-gpu]==0.0.35

Which version do you recommend? Should we try 0.0.53?

michaelfeil commented 1 month ago

Yeah, maybe 0.0.53 fixes this? Can you try running infinity with both 0.0.53 and 0.0.35 and see if it works? Then try to run the RunPod image locally. If both work, check the UI for additional messages, not the other way around. :)
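For the local test, a request against a locally running infinity server could be built roughly like this. It is a sketch under assumptions: the base URL `http://localhost:7997` and the `/embeddings` path are guesses that may differ between versions and configurations, `build_embedding_request` is a hypothetical helper, and the body simply follows the OpenAI embeddings format (`model` plus `input`).

```python
import json
import urllib.request


def build_embedding_request(
    base_url: str, model: str, texts: list[str]
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: a local infinity server is reachable at this address.
req = build_embedding_request(
    "http://localhost:7997",
    "jinaai/jina-embeddings-v2-base-de",
    ["Hello World"],
)
print(req.get_method(), req.full_url)

# To actually send it once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Sending the same request against each version makes it easier to tell whether the problem sits in infinity itself or in the worker around it.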

TimPietrusky commented 1 month ago

@michaelfeil awesome, thank you! Will do.

TimPietrusky commented 1 month ago

@pandyamarut please let us know when you have had time to update this 🙏 Then I can do the testing.

TimPietrusky commented 1 month ago

Thanks to @pandyamarut we have updated the version in the worker runpod/worker-infinity-text-embedding:0.0.1-cuda12.1.0, but it still does not produce the desired outcome when using the same request / env vars as before:

2024-08-15T10:07:37.654523432Z INFO     2024-08-15 10:07:37,650 datasets INFO: PyTorch version     config.py:59
2024-08-15T10:07:37.654545312Z          2.5.0.dev20240618+cu121 available.                                     
2024-08-15T10:07:38.125792094Z INFO     2024-08-15 10:07:38,124 infinity_emb INFO:           select_model.py:57
2024-08-15T10:07:38.125805034Z          model=`jinaai/jina-embeddings-v2-base-de` selected,                    
2024-08-15T10:07:38.125806844Z          using engine=`torch` and device=`None`                                 
2024-08-15T10:07:38.384517276Z INFO     2024-08-15 10:07:38,381                      SentenceTransformer.py:189
2024-08-15T10:07:38.384544466Z          sentence_transformers.SentenceTransformer                              
2024-08-15T10:07:38.384548407Z          INFO: Use pytorch device_name: cuda                                    
2024-08-15T10:07:38.387090089Z INFO     2024-08-15 10:07:38,384                      SentenceTransformer.py:197
2024-08-15T10:07:38.387112810Z          sentence_transformers.SentenceTransformer                              
2024-08-15T10:07:38.387116550Z          INFO: Load pretrained SentenceTransformer:                             
2024-08-15T10:07:38.387119610Z          jinaai/jina-embeddings-v2-base-de                                      
2024-08-15T10:07:44.040260125Z WARNING  2024-08-15 10:07:44,038 infinity_emb WARNING:        acceleration.py:35
2024-08-15T10:07:44.040299926Z          DEPRECATED `INFINITY_DISABLE_OPTIMUM` - setting                        
2024-08-15T10:07:44.040303436Z          optimizations via                                                      
2024-08-15T10:07:44.040305626Z          BetterTransformer,INFINITY_DISABLE_OPTIMUM is no                       
2024-08-15T10:07:44.040307746Z          longer supported, please use the CLI / ENV for that.                   
2024-08-15T10:07:44.041460234Z INFO     2024-08-15 10:07:44,040 infinity_emb INFO:   sentence_transformer.py:81
2024-08-15T10:07:44.041485354Z          Switching to half() precision (cuda: fp16).                            
2024-08-15T10:07:44.059055655Z --- Starting Serverless Worker |  Version 1.7.0 ---
2024-08-15T10:07:45.357533272Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:45.360670254Z INFO     2024-08-15 10:07:45,358 infinity_emb INFO:         batch_handler.py:321
2024-08-15T10:07:45.360695094Z          creating batching engine                                               
2024-08-15T10:07:45.363612029Z INFO     2024-08-15 10:07:45,360 infinity_emb INFO: ready   batch_handler.py:384
2024-08-15T10:07:45.363629230Z          to batch requests.                                                     
2024-08-15T10:07:46.110630606Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/wsdp02o8uipf7h/job-done/4vmqpadg317i92/sync-9ba78228-2771-4781-a1b8-16aac1718974-e1?gpu=NVIDIA+RTX+4000+Ada+Generation&isStream=false'", "level": "ERROR"}
2024-08-15T10:07:46.110678748Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Finished.", "level": "INFO"}
2024-08-15T10:07:46.634466067Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:46.735420699Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/wsdp02o8uipf7h/job-done/4vmqpadg317i92/79345661-90dd-42ef-a3bc-5a2b82f70d66-e1?gpu=NVIDIA+RTX+4000+Ada+Generation&isStream=false'", "level": "ERROR"}
2024-08-15T10:07:46.735438659Z {"requestId": "79345661-90dd-42ef-a3bc-5a2b82f70d66-e1", "message": "Finished.", "level": "INFO"}

I guess this also doesn't help with finding anything. What could be the next steps here to debug what is going on @pandyamarut? Or maybe you have another idea @michaelfeil?
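Since the log mixes plain-text infinity_emb lines with JSON entries from the serverless worker, one way to narrow things down is to pull out only the ERROR entries. The following is a minimal sketch, assuming each line starts with a timestamp followed by a space; `extract_errors` is a hypothetical helper, and the sample lines are abridged from the log above (the URL part of the error message is cut).

```python
import json


def extract_errors(log_text: str) -> list[dict]:
    """Collect JSON log entries with level ERROR from mixed log output."""
    errors = []
    for line in log_text.splitlines():
        # Drop the leading timestamp; the payload follows the first space.
        _, _, rest = line.partition(" ")
        rest = rest.strip()
        if not rest.startswith("{"):
            continue  # plain-text log line, not a JSON worker entry
        try:
            entry = json.loads(rest)
        except json.JSONDecodeError:
            continue  # looks like JSON but is not parseable; skip it
        if entry.get("level") == "ERROR":
            errors.append(entry)
    return errors


# Abridged lines from the log above.
sample = """\
2024-08-15T10:07:45.357533272Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Started.", "level": "INFO"}
2024-08-15T10:07:45.363629230Z          to batch requests.
2024-08-15T10:07:46.110630606Z {"requestId": "sync-9ba78228-2771-4781-a1b8-16aac1718974-e1", "message": "Failed to return job results. | 400, message='Bad Request'", "level": "ERROR"}
"""

for entry in extract_errors(sample):
    print(entry["requestId"], entry["message"])
```

In this log, the only errors are the 400 responses when returning the job results, which suggests looking at what the handler tries to send back rather than at the model loading.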

TimPietrusky commented 1 month ago

Talked with @pandyamarut: We will try to debug what is going on here!
