huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour
Apache License 2.0

Nulls instead of vector for Alibaba-NLP/gte-multilingual-base on T4 GPU #439


superchar commented 4 days ago

System Info

Model: Alibaba-NLP/gte-multilingual-base
Image: text-embeddings-inference:turing-1.5
Azure VM: Standard_NC4as_T4_v3
GPU: NVIDIA Tesla T4
AKS version: 1.28.14
OS: Ubuntu 22.04
Command:

command: ["text-embeddings-router"]
args:
  [
    "--model-id", "Alibaba-NLP/gte-multilingual-base",
    "--port", "8080",
    "--max-client-batch-size", "2000",
    "--payload-limit", "200000000",
    "--max-batch-tokens", "260000",
    "--revision", "refs/pr/7",
    "--auto-truncate"
  ]

Reproduction

When executing the following request the first time:

POST /v1/embeddings
{
 "input":  "test",
 "model": "Alibaba-NLP/gte-multilingual-base"
}

The response is the following:

{
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "embedding": [
                -0.055719655,
                0.06356562,
                -0.030253513
                ......................
            ],
            "index": 0
        }
    ],
    "model": "Alibaba-NLP/gte-multilingual-base",
    "usage": {
        "prompt_tokens": 3,
        "total_tokens": 3
    }
}

However, when I repeat the same request a second time, I get:

{
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "embedding": [
                null,
                null,
                null
                ......................            
             ],
            "index": 0
        }
    ],
    "model": "Alibaba-NLP/gte-multilingual-base",
    "usage": {
        "prompt_tokens": 3,
        "total_tokens": 3
    }
}
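To make this easy to check, here is a small client sketch (the helper names are mine; it assumes the router is reachable at localhost:8080, matching the deployment above) that sends the same request and flags null components in the returned embedding:

```python
import json
import urllib.request

# Hypothetical endpoint; assumes the router runs locally on port 8080,
# matching the deployment above.
TEI_URL = "http://localhost:8080/v1/embeddings"

def has_nulls(embedding):
    """True if any component of the returned embedding is null/None."""
    return any(v is None for v in embedding)

def embed(text, url=TEI_URL):
    # Same payload as the reproduction request above.
    payload = json.dumps({
        "input": text,
        "model": "Alibaba-NLP/gte-multilingual-base",
    }).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["data"][0]["embedding"]

# Usage against a live deployment:
#   for attempt in (1, 2):
#       print(attempt, has_nulls(embed("test")))
```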

I tried setting USE_FLASH_ATTENTION=False, but it seems this environment variable is ignored for GTE models. I understand that Turing support is marked as experimental, but is there any way to run this model on a T4, with or without Flash Attention v1?
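For what it's worth, one plausible (unconfirmed) cause of the nulls: under float16, intermediate values can overflow to inf and then propagate to NaN, and NaN is typically serialized as null in JSON responses. A toy stdlib sketch of the overflow mechanism (not TEI's actual code path):

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value

def to_fp16_overflow(x):
    """Toy model of float16 casting: magnitudes beyond FP16_MAX overflow
    to inf. (Real fp16 also loses precision; only overflow matters here.)"""
    if abs(x) > FP16_MAX:
        return math.copysign(math.inf, x)
    return x

big = to_fp16_overflow(1e5)   # overflows to inf under fp16
bad = big - big               # inf - inf is NaN
print(math.isinf(big), math.isnan(bad))  # → True True
```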

Expected behavior

The endpoint should return the embedding vector on every request instead of nulls.

kozistr commented 3 days ago

@superchar hi. I guess you can disable flash attention by setting the dtype to float32 instead of float16 in general. However, AFAIK there's currently only a Flash GTE implementation, which doesn't support CPU or GPUs w/o flash attn. Maybe we could implement one.
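For reference, the router exposes a --dtype flag, so forcing float32 in the deployment above would look roughly like this (untested on this T4 setup):

```yaml
command: ["text-embeddings-router"]
args:
  [
    "--model-id", "Alibaba-NLP/gte-multilingual-base",
    "--port", "8080",
    "--max-client-batch-size", "2000",
    "--payload-limit", "200000000",
    "--max-batch-tokens", "260000",
    "--revision", "refs/pr/7",
    "--dtype", "float32",
    "--auto-truncate"
  ]
```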