lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Merged Model from Huggingface runs fine with fastchat CLI but not when using service worker #3315

Open heli-sdsu opened 4 months ago

heli-sdsu commented 4 months ago

I am running FastChat on Kubernetes, with one worker for the controller, one for the FastChat API server, and a (GPU) worker for each model. I pulled this model from Hugging Face (downloaded using huggingface-cli): https://huggingface.co/Rmote6603/MedPrescription-FineTuning. When I run the FastChat CLI and type in my prompt, it works perfectly fine, as expected:

```
python3.9 -m fastchat.serve.cli --model-path MedPrescription-FineTuning
```

[screenshot: CLI session answering the prompt as expected]

However, when I serve the same model through fastchat.serve.model_worker, the chat completions API does not work at all and returns an error, even though the v1/models API works, as shown in the screenshot below:

```
python3.9 -m fastchat.serve.model_worker --model-path MedPrescription-FineTuning --worker-address http://localhost:21002 --port 21002
```

[screenshot: v1/models lists MedPrescription-FineTuning correctly]
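For context, the rest of the stack is launched roughly like this (the ports are my setup; only the worker command above differs per model):

```
# Controller (default port 21001)
python3.9 -m fastchat.serve.controller

# Model worker (the command above), which registers itself with the controller
python3.9 -m fastchat.serve.model_worker --model-path MedPrescription-FineTuning \
    --worker-address http://localhost:21002 --port 21002

# OpenAI-compatible API server on port 8000
python3.9 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```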

When I run this POST request, it first times out:

```
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API-TOKEN" \
  -d '{"model": "MedPrescription-FineTuning", "messages": [{"role": "user", "content": "Hello! What is your name?"}]}'
```

[screenshot: the curl request hangs and then times out]
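If I am reading fastchat/constants.py correctly, the time-out threshold comes from the FASTCHAT_WORKER_API_TIMEOUT environment variable (default 100 seconds), so it can be raised while debugging:

```
# Assumption: set this for the openai_api_server process; 300s is an arbitrary choice.
export FASTCHAT_WORKER_API_TIMEOUT=300
python3.9 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```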

Then it subsequently gives me a network error:

{"object":"error","message":"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\n\n(probability tensor contains eitherinf,nanor element < 0)","code":50001}

I was wondering if anyone else has run into this issue before. Does it have anything to do with Hugging Face, the model weights, or some limitation in FastChat? I am only having issues with this merged Mistral model.
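One thing I still want to rule out is dtype: the worker exposes a --dtype flag (assuming my FastChat version includes it in the model args), so forcing bfloat16 or float32 would tell me whether fp16 overflow in the merged weights is the culprit:

```
python3.9 -m fastchat.serve.model_worker --model-path MedPrescription-FineTuning \
    --worker-address http://localhost:21002 --port 21002 --dtype bfloat16
```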

heli-sdsu commented 3 months ago

Update: when I host the model with the web UI, this is what I get. I suppose the gateway time-out response is due to the model not knowing when to stop generating.

[screenshot: web UI output]
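If the merged model really did lose its stop token, the OpenAI-compatible endpoint accepts stop and max_tokens fields, which should at least bound the generation; the "</s>" string is a guess based on Mistral's usual EOS token:

```
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API-TOKEN" \
  -d '{
        "model": "MedPrescription-FineTuning",
        "messages": [{"role": "user", "content": "Hello! What is your name?"}],
        "max_tokens": 256,
        "stop": ["</s>"]
      }'
```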