Open mariokostelac opened 1 month ago
I can confirm that meta-llama/Meta-Llama-3-70B-Instruct fails the same way.
This issue is fixed with version 0.0.22.
@dacorvo trying it out with 0.0.22.
The corresponding pull request: #580. The sagemaker python package might not have been updated yet to support 0.0.22 (it was due later today).
Update: It is actually available (great !). FYI the image_uri should be something like: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.22-neuronx-py310-ubuntu22.04
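The image_uri above follows the usual AWS Deep Learning Containers naming pattern. A minimal sketch of how that URI breaks down into its components (the account ID and repository name are taken from the URI quoted above; treating other regions or version combinations as valid is an assumption, since not every combination is published):

```python
def tgi_neuronx_image_uri(
    region: str = "us-east-1",
    pytorch_version: str = "2.1.2",
    optimum_version: str = "0.0.22",
    py_version: str = "py310",
    ubuntu_version: str = "ubuntu22.04",
) -> str:
    """Assemble a Hugging Face TGI Neuron container URI from its parts.

    Sketch only: components other than the defaults are not guaranteed
    to correspond to a published image.
    """
    account = "763104351884"  # AWS Deep Learning Containers account (from the URI above)
    repo = "huggingface-pytorch-tgi-inference"
    tag = f"{pytorch_version}-optimum{optimum_version}-neuronx-{py_version}-{ubuntu_version}"
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"
```

With the defaults, this reproduces the URI quoted in this thread; pass it as `image_uri` when constructing the SageMaker model instead of relying on the SDK's version lookup.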
Yes, I figured it's available, but it's still creating the endpoint.
Thanks a lot @dacorvo, I can confirm that it worked for me just by changing the version to 0.0.22 in the snippet above. Do you know who'd be responsible for fixing that in the HF UI?
@mariokostelac thank you for the feedback. I'll take care of it. We were actually waiting for the sagemaker
update, and I had not realized it was ready.
The update was done this morning, but it has not been refreshed yet. It should be fixed soon.
Thanks a lot for the quick support on this issue. I'm now running the original model (the nvidia one) to verify that it works there too. Given that the tokenizer configs are the same, I'd be very surprised if it didn't.
Feel free to report any issues you get: feedback on such new features/models is very valuable.
System Info
I saw many warnings like:
It failed to start with the following error:
Is there some prep needed to be done to run the model on inferentia with this library?
Who can help?
@JingyaHuang @daco
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction (minimal, reproducible, runnable)
Available above.
Expected behavior
Endpoint should start.