Open pinak-p opened 1 month ago
@pinak-p I can reproduce your issue, both on SageMaker and locally with a 0.0.24 image.
I verified that deploying the model with neuronx-tgi 0.0.23 leads to meaningful results, so this seems to affect only version 0.0.24. I also verified that I had no issue:
@pinak-p this is not only a TGI issue: I also get gibberish with optimum-neuron itself, which makes me think that this is actually the same issue as the one you reported in transformers-neuronx: https://github.com/aws-neuron/transformers-neuronx/issues/94.
Can you verify that the issue also happens with a vanilla transformers-neuronx model using continuous batching?
@pinak-p could you check with version 0.0.25?
What's the URL for 0.0.25? I don't see it here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md, and the SageMaker SDK doesn't have this version either.
@pinak-p it is still being deployed, but you can use the neuronx-tgi docker image on an EC2 instance: https://github.com/huggingface/optimum-neuron/pkgs/container/neuronx-tgi. Alternatively, you can use optimum-neuron directly and create a pipeline (see the documentation).
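For anyone trying the docker image route, here is a minimal sketch of how the running container could be queried to compare outputs between image versions. It assumes the container is already started and exposes the standard TGI `/generate` endpoint at `http://localhost:8080` (the host and port are assumptions that depend on how the container was launched), and the helper names `build_generate_request` and `generate` are hypothetical, not part of any library.

```python
import json
import urllib.request


def build_generate_request(prompt: str, max_new_tokens: int = 64) -> dict:
    """Build the JSON body for a TGI /generate call (hypothetical helper)."""
    return {
        "inputs": prompt,
        # Greedy decoding, so outputs are comparable across image versions.
        "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False},
    }


def generate(base_url: str, prompt: str, max_new_tokens: int = 64,
             timeout: float = 60.0) -> str:
    """POST the prompt to a running TGI server and return the generated text."""
    body = json.dumps(build_generate_request(prompt, max_new_tokens)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]


if __name__ == "__main__":
    # Assumed address: adjust to the port mapping used when starting the container.
    print(generate("http://localhost:8080", "What is Deep Learning?"))
```

Running the same prompt against a 0.0.23 container and a newer one should make it easy to see whether the gibberish is tied to the image version.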
System Info
Who can help?
@dacorvo
Information
Tasks
Reproduction (minimal, reproducible, runnable)
I'm using the below configuration to deploy the model on SageMaker.
Text Generation:
Output:
Expected behavior
The expectation is to get coherent text that makes sense, not gibberish.