Unable to deploy TinyLlama in Amazon SageMaker using Optimum Neuron 0.0.20 w/ Neuronx 2.*

huggingface / optimum-neuron


Unable to deploy TinyLlama in Amazon SageMaker using Optimum Neuron 0.0.20 w/ Neuronx 2.* #519

Closed: ari-vedant-jain closed this issue 2 weeks ago.

ari-vedant-jain commented 6 months ago

System Info

optimum-neuron 0.0.20
neuronx-cc 2.*
python 3.10

Who can help?

No response

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

See the attached notebook: Inference-TinyLlama-1.1B.ipynb.txt

Expected behavior

Running the following cell fails during deployment (error log attached: log-events-viewer-result (2).csv):

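```python
from sagemaker.model import Model

# image_uri, code_artifact, role, sess, instance_type and endpoint_name are defined in earlier notebook cells
model = Model(image_uri=image_uri, model_data=code_artifact, role=role, sagemaker_session=sess)
model._is_compiled_model = True  # mark the model artifacts as pre-compiled
model.deploy(initial_instance_count=1, instance_type=instance_type, container_startup_health_check_timeout=500, volume_size=200, endpoint_name=endpoint_name)
```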

dacorvo commented 6 months ago

When sampling, you need to specify at least a top_k or a top_p.

If the error happens before that point, try increasing the deployment timeout (container_startup_health_check_timeout) and volume_size, although I think TinyLlama should fit.
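A minimal sketch of the sampling advice, assuming the endpoint runs a Hugging Face TGI-style text-generation container that accepts an `{"inputs": ..., "parameters": {...}}` JSON body (the issue does not show the actual request format). The helper `build_sampling_payload` is hypothetical: it refuses to enable `do_sample` unless `top_k` or `top_p` is set. The prompt and the values `top_k=50`, `top_p=0.9` are illustrative only.

```python
import json
from typing import Any, Dict, Optional


def build_sampling_payload(
    prompt: str,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    max_new_tokens: int = 128,
) -> Dict[str, Any]:
    """Build a TGI-style request body that enables sampling.

    Hypothetical helper (assumed payload schema): raises unless at least
    one of top_k / top_p is provided, as the maintainer recommends.
    """
    if top_k is None and top_p is None:
        raise ValueError("sampling requires at least one of top_k or top_p")
    params: Dict[str, Any] = {"do_sample": True, "max_new_tokens": max_new_tokens}
    if top_k is not None:
        params["top_k"] = top_k
    if top_p is not None:
        params["top_p"] = top_p
    return {"inputs": prompt, "parameters": params}


# Illustrative values only; tune them for your use case.
payload = build_sampling_payload("What is AWS Inferentia?", top_k=50, top_p=0.9)
print(json.dumps(payload))
```

The resulting dict could then be sent with the SageMaker Python SDK, e.g. `Predictor(endpoint_name, sagemaker_session=sess, serializer=JSONSerializer(), deserializer=JSONDeserializer()).predict(payload)`, using `Predictor` from `sagemaker.predictor` and the serializers from `sagemaker.serializers` / `sagemaker.deserializers`.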

dacorvo commented 6 months ago

@ari-vedant-jain did you try my suggestion?