huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

SPECULATE option error #722

Open SteliosGian opened 1 day ago

SteliosGian commented 1 day ago

System Info

I'm running Neuron TGI on inf2 in SageMaker with optimum-neuron=0.0.25.

I'm using the SPECULATE=2 option but I get the following message in the logs:

Error: No such option: --speculate

Here's my SageMaker model environment:

{
    "SM_MODEL_DIR" = "/opt/ml/model"
    "HF_MODEL_ID" = "/opt/ml/model"
    "HF_NUM_CORES" = "24"
    "HF_BATCH_SIZE" = "4"
    "HF_SEQUENCE_LENGTH" = "3072"
    "HF_AUTO_CAST_TYPE" = "bf16"
    "MAX_BATCH_SIZE" = "4"
    "MAX_INPUT_TOKENS" = "2000"
    "MAX_TOTAL_TOKENS" = "3072"
    "MESSAGES_API_ENABLED" = "false"
    "MAX_BATCH_PREFILL_TOKENS" = "3122"
    "SPECULATE" = 2
  }

Who can help?

@dacorvo

Reproduction (minimal, reproducible, runnable)

I'm using a fine-tuned Llama 3.1 70B. I haven't tried a public Llama 3.1 70B checkpoint yet, but I don't expect this to be a model issue.

Expected behavior

I would expect the SPECULATE option to work.

dacorvo commented 1 day ago

@SteliosGian thank you for your feedback. This is not supported yet for NeuronX TGI.
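
Until speculation is supported, one possible workaround is to drop SPECULATE from the environment before the model is created, so that the unsupported --speculate option is never requested. The following is only a minimal sketch, assuming the environment is assembled in Python. `UNSUPPORTED_KEYS` and `build_environment` are hypothetical names used for illustration, and SPECULATE is the only key this thread confirms as unsupported.

```python
# Hypothetical helper: not part of optimum-neuron or TGI.
# Assumption: SPECULATE is the only unsupported key (the only one named in this thread).
UNSUPPORTED_KEYS = {"SPECULATE"}


def build_environment(settings):
    """Return a copy of settings without unsupported keys, with all values as strings."""
    return {
        key: str(value)
        for key, value in settings.items()
        if key not in UNSUPPORTED_KEYS
    }


# Subset of the environment reported in this issue.
settings = {
    "HF_MODEL_ID": "/opt/ml/model",
    "HF_NUM_CORES": "24",
    "HF_BATCH_SIZE": "4",
    "HF_SEQUENCE_LENGTH": "3072",
    "HF_AUTO_CAST_TYPE": "bf16",
    "MAX_BATCH_SIZE": "4",
    "MAX_INPUT_TOKENS": "2000",
    "MAX_TOTAL_TOKENS": "3072",
    "SPECULATE": 2,
}

environment = build_environment(settings)
print(sorted(environment))
```

The filtered `environment` dictionary can then be used wherever the model environment is defined. Casting every value to a string also avoids mixing quoted and unquoted values, as with `"SPECULATE" = 2` above.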

SteliosGian commented 1 day ago

Thank you @dacorvo. Is there a list of supported arguments for NeuronX TGI?