I'm running Neuron TGI on an inf2 instance on SageMaker with optimum-neuron 0.0.25.
I'm using the SPECULATE=2 option but I get the following message in the logs:
Error: No such option: --speculate
Here's my SageMaker model environment:
{
"SM_MODEL_DIR" = "/opt/ml/model"
"HF_MODEL_ID" = "/opt/ml/model"
"HF_NUM_CORES" = "24"
"HF_BATCH_SIZE" = "4"
"HF_SEQUENCE_LENGTH" = "3072"
"HF_AUTO_CAST_TYPE" = "bf16"
"MAX_BATCH_SIZE" = "4"
"MAX_INPUT_TOKENS" = "2000"
"MAX_TOTAL_TOKENS" = "3072"
"MESSAGES_API_ENABLED" = "false"
"MAX_BATCH_PREFILL_TOKENS" = "3122"
"SPECULATE" = 2
}
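For anyone trying to reproduce this from Python, the environment above can be sketched as a plain dict. This is only an illustration, not the deployment code used here: the helper name `build_env` is hypothetical, and every value is written as a string on the assumption that the container environment is passed as a string-to-string map.

```python
def build_env(speculate=None):
    """Return the container environment as a str -> str mapping.

    `build_env` is a hypothetical helper used only for illustration.
    """
    env = {
        "SM_MODEL_DIR": "/opt/ml/model",
        "HF_MODEL_ID": "/opt/ml/model",
        "HF_NUM_CORES": "24",
        "HF_BATCH_SIZE": "4",
        "HF_SEQUENCE_LENGTH": "3072",
        "HF_AUTO_CAST_TYPE": "bf16",
        "MAX_BATCH_SIZE": "4",
        "MAX_INPUT_TOKENS": "2000",
        "MAX_TOTAL_TOKENS": "3072",
        "MESSAGES_API_ENABLED": "false",
        "MAX_BATCH_PREFILL_TOKENS": "3122",
    }
    # Only set SPECULATE when requested, and always as a string.
    if speculate is not None:
        env["SPECULATE"] = str(speculate)
    return env


env = build_env(speculate=2)
```

Calling `build_env()` with no argument leaves `SPECULATE` out entirely, which gives a baseline environment for checking whether the launcher error appears only when that variable is set.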
Who can help?
@dacorvo
Information
[ ] The official example scripts
[X] My own modified scripts
Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Using a fine tuned Llama 3.1 70B. Haven't tried it yet on a public Llama 3.1 70B version but I don't expect it to be a model issue.
Expected behavior
I would expect the SPECULATE option to work.