Closed · SimKennedy closed this 3 weeks ago
I see this too with 0.23.0. Copying and pasting the example in https://huggingface.co/blog/tgi-messages-api doesn't work.
Bad request:
400: Instance compute 'Gpu' - 'p4de' - '2xlarge' in 'aws' - 'us-east-1' not found
I tried GCP too, same result. I used the parameters from the curl call shown on the Inference Endpoints page.
curl https://api.endpoints.huggingface.cloud/v2/endpoint/juliensimon \
-X POST \
-d '{"compute":{"accelerator":"gpu","instanceSize":"x4","instanceType":"nvidia-l4","scaling":{"maxReplica":1,"minReplica":1}},"model":{"framework":"pytorch","image":{"custom":{"health_route":"/health","env":{"MAX_BATCH_PREFILL_TOKENS":"2048","MAX_INPUT_LENGTH":"1024","MAX_TOTAL_TOKENS":"1512","MODEL_ID":"/repository"},"url":"ghcr.io/huggingface/text-generation-inference:2.0.2"}},"repository":"meta-llama/Meta-Llama-3-8B-Instruct","task":"text-generation"},"name":"meta-llama-3-8b-instruct-plc","provider":{"region":"us-east4","vendor":"gcp"},"type":"protected"}' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer XXXXX"
endpoint = create_inference_endpoint(
name="llama3-8b-julien-demo",
repository="meta-llama/Meta-Llama-3-8B-Instruct",
framework="pytorch",
task="text-generation",
accelerator="gpu",
vendor="google",
region="us-east4",
type="protected",
instance_type="nvidia-l4",
instance_size="x4",
custom_image={
"health_route": "/health",
"env": {
"MAX_INPUT_LENGTH": "1024",
"MAX_BATCH_PREFILL_TOKENS": "2048",
"MAX_TOTAL_TOKENS": "1512",
"MODEL_ID": "/repository",
},
"url": "ghcr.io/huggingface/text-generation-inference:2.0.2", # use this build or newer
},
)
Bad request:
400: Instance compute 'Gpu' - 'l4' - 'x4' in 'google' - 'us-east4' not found
@philschmid any idea?
The naming was adjusted. Pinging @co42 here.
The naming here should be correct: https://huggingface.co/docs/inference-endpoints/pricing
Can you try intel-icl and x4?
The Google example works. My mistake was the vendor name: "gcp", not "google" :)
The blog post example works when changed to:
instance_type="nvidia-a100",
instance_size="x2",
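For reference, the working configuration can be sketched as a plain kwargs dict (the endpoint name below is illustrative), which makes it easy to compare the corrected values against the failing call before passing them to create_inference_endpoint:

```python
# Corrected values from this thread: vendor must be "gcp" (not "google"),
# and the GPU names that worked were "nvidia-a100" / "x2".
# The endpoint name is a placeholder.
corrected_kwargs = {
    "name": "llama3-8b-demo",
    "repository": "meta-llama/Meta-Llama-3-8B-Instruct",
    "framework": "pytorch",
    "task": "text-generation",
    "accelerator": "gpu",
    "vendor": "gcp",                 # was "google" in the failing call
    "region": "us-east4",
    "type": "protected",
    "instance_type": "nvidia-a100",  # was "nvidia-l4"
    "instance_size": "x2",           # was "x4"
}

# create_inference_endpoint(**corrected_kwargs)  # requires a valid HF token
```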
Thanks everyone for reporting/fixing this! Just to be sure, is there still something to fix on huggingface_hub's doc side, or is all good now?
Closing it now :) Just let me know if something remains unclear.
Describe the bug
Using hf_api.create_inference_endpoint with the configuration from the documentation raises an error.
https://huggingface.co/docs/huggingface_hub/en/package_reference/hf_api#huggingface_hub.HfApi.create_inference_endpoint.task
The instance types returned by the API also differ from those listed for the available vendors: https://api.endpoints.huggingface.cloud/v2/provider https://huggingface.co/docs/inference-endpoints/en/pricing
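Since the size names drifted from "small"-style to "x"-style, a small hedged check (pattern assumed from the examples in this thread, e.g. "x1", "x2", "x4") can flag legacy names before making an API call:

```python
import re

# Assumed from this thread: the API now expects "x"-style instance sizes
# ("x1", "x2", "x4", ...) rather than the older "small"/"medium"-style
# names that still appear in some docs.
_SIZE_PATTERN = re.compile(r"^x\d+$")

def looks_like_api_instance_size(size: str) -> bool:
    """Return True if `size` matches the naming the API accepted in this thread."""
    return bool(_SIZE_PATTERN.match(size))
```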
Terminology in the vendor list does not match the API, e.g. instanceSize "small" != "x1".
Reproduction
Logs
System info