mspronesti closed this issue 1 month ago
You can try uploading it to S3 and then deploying it following this blog post: https://www.philschmid.de/sagemaker-llm-vpc
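For the upload step, a minimal sketch of packaging a local model directory into the `model.tar.gz` layout SageMaker expects (the bucket name and key are placeholders, and the actual upload call requires AWS credentials, so it is left commented):

```python
import os
import tarfile


def package_model(model_dir: str, out_path: str = "model.tar.gz") -> str:
    """Create a tar.gz of the model directory contents.

    SageMaker expects the files at the archive root (no top-level
    folder), so each file is added with a flat arcname.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for name in os.listdir(model_dir):
            tar.add(os.path.join(model_dir, name), arcname=name)
    return out_path


# Upload to S3 (requires boto3 and AWS credentials; shown for context only):
# import boto3
# boto3.client("s3").upload_file("model.tar.gz", "<BUCKET>", "models/model.tar.gz")
```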
@philschmid this is actually very helpful, thank you! However, why don't you also export HUGGING_FACE_HUB_TOKEN here, so that one can also serve a private model from the Hub?
Hi @mspronesti
I am only guessing, but it seems that the launcher gets the access token from an environment variable. Have you tried HF_API_TOKEN?
https://github.com/huggingface/text-generation-inference/blob/v0.8.2/launcher/src/main.rs#L583-L586
# TGI config
config = {
    'HF_MODEL_ID': "<USER>/<PRIVATE_MODEL>",  # model_id from hf.co/models
    # ...
    'HF_API_TOKEN': hf_api_token  # plain string, not json.dumps(...) — quoting would corrupt the token
}
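Putting it together, a hedged sketch of how that config plugs into a deployment (the role, image URI, and instance type are placeholders; every value in the env dict must be a plain string, since SageMaker passes it through as container environment variables):

```python
import os

# Placeholder: obtain the Hugging Face access token however your setup
# provides secrets; here we read it from the local environment.
hf_api_token = os.environ.get("HF_API_TOKEN", "hf_xxx")

# TGI container configuration. All values are plain strings because
# SageMaker forwards this dict verbatim as environment variables.
config = {
    "HF_MODEL_ID": "<USER>/<PRIVATE_MODEL>",  # model_id from hf.co/models
    "SM_NUM_GPUS": "1",                       # GPUs per replica
    "HF_API_TOKEN": hf_api_token,             # token the launcher reads at startup
}

# The actual deployment (requires AWS credentials; shown for context only):
# from sagemaker.huggingface import HuggingFaceModel
# model = HuggingFaceModel(env=config, role="<SAGEMAKER_ROLE>", image_uri="<TGI_IMAGE>")
# model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")
```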
@cirocavani's suggestion should also work! The reason we created the VPC+S3 blog post is to show how to do it when your SageMaker environment does not have internet access.
System Info
sagemaker 2.163.0

Information
Tasks
Reproduction
Expected behavior
To successfully serve a private model hosted on the Hugging Face Hub by passing a HUGGING_FACE_HUB_TOKEN.