huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference

`HUGGING_FACE_HUB_TOKEN` not exported in Sagemaker entrypoint #479

Closed mspronesti closed 1 month ago

mspronesti commented 1 year ago

System Info

Information

Tasks

Reproduction

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.8.2"
)

# sagemaker config
role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300
hf_api_token = 'hf_...'

# TGI config
config = {
  'HF_MODEL_ID': "<USER>/<PRIVATE_MODEL>", # model_id from hf.co/models
  'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPUs used per replica
  'MAX_INPUT_LENGTH': json.dumps(1024),  # Max length of input text
  'MAX_TOTAL_TOKENS': json.dumps(2048),  # Max length of the generation (including input text)
  'HUGGING_FACE_HUB_TOKEN': hf_api_token  # plain string; json.dumps would wrap the token in extra quotes
}

# create HuggingFaceModel
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  env=config
)
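
Deploying then uses the instance settings defined above; a minimal sketch of the usual next step, with the standard `HuggingFaceModel.deploy` arguments (not part of the original snippet):

# deploy the model to a SageMaker endpoint
llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=health_check_timeout  # give TGI time to download and load the weights
)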

Expected behavior

To successfully serve a private model hosted on the Hugging Face Hub by passing a `HUGGING_FACE_HUB_TOKEN`.

philschmid commented 1 year ago

You can try uploading the model to S3 and then deploying it by following this blog post: https://www.philschmid.de/sagemaker-llm-vpc
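
For reference, a minimal sketch of that approach, assuming the weights have already been uploaded to S3 (the `model_data` path is a placeholder):

# serve the weights from S3 instead of the Hub, so no Hub token is required
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  model_data="s3://<BUCKET>/<MODEL>/model.tar.gz",  # placeholder path to the uploaded weights
  env={
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    'MAX_INPUT_LENGTH': json.dumps(1024),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
  }  # no HF_MODEL_ID / HUGGING_FACE_HUB_TOKEN needed
)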

mspronesti commented 1 year ago

@philschmid this is actually very helpful, thank you! However, why don't you also export `HUGGING_FACE_HUB_TOKEN` here, so that one can also serve a private model from the Hub?

cirocavani commented 1 year ago

Hi @mspronesti

I am guessing, but it seems that the launcher gets the access token from an environment variable. Have you tried `HF_API_TOKEN`?

https://github.com/huggingface/text-generation-inference/blob/v0.8.2/launcher/src/main.rs#L583-L586
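
Those lines appear to forward the token from `HF_API_TOKEN` into `HUGGING_FACE_HUB_TOKEN` for the download process; in Python terms (a rough paraphrase of the Rust source, with a hypothetical `download_env` dict standing in for the subprocess environment):

import os

# paraphrase of the linked launcher logic (v0.8.2)
api_token = os.environ.get("HF_API_TOKEN")
if api_token is not None:
  download_env["HUGGING_FACE_HUB_TOKEN"] = api_token  # download_env: hypothetical env dict for the download subprocess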

# TGI config
config = {
  'HF_MODEL_ID': "<USER>/<PRIVATE_MODEL>", # model_id from hf.co/models
  # ...
  'HF_API_TOKEN': json.dumps(hf_api_token)
}

philschmid commented 1 year ago

@cirocavani's suggestion should also work! We created the VPC + S3 blog post to show how to deploy when your SageMaker environment has no internet access.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.