opea-project / GenAIInfra

Containerization and cloud native suite for OPEA
Apache License 2.0

chatqna HUGGINGFACEHUB_API_TOKEN doesn't seem to be doing all of its job #100

Closed: igordcard closed this issue 6 days ago

igordcard commented 2 weeks ago

I see the following in chatqna-tgi's logs (in K8s), installed via https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/chatqna, even after passing my correct Hugging Face token and accepting the conditions to access the gated repo for Llama 3:

export HFTOKEN="..."
export MODELDIR="/mnt"
export MODELNAME="meta-llama/Meta-Llama-3-8B"

helm install chatqna chatqna --set llm-uservice.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set llm-uservice.tgi.volume=${MODELDIR} --set llm-uservice.tgi.LLM_MODEL_ID=${MODELNAME}

kubectl logs chatqna-tgi-f455cdb9b-bqwfv:

{
    "message": "Download encountered an error: \nTraceback (most recent call last):\n\n  File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py\", line 270, in hf_raise_for_status\n    response.raise_for_status()\n\n  File \"/opt/conda/lib/python3.10/site-packages/requests/models.py\", line 1021, in raise_for_status\n    raise HTTPError(http_error_msg, response=self)\n\nrequests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B\n\n\nThe above exception was the direct cause of the following exception:\n\n\nTraceback (most recent call last):\n\n  File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n    sys.exit(app())\n\n  File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 128, in download_weights\n    utils.weight_files(model_id, revision, extension)\n\n  File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/hub.py\", line 151, in weight_files\n    filenames = weight_hub_files(model_id, revision, extension)\n\n  File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/hub.py\", line 110, in weight_hub_files\n    info = api.model_info(model_id, revision=revision)\n\n  File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 118, in _inner_fn\n    return fn(*args, **kwargs)\n\n  File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/hf_api.py\", line 1922, in model_info\n    hf_raise_for_status(r)\n\n  File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py\", line 286, in hf_raise_for_status\n    raise GatedRepoError(message, response) from e\n\nhuggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-666c9efe-652c658f64be7e2644868213;c60b443a-ef5b-4dde-8165-60966c452697)\n\nCannot access gated repo for url https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B.\nAccess to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.\n"
  }
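
A quick way to narrow this down is to confirm that the token actually reached the TGI container and that it is accepted by the Hugging Face API. A minimal sketch, assuming the deployment is named chatqna-tgi (as the pod name above suggests), that HFTOKEN is still exported in the shell, and that the chart injects the token as an environment variable (the exact variable name can differ per chart version):

# Does any Hugging Face token variable exist inside the TGI container?
kubectl exec deploy/chatqna-tgi -- printenv | grep -iE 'hf_|hugging'

# Is the token itself valid and approved for the gated repo?
# 200 means access is granted; 401 matches the error in the log above.
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer ${HFTOKEN}" \
  https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B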
yongfengdu commented 1 week ago

MODELNAME="meta-llama/Meta-Llama-3-8B": the meta-llama models are gated and need special permission together with a HUGGINGFACE TOKEN. Previously we didn't pass the TOKEN through to the TGI inference server, so only ungated models (like intel/xxx) could be used. The reorg of the helm charts adds support for passing HF_TOKEN to the tgi server; you can try again after that is merged (pay attention to the README change about running helm dependency update). Before that, you can try other models such as Intel/xxx instead of meta-llama.
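
For illustration, a rough sketch of what the post-reorg install could look like; the value keys below (global.HUGGINGFACEHUB_API_TOKEN, tgi.LLM_MODEL_ID) are assumptions to be checked against the updated chart README and values.yaml, not confirmed names:

# Refresh the subchart dependencies first, as the README change notes.
helm dependency update chatqna

# Pass the token at a level the tgi subchart can see (key names are assumptions).
helm install chatqna chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set tgi.LLM_MODEL_ID=${MODELNAME}

Until that is merged, an ungated model (for example Intel/neural-chat-7b-v3-3) avoids the need for a token on the TGI side.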

igordcard commented 6 days ago

Thanks Dolpher, your fix is working for me.