modal-labs / modal-examples

Examples of programs built using Modal
https://modal.com/docs
MIT License

Mixtral tutorial doesn't work without huggingface access token #703

Closed: justinliang1020 closed this issue 7 months ago

justinliang1020 commented 7 months ago

The tutorial for running Mixtral on vLLM doesn't work because the model cannot be downloaded without a Hugging Face access token. Mixtral is now a gated model: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

Image build for im-1P0Aou6cl9H3BAwictbALw failed with the exception:
GatedRepoError('401 Client Error. (Request ID: Root=1-66213747-475d6ad5261bb9eb4931c4fd;025f8bf1-0bb2-42ac-86a0-743e752004a0)\n\nCannot access gated repo for url https://huggingface.co/api/models/mistralai/Mixtral-8x7B-Instruct-v0.1/revision/main.\nRepo model mistralai/Mixtral-8x7B-Instruct-v0.1 is gated. You must be authenticated to access it.')

Affected tutorial: https://modal.com/docs/examples/vllm_mixtral
Affected code: https://github.com/modal-labs/modal-examples/blob/main/06_gpu_and_ml/llm-serving/vllm_mixtral.py

This can be fixed with an approach similar to the one used here: add an HF_TOKEN environment variable to the function call that downloads the model. The tutorial should also be updated to tell users that a Hugging Face access token is required.
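A minimal sketch of that fix, assuming the token is stored in a Modal Secret that exposes it as `HF_TOKEN` (the secret name `huggingface-secret` is hypothetical). The helper names `get_hf_token`, `download_model_to_folder`, and `build_image` are illustrative and not the example's actual code. `modal` and `huggingface_hub` are imported lazily, so the module loads even when they are not installed.

```python
import os
from typing import Mapping, Optional

MODEL_NAME = "mistralai/Mixtral-8x7B-Instruct-v0.1"
MODEL_DIR = "/model"  # assumption: where the weights are stored inside the image


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from HF_TOKEN, failing early with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            f"{MODEL_NAME} is a gated model: accept its terms on Hugging Face "
            "and provide an access token via the HF_TOKEN environment variable."
        )
    return token


def download_model_to_folder() -> None:
    """Download the gated weights at image build time, authenticating with HF_TOKEN."""
    from huggingface_hub import snapshot_download  # lazy import: only needed in the image

    os.makedirs(MODEL_DIR, exist_ok=True)
    snapshot_download(
        MODEL_NAME,
        local_dir=MODEL_DIR,
        token=get_hf_token(),  # explicit token for the gated repo
        ignore_patterns=["*.pt", "*.bin"],  # assumption: safetensors weights suffice
    )


def build_image():
    """Hypothetical image definition that runs the download step with the secret attached."""
    import modal  # lazy import so this sketch can be loaded without Modal installed

    return (
        modal.Image.debian_slim()
        .pip_install("huggingface_hub")
        .run_function(
            download_model_to_folder,
            # Assumed secret name; the secret must define HF_TOKEN.
            secrets=[modal.Secret.from_name("huggingface-secret")],
        )
    )
```

Checking for the token before calling `snapshot_download` turns the opaque `GatedRepoError` from the image build into a message that says what the user needs to configure. The tutorial text would need to state the same requirement.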

charlesfrye commented 7 months ago

Thanks for the thorough report! We just picked up this failure in our monitoring system, and it's great to have this issue to confirm it and suggest the fix.