huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

HF_TRANSFER is not working for the model CalderaAI/30B-Lazarus #461

Closed ArnaudHureaux closed 11 months ago

ArnaudHureaux commented 1 year ago

System Info

I'm on an Ubuntu server from https://console.paperspace.com/ with two A100 GPUs, but when I run the model CalderaAI/30B-Lazarus I cannot use HF Transfer, even with "--net=host" (a solution that works with other models, found in another issue).

Full description of my GPUs:

paperspace@pse55xf0v:~/text-generation-inference$ nvidia-smi
Thu Jun 15 10:00:25 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   29C    P0    53W / 400W |    184MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  Off  | 00000000:00:06.0 Off |                    0 |
| N/A   26C    P0    53W / 400W |      4MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1497      G   /usr/lib/xorg/Xorg                 74MiB |
|    0   N/A  N/A      2389      G   /usr/bin/gnome-shell              102MiB |
+-----------------------------------------------------------------------------+

I run:

sudo docker run --net=host --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data -e HF_HUB_ENABLE_HF_TRANSFER=1 ghcr.io/huggingface/text-generation-inference:0.8 --model-id CalderaAI/30B-Lazarus --num-shard 1 --env --disable-custom-kernels

My error:

HF_TRANSFER for Lazarus model

2023-06-15T10:56:36.180341Z ERROR download: text_generation_launcher: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.

2023-06-15T10:56:36.180378Z  INFO download: text_generation_launcher: Retry 4/4 

2023-06-15T10:56:36.180458Z  INFO download: text_generation_launcher: Download file: pytorch_model-00003-of-00007.bin

2023-06-15T10:57:00.401367Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 486, in http_get

    download(url, temp_file.name, max_files, chunk_size, headers=headers)       

Exception: Error while downloading: Os { code: 28, kind: StorageFull, message: "No space left on device" }

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>

    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 137, in download_weights

    local_pt_files = utils.download_weights(pt_filenames, model_id, revision)   

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 167, in download_weights

    file = download_file(filename)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 159, in download_file

    raise e

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 147, in download_file

    local_file = hf_hub_download(

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn

    return fn(*args, **kwargs)

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1347, in hf_hub_download

    http_get(

  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 495, in http_get

    raise RuntimeError(

RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.

Error: DownloadError


Reproduction

Run:

sudo docker run --net=host --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data -e HF_HUB_ENABLE_HF_TRANSFER=1 ghcr.io/huggingface/text-generation-inference:0.8 --model-id CalderaAI/30B-Lazarus --num-shard 1 --env --disable-custom-kernels

On a Paperspace Ubuntu server with two A100 GPUs.

Expected behavior

The CalderaAI/30B-Lazarus model downloads successfully with HF Transfer enabled.

Narsil commented 1 year ago

Try disabling it? It should still download the model, just a bit slower.

hf_transfer is really barebones, and any flaky network might trigger issues for you (or, because you are using much more bandwidth, other parts of the infrastructure, not necessarily yours, might start to lag, making the network flaky overall). The raw Python download is definitely recommended for stable downloading.
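If you want to sanity-check the plain-Python path outside the launcher, here is a minimal sketch (the repo id and shard name are taken from the logs above; the flag has to be set before huggingface_hub is imported, since it is read at import time). Run this way, the underlying failure is reported directly; note that the log above already shows a root cause: Os { code: 28, kind: StorageFull }, i.e. the download ran out of disk space.

import os

# Disable hf_transfer before huggingface_hub is imported; the flag is read at import time
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"

from huggingface_hub import hf_hub_download

# Shard name taken from the error log above
local_path = hf_hub_download(
    repo_id="CalderaAI/30B-Lazarus",
    filename="pytorch_model-00003-of-00007.bin",
)
print(local_path)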

ksingh-scogo commented 12 months ago

This works for me

docker run --shm-size 1g --net=host -p 8080:80 -v $PWD/data:/data -e HUGGING_FACE_HUB_TOKEN=$token -e HF_HUB_ENABLE_HF_TRANSFER=0 ghcr.io/huggingface/text-generation-inference:latest  --model-id TheBloke/Llama-2-13B-chat-GGML --quantize bitsandbytes

majidbhatti commented 12 months ago

How can I avoid this error? I am using AWS SageMaker.

Narsil commented 11 months ago

Isn't there a way for you to provide environment variables?

HF_HUB_ENABLE_HF_TRANSFER=0

Is what you are looking for.

anastasia-enot commented 11 months ago

Isn't there a way for you to provide environment variables?

HF_HUB_ENABLE_HF_TRANSFER=0

Is what you are looking for.

Hello Narsil, I have a similar problem to majidbhatti's. I am on a SageMaker notebook and I get the same error. I tried your solution by setting os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0", but it did not change anything; I still get the exact same error: "RuntimeError: An error occurred while downloading using hf_transfer. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling." Do you have any suggestions?

majidbhatti commented 11 months ago

I avoided this error using:

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# environment variable values should be strings, hence '0' rather than 0
hub = {'HF_HUB_ENABLE_HF_TRANSFER': '0'}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),
    env=hub,
)
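This works where os.environ does not, because env is injected into the endpoint container that actually runs the download; setting the variable in the notebook kernel never reaches that container. A fuller sketch of the deployment, where the execution role, HF_MODEL_ID variable, and instance type are placeholders I added, not values from this thread:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # assumes you run inside a SageMaker notebook

# env is passed through to the endpoint container; values must be strings
hub = {
    "HF_HUB_ENABLE_HF_TRANSFER": "0",
    "HF_MODEL_ID": "CalderaAI/30B-Lazarus",  # model id from this thread
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),
    env=hub,
    role=role,
)

# instance type is a placeholder; pick one with enough GPU memory for your model
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
)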

anastasia-enot commented 11 months ago

I avoided this error using hub = { 'HF_HUB_ENABLE_HF_TRANSFER': '0' }

huggingface_model = HuggingFaceModel( image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"), env=hub )

Thank you so much!! It worked for me.

Narsil commented 11 months ago

Thanks for sharing the solution!

Closing this, then.

chintanckg commented 11 months ago

You can also add HF_HUB_ENABLE_HF_TRANSFER=0 to the docker command:

docker run --shm-size 1g --env HF_HUB_ENABLE_HF_TRANSFER=0 .......