huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Errors Encountered in Execution Log: Issues with LD_PRELOAD and Missing Shared Library #2618

Open paulcx opened 3 days ago

paulcx commented 3 days ago

System Info

08T07:24:45.800673489Z ERROR: ld.so: object '/opt/conda/lib/python3.10/site-packages/nvidia/nccl/lib/libnccl.so.2' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

08T07:24:45.801550536Z text-generation-launcher: error while loading shared libraries: libpython3.11.so.1.0: cannot open shared object file: No such file or directory

Reproduction

Docker image tag: sha-f6e2f05

Expected behavior

How can I fix this?

danieldk commented 3 days ago

How are you starting Docker? f6e2f05 is the 2.3.1 release, and I don't see the error when running the container using e.g.:

model=HuggingFaceH4/zephyr-7b-beta
# share a volume with the Docker container to avoid downloading weights every run
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.3.1 --model-id $model

paulcx commented 2 days ago

I start it the same way, and I don't think the launch command is the problem, since the previous version ran without errors.
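
One way to narrow this down is to check, from a shell inside the container, whether the path named in LD_PRELOAD exists and which libpython the launcher is linked against. The log shows a python3.10 path being preloaded while the launcher asks for libpython3.11.so.1.0, so the two may come from different environments. The following is a minimal sketch, not a confirmed fix: it assumes bash and ldd are available in the image, check_lib is a hypothetical helper, and the library path is copied from the log above.

```shell
#!/usr/bin/env bash
# Diagnostic sketch. Intended to be run from a shell inside the container
# (assumption: the image ships bash, e.g. started with
# `docker run --rm -it --entrypoint bash <image>`).

# Hypothetical helper: report whether a shared-library path exists.
check_lib() {
    if [ -e "$1" ]; then
        echo "present: $1"
    else
        echo "missing: $1"
    fi
}

# Path taken from the LD_PRELOAD error in the log.
check_lib /opt/conda/lib/python3.10/site-packages/nvidia/nccl/lib/libnccl.so.2

# Show what LD_PRELOAD is set to in this environment, if anything.
echo "LD_PRELOAD=${LD_PRELOAD:-<unset>}"

# List the Python shared libraries the launcher links against, if it is on PATH.
launcher=$(command -v text-generation-launcher || true)
if [ -n "$launcher" ]; then
    ldd "$launcher" | grep -i python || true
else
    echo "text-generation-launcher not found on PATH"
fi
```

If the first check prints "missing" or ldd reports libpython3.11.so.1.0 as "not found", that would suggest the environment (for example an LD_PRELOAD or library path set outside the image) does not match what this image expects.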