[Open] LeeSureman opened this issue 6 days ago
I found that this issue is caused by the outlines cache, and it is similar to the one fixed in https://github.com/vllm-project/vllm/pull/7831. Following that vLLM PR may be a good way to address the same outlines-related problem in sglang.
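As a hedged sketch of the kind of workaround the vLLM PR implements (giving each process its own cache directory so concurrent workers don't race on a shared disk cache), something like the following might help until sglang fixes it upstream. The `OUTLINES_CACHE_DIR` environment variable is an assumption based on outlines' caching behavior, not something confirmed in this issue:

```python
# Sketch of a per-process outlines cache workaround, assuming outlines
# honors the OUTLINES_CACHE_DIR environment variable (as its caching
# module does in recent versions). This must run *before* importing
# sglang or outlines, since the cache path is read at import time.
import os
import tempfile

# Create a unique cache directory tagged with this process's PID so
# that 32+ concurrent Slurm jobs never touch the same cache files.
os.environ["OUTLINES_CACHE_DIR"] = tempfile.mkdtemp(
    prefix=f"outlines-cache-{os.getpid()}-"
)
```

This mirrors the approach in the vLLM PR (appending a process-unique suffix to the cache path) rather than relying on `disable_disk_cache=True`, which per this report does not prevent the collision.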
Checklist
Describe the bug
When I use Slurm to launch 32 or 192 jobs for offline batch inference, each of which loads sgl.Engine simultaneously, I hit the following error even though I set disable_disk_cache=True. If I run only a single job, the error does not occur.
The error is as follows:
Reproduction
Python: slurm_task.py
Sbatch Script:
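The actual scripts are not included above. A minimal sketch of the launch pattern described (a Slurm job array starting many tasks that each run `slurm_task.py` and load sgl.Engine concurrently) might look like the following; the job name, GPU request, and the `--shard` argument are placeholders I invented for illustration:

```shell
#!/bin/bash
#SBATCH --job-name=sgl-batch
#SBATCH --array=0-31          # 32 concurrent tasks, as described in the report
#SBATCH --gres=gpu:1

# Each array task loads its own sgl.Engine at roughly the same time,
# which is the condition that triggers the outlines cache error.
python slurm_task.py --shard "$SLURM_ARRAY_TASK_ID"
```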
Environment
2024-11-19 08:47:37.576574: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Python: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: False
PyTorch: 2.4.0
sglang: 0.3.5
flashinfer: 0.1.6+cu124torch2.4
triton: 3.0.0
transformers: 4.46.2
requests: 2.32.3
tqdm: 4.67.0
numpy: 1.23.0
aiohttp: 3.10.5
fastapi: 0.115.4
hf_transfer: 0.1.8
huggingface_hub: 0.24.6
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.9.2
uvicorn: 0.32.0
uvloop: 0.21.0
zmq: 26.2.0
vllm: 0.6.3.post1
multipart: 0.0.17
openai: 1.54.4
anthropic: 0.39.0
Hypervisor vendor: KVM
ulimit soft: 1024