Describe the bug
Running any model larger than 20 GB causes the machine to reboot automatically.
To Reproduce
To help us reproduce this bug, please provide the information below:
```
Server:
 Containers: 4
  Running: 2
  Paused: 0
  Stopped: 2
 Images: 4
 Server Version: 25.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7c3aca7a610df76212171d200ca3811ff6096eb8
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 3.10.0-1160.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 144
 Total Memory: 755.1GiB
```
Expected behavior
Small models such as qwen2 load and run with no problems at all.
1. With models larger than 20 GB, such as glm4, and two GPUs specified: the web chat on port 9997 returns gibberish a few times, then the machine reboots.
2. With a single GPU specified: it returns correct answers a few times, then the machine reboots.
3. ollama installed via Docker works fine, so I don't think the machine or Docker itself is the problem.
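For context on why models in this size class need one large GPU or two smaller ones, here is a rough back-of-envelope estimate of the memory the weights alone occupy. This is my own sketch, not part of Xinference: the function name is made up, and it assumes fp16 weights while ignoring KV cache and activation memory, which add several more GiB at inference time.

```python
def weight_memory_gib(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate GiB needed to hold model weights alone.

    Assumes a dense model; bytes_per_param=2 corresponds to fp16/bf16.
    """
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

# A hypothetical 9B-parameter model (roughly the glm4-9b size class):
print(f"{weight_memory_gib(9):.1f} GiB")   # weights alone, before KV cache
```

With the runtime overhead on top, a model in this range plausibly crosses the 20 GB mark the report mentions, which is why splitting across two GPUs is attempted.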
Additional context
Add any other context about the problem here.