tensorchord / modelz-llm

OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others)
https://modelz.ai
Apache License 2.0

bug: Unexpected OOM in ChatGLM 6B #69

Closed gaocegege closed 1 year ago

gaocegege commented 1 year ago

us-central1-docker.pkg.dev/nth-guide-378813/modelzai/llm-chatglm-6b:23.06.9

The server exits with code 137 after a request is sent to it, but both memory and GPU memory usage are low (12 GB/24 GB GPU, 3 GB/32 GB RAM).
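For reference, exit code 137 means the process was terminated with SIGKILL (128 + signal 9) rather than exiting on its own; the OOM killer is one common sender, but so is a container orchestrator killing a pod whose liveness probe fails. A quick way to confirm the mapping:

```shell
# Exit code = 128 + signal number; SIGKILL is 9, so 128 + 9 = 137.
sh -c 'kill -9 $$'
echo "exit code: $?"
```

So a 137 on its own does not prove the process ran out of memory.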

gaocegege commented 1 year ago
INFO:     10.4.17.1:56470 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:56486 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:56488 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:56508 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:56518 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:56520 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:56522 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:56536 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:56534 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53518 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53528 - "GET / HTTP/1.1" 200 OK
2023-06-06 10:13:46,824 - 1 - WARNING - logging.py:295 - The dtype of attention mask (torch.int64) is not bool
INFO:     10.4.2.24:42932 - "POST /chat/completions HTTP/1.1" 200 OK
INFO:     Shutting down
INFO:     10.4.17.1:53544 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53556 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53562 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53568 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53574 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53580 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53594 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53606 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53608 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53624 - "GET / HTTP/1.1" 200 OK
INFO:     10.4.17.1:53630 - "GET / HTTP/1.1" 200 OK
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]

This is the log.

gaocegege commented 1 year ago

This is caused by uvicorn: the blocking inference call runs on the event loop, so while a completion is being generated the ping endpoint (`GET /`) cannot respond. The liveness probe then fails and the process is killed (exit code 137), even though it never actually ran out of memory.
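The usual fix for this pattern is to off-load the blocking model call to a worker thread so the event loop stays free to answer health checks. A minimal sketch (the `blocking_inference` function is a hypothetical stand-in for the ChatGLM generate call, not the project's actual API):

```python
import asyncio
import time

def blocking_inference(prompt: str) -> str:
    # Stand-in for a synchronous, GPU-bound model.generate() call
    # that would otherwise block the event loop for its full duration.
    time.sleep(0.2)
    return f"echo: {prompt}"

async def chat_completions(prompt: str) -> str:
    # asyncio.to_thread runs the blocking call in the default thread
    # pool, so other coroutines (like the health check) keep running.
    return await asyncio.to_thread(blocking_inference, prompt)

async def ping() -> str:
    return "OK"

async def main() -> None:
    # Start an inference, then show the ping still answers promptly
    # instead of waiting behind the 0.2 s blocking call.
    infer = asyncio.create_task(chat_completions("hello"))
    start = time.monotonic()
    pong = await ping()
    elapsed = time.monotonic() - start
    print(pong, elapsed < 0.1, await infer)

asyncio.run(main())
```

With FastAPI/Starlette the equivalent is declaring the endpoint as a plain `def` (served from the thread pool) or using `starlette.concurrency.run_in_threadpool`, so a long generation no longer starves `GET /`.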