Open drrros opened 1 month ago
I'm trying to run a DeepSeek-V2.5 model. Command used:
python -m ktransformers.local_chat --model_path ./DeepSeek-V2.5/ --gguf_path ../
I've tried both ktransformers.local_chat and the web interface mode, with and without the --optimize_config_path option. The model loads, but the first interaction fails with the traceback below. Other backends (koboldcpp and llama.cpp) run fine. Server specs: 200 GB RAM, dual P40.
These are the full logs in API mode:
COMPLETION INPUT:----
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.<|eot_id|><|start_header_id|>user<|end_header_id|>
{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{assistant_response}<|eot_id|><|start_header_id|>user<|end_header_id|>
{next_user_prompt}<|eot_id|>
***
DrRos: text
DrRos: test
Assistant:
----
INFO: 192.168.0.77:58360 - "POST /v1/completions HTTP/1.1" 200 OK
2024-10-02 19:29:03,525 DEBUG /home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/transformers.py[240]: input_ids: torch.Size([1, 202])
2024-10-02 19:29:03,529 DEBUG /home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/transformers.py[262]: cache position: 0 to 202
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/responses.py", line 257, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/responses.py", line 253, in wrap
await func()
File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
message = await receive()
^^^^^^^^^^^^^^^
File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 534, in receive
await self.message_event.wait()
File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/asyncio/locks.py", line 213, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f0f919ca950
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
| return await self.app(scope, receive, send)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
| raise exc
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
| await self.app(scope, receive, _send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
| await self.app(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
| raise exc
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
| await app(scope, receive, sender)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
| await route.handle(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
| await self.app(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
| raise exc
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
| await app(scope, receive, sender)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/responses.py", line 250, in __call__
| async with anyio.create_task_group() as task_group:
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 736, in __aexit__
| raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/responses.py", line 253, in wrap
| await func()
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/starlette/responses.py", line 242, in stream_response
| async for chunk in self.body_iterator:
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/schemas/assistants/streaming.py", line 80, in check_client_link
| async for event in async_events:
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/schemas/assistants/streaming.py", line 93, in to_stream_reply
| async for event in async_events:
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/schemas/assistants/streaming.py", line 87, in add_done
| async for event in async_events:
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/api/openai/legacy/completions.py", line 23, in inner
| async for token in interface.inference(create.prompt,id):
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/transformers.py", line 323, in inference
| for t in self.prefill(input_ids,self.check_is_new(thread_id)):
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
| response = gen.send(None)
| ^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/transformers.py", line 272, in prefill
| logits = self.model(
| ^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/models/modeling_deepseek.py", line 1731, in forward
| outputs = self.model(
| ^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/models.py", line 719, in forward
| layer_outputs = decoder_layer(
| ^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/models/modeling_deepseek.py", line 1238, in forward
| hidden_states, self_attn_weights, present_key_value = self.self_attn(
| ^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/attention.py", line 170, in forward
| return self.forward_chunck(
| ^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/attention.py", line 71, in forward_chunck
| q = self.q_b_proj(self.q_a_layernorm(self.q_a_proj(hidden_states)))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/drros/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/models/modeling_deepseek.py", line 113, in forward
| hidden_states = hidden_states.to(torch.float32)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| RuntimeError: CUDA error: invalid device function
| CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
| For debugging consider passing CUDA_LAUNCH_BLOCKING=1
| Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
|
+------------------------------------
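As the error text itself notes, CUDA kernel failures are reported asynchronously, so the `hidden_states.to(torch.float32)` frame above may not be where the faulty kernel was actually launched. A minimal sketch of the debugging step the message suggests, forcing synchronous launches so the error surfaces at its real call site (the variable must be set before CUDA is initialized, e.g. before importing torch, or by prefixing the launch command with CUDA_LAUNCH_BLOCKING=1):

```python
# Set CUDA_LAUNCH_BLOCKING before anything initializes CUDA, so every
# kernel launch is synchronous and errors are raised at the real call site.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import torch only after the env var is in place

x = torch.ones(2, 2, device="cuda")
print(x.to(torch.float32))  # a failing kernel now raises here, immediately
```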
versions:
(base) drros@tesla-ubuntu:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
(base) drros@tesla-ubuntu:~$ uname -a
Linux tesla-ubuntu 5.15.0-122-generic #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
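To cross-check these versions from the Python side, here is a small diagnostic sketch (plain PyTorch calls, nothing ktransformers-specific) that prints the torch build, the CUDA toolkit it was compiled against, and each GPU's compute capability:

```python
import torch

# Torch wheel version and the CUDA toolkit it was built against
# (this can differ from the system nvcc shown above).
print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)

# Name and compute capability of every visible GPU.
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"cuda:{i} {name} sm_{major}{minor}")  # a P40 reports sm_61
```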
I have the same problem. Have you solved it?
No. I tried different versions of torch, but that did not solve the issue. What GPUs are you using? Mine are P40s.
My GPU is a V100. I may have found the reason: the V100 uses the Volta architecture, so I can't use flash-attn normally, and most of this project is incompatible with it.
Sorry, we use the Marlin operator to compute the layers placed on the GPU, and it requires Compute Capability 8.0 or above. The P40's Compute Capability is 6.1, so it cannot run those kernels. You may be able to run by removing the Marlin operator. @Azure-Tang, could you show how to remove Marlin?
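For reference, this requirement can be checked programmatically before loading the model. A minimal sketch of that gate, where the (8, 0) threshold is the Marlin requirement stated above (the constant name is just for illustration, not a ktransformers API):

```python
import torch

MARLIN_MIN_CC = (8, 0)  # Marlin kernels need Compute Capability >= 8.0

for i in range(torch.cuda.device_count()):
    cc = torch.cuda.get_device_capability(i)
    if cc < MARLIN_MIN_CC:
        # A P40 reports (6, 1) and a V100 (7, 0), so both fail this check,
        # consistent with the "invalid device function" error above.
        print(f"cuda:{i}: sm_{cc[0]}{cc[1]} is below sm_80; Marlin will not run")
```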