Open anret opened 1 month ago
I have the same problem.
This tends to be a symptom of things going OOM when you make the request. To isolate this, can you disable safety first, run inference only, and see if things run successfully?
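One quick way to check the OOM hypothesis is to watch GPU memory while sending the chat request; usage climbing toward the full 24 GiB right before the failure points at memory pressure rather than a driver bug. This is a generic sketch using standard nvidia-smi query flags, not a command from the llama toolchain:

```shell
# Snapshot GPU memory (run this repeatedly, or in a watch loop,
# while reproducing the error in the chat app).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,memory.total --format=csv
else
  echo "nvidia-smi not found"
fi
```

If the used memory is already near the total before the first message, there is no headroom left for the request-time allocations (KV cache growth, activations), and the error is consistent with an OOM.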
Also, note that the agentic-system and toolchain interfaces / contracts have now changed. Please take a look at the updates (specifically, using the llama distribution start CLI command, etc.).
Good day everyone, I am trying to run the llama agentic system on an RTX 4090, with FP8 quantization for the inference model and meta-llama/Llama-Guard-3-8B-INT8 for the guard. With a sufficiently small max_seq_len, everything fits into 24 GB of VRAM and I can start the inference server and the chat app. However, as soon as I send a message in the chat I get the following error: "Error: Failed to initialize the TMA descriptor 801".
I would appreciate any help and suggestions. Thank you in advance.
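For a rough sense of why max_seq_len is the knob that makes this fit, here is a back-of-the-envelope sketch of the per-sequence KV-cache cost on top of the two sets of model weights. It assumes a Llama-3-8B-class architecture (32 layers, 8 KV heads via GQA, head dimension 128, fp16 cache) and ignores activations and allocator overhead, so the totals are illustrative only:

```python
# Rough VRAM budget for two 8B models on a 24 GiB card.
# Assumptions (not from the issue): 32 layers, 8 KV heads, head_dim 128,
# fp16 KV cache; 1 byte/param for both FP8 and INT8 weights.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8,
                   head_dim=128, dtype_bytes=2):
    """Per-sequence KV-cache size: K and V tensors across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

GIB = 1024 ** 3
weights = 2 * 8e9 * 1  # ~16 GB: inference model (FP8) + guard (INT8)
for seq in (2048, 4096, 8192):
    kv = kv_cache_bytes(seq)
    total = weights + 2 * kv  # both models hold their own cache
    print(f"max_seq_len={seq}: KV cache {kv / GIB:.2f} GiB per model, "
          f"rough total {total / GIB:.1f} GiB of 24 GiB")
```

Under these assumptions the weights alone take roughly 16 GiB, so the remaining headroom for KV caches and activations shrinks quickly as max_seq_len grows, which matches the observation that only sufficiently small values fit.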