Closed JohnHerry closed 3 months ago
Hi, thank you for your interest in our work! I'm not sure I entirely understand your question. Somewhat related - we've recently released our inference kernels, written in CUDA and Triton.
Good news! Thank you very much. It would be better if the project lowered its CUDA, gcc, and PyTorch version requirements. We are tired of keeping up with environment updates! So, if it isn't necessary, lower is better!
Thanks for the good work!
Does this work have the prospect of supporting LLM-based real-time communication applications running on CPU devices?