Closed JohnHerry closed 3 months ago
Hi, thank you for your interest in our work! I'm not sure I entirely understand your question. Somewhat related - we've recently released our inference kernels, written in CUDA and Triton.
Good news! Thank you very much. It would be better if the project lowered its CUDA, gcc, and PyTorch version requirements. We are tired of keeping up with environment updates! So, if it isn't necessary, lower is better!
Thanks for the good work!
Does this work have the prospect of supporting LLM-based real-time communication applications running on CPU devices?