This project seriously rocks. Thank you very much.
I don't understand the training mathematics for RWKV, and I want to run training, both from scratch and incremental updates, from a legacy C++ system. How easy would it be to slap together a baby RWKV training demo, even one using character or word tokens from tiny-shakespeare like nanoGPT, written in the same technology you're using now for inference? Even a headless version would be fine.
I believe this would be a big help to many people.