harrisonvanderbyl / rwkv-cpp-accelerated

A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependencies
MIT License

Training, in -cpp-cuda, on one machine? #30

Open SCRIER-org opened 1 year ago

SCRIER-org commented 1 year ago

This project seriously rocks. Thank you very much.
I don't yet understand the training mathematics for RWKV, and I want to run training, both from scratch and as updates, on a legacy C++ system. How easy would it be to put together a baby RWKV training demo, even one using char tokens or words from tiny-shakespeare, similar to nanoGPT, written in the same technology that you're using now for inference? Even headless would be fine. I believe this would be a big help to many people.
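To make the "char tokens" part of the request concrete, here is a minimal sketch of the data-preparation step only: a character-level vocabulary plus next-character training pairs, as nanoGPT-style demos typically use. It is not RWKV training and not code from this repository; `CharVocab`, `build_vocab`, `encode`, `decode`, `Example`, and `make_examples` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper types for illustration; not part of rwkv-cpp-accelerated.
struct CharVocab {
    std::map<char, int> to_id;  // character -> token id
    std::vector<char> to_char;  // token id -> character
};

// Assign ids 0..N-1 to the distinct characters of the corpus, in sorted order.
CharVocab build_vocab(const std::string& text) {
    CharVocab v;
    for (char c : text) v.to_id.emplace(c, 0);  // std::map keeps keys unique and sorted
    int next = 0;
    for (auto& kv : v.to_id) {
        kv.second = next++;
        v.to_char.push_back(kv.first);
    }
    return v;
}

// Map text to token ids; throws std::out_of_range for characters not in the vocabulary.
std::vector<int> encode(const CharVocab& v, const std::string& text) {
    std::vector<int> ids;
    ids.reserve(text.size());
    for (char c : text) ids.push_back(v.to_id.at(c));
    return ids;
}

// Map token ids back to text.
std::string decode(const CharVocab& v, const std::vector<int>& ids) {
    std::string s;
    for (int id : ids) s.push_back(v.to_char.at(static_cast<std::size_t>(id)));
    return s;
}

// One training example: the target is the input shifted left by one token.
struct Example {
    std::vector<int> input;
    std::vector<int> target;
};

// Cut the token stream into non-overlapping windows of length ctx.
std::vector<Example> make_examples(const std::vector<int>& tokens, std::size_t ctx) {
    assert(ctx > 0);
    std::vector<Example> out;
    for (std::size_t i = 0; i + ctx < tokens.size(); i += ctx) {
        auto first = tokens.begin() + static_cast<std::ptrdiff_t>(i);
        auto last = first + static_cast<std::ptrdiff_t>(ctx);
        Example e;
        e.input.assign(first, last);
        e.target.assign(first + 1, last + 1);
        out.push_back(std::move(e));
    }
    return out;
}
```

A trainer would then feed each `input` through the model and score its predictions against `target`; the forward pass, loss, and gradients are the part this sketch leaves out.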