harrisonvanderbyl / rwkv-cpp-accelerated

A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependencies
MIT License

Multi-GPU support #3

Open nenkoru opened 1 year ago

nenkoru commented 1 year ago

As far as I can see, the current implementation has no way to split the model's VRAM load across multiple GPUs the way BlinkDL's ChatRWKV does. That would make it practical to run on two or more consumer-grade GPUs instead of requiring a single enterprise-grade one.