Open: TodayAI opened this issue 1 week ago
Hi @TodayAI, thanks for your suggestions and for sharing.
RWKV is better described as a "linear attention" variant of the Transformer than as an LSTM, although it also uses gating. I tried an early version of RWKV, and its performance was not good in practice; in particular, it was very sensitive to prompts. The newer versions may have improved on this, but I haven't had a chance to try them out.
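For context, the "linear attention" idea behind RWKV can be summarized by a WKV-style recurrence: instead of attending over all previous tokens (quadratic cost in sequence length), each channel keeps a small running state that decays over time. The sketch below is a simplified scalar version in Python, purely illustrative; the function and parameter names (`wkv_recurrent`, `w`, `u`) are hypothetical and this is not RWKV's actual implementation:

```python
import math

def wkv_recurrent(w, u, ks, vs):
    """Simplified scalar WKV-style recurrence (illustrative only).

    w:  time-decay rate (> 0) applied to older state
    u:  extra "bonus" weight given to the current token
    ks: per-timestep keys (floats)
    vs: per-timestep values (floats)

    The state (num, den) is O(1) per step, so processing the whole
    sequence costs O(T) rather than the O(T^2) of full attention.
    """
    num = den = 0.0
    outs = []
    for k, v in zip(ks, vs):
        # Output mixes decayed past state with the current token's bonus term.
        out = (num + math.exp(u + k) * v) / (den + math.exp(u + k))
        outs.append(out)
        # Decay the old state, then absorb the current token into it.
        num = math.exp(-w) * num + math.exp(k) * v
        den = math.exp(-w) * den + math.exp(k)
    return outs
```

At the first step the state is empty, so the output equals the first value; later outputs are decay-weighted averages of past values, which is what makes the per-step cost constant.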
Based on my knowledge, implementing RWKV would be easier starting from the existing Transformer code than from the existing LSTM code. For now, I don't have plans to do so, but anyone who would like to contribute it is always welcome.
Thanks,
Zhongkai Fu
Is your feature request related to a problem? Please describe. Your Seq2SeqSharp project already supports LSTMs. Please consider implementing the RWKV large language model "linear attention" idea in your C# solution. RWKV's linear attention model performs very well at inference time. See: https://www.rwkv.com/
Describe the solution you'd like Maybe it only needs a few functions implemented on top of Seq2SeqSharp's LSTM functionality, such as "token shift" or "time decay". Or perhaps you have another idea for improving LSTM performance in Seq2SeqSharp. I would like to integrate your solution into the Godot Game Engine for training, fine-tuning, and inference in pure C# code.
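For reference, the two operations named above are quite small: "token shift" linearly interpolates each timestep's input with the previous timestep's input, and "time decay" exponentially down-weights older state. A minimal Python sketch of token shift follows; the function name and the scalar mixing coefficient `mu` are illustrative assumptions, not part of any Seq2SeqSharp or RWKV API:

```python
def token_shift(xs, mu):
    """Token shift (illustrative sketch): blend each timestep's input
    with the previous timestep's input. The first step blends with
    zero, since there is no earlier token.

    xs: per-timestep inputs (floats)
    mu: mixing coefficient in [0, 1]; mu = 1 keeps only the current input
    """
    prev = 0.0
    out = []
    for x in xs:
        # Weighted mix of the current input and the previous one.
        out.append(mu * x + (1.0 - mu) * prev)
        prev = x
    return out
```

In RWKV this mixing is done per channel with learned coefficients, which gives each layer cheap access to the previous token without any attention computation.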
Describe alternatives you've considered Use https://github.com/imxcstar/CSharp-RWKV for inference only.