zhongkaifu / Seq2SeqSharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). It has many highlighted features, such as automatic differentiation, different network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), multimodal models for text and images, and more.

Add training and inference support for RWKV LSTMs #88

Open TodayAI opened 1 week ago

TodayAI commented 1 week ago

Is your feature request related to a problem? Please describe. Your Seq2SeqSharp project already supports LSTMs. Please consider implementing the "linear attention" idea from the RWKV large language model in your C# solution. RWKV's linear attention model performs very well at inference. See: https://www.rwkv.com/

Describe the solution you'd like Maybe it only requires implementing a few functions on top of Seq2SeqSharp's LSTM functionality, such as "token shift" or "time decay". Or maybe you have another idea for how to improve LSTM performance in Seq2SeqSharp. I would like to integrate your solution into the Godot Game Engine for training, fine-tuning and inference in pure C# code.
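For reference, the two pieces mentioned above can be sketched compactly. This is a minimal Python illustration (Seq2SeqSharp itself is C#, so a real port would use its tensor API); the function names `token_shift` and `wkv_step` are mine, and the recurrence follows the RWKV-4-style WKV formulation per channel, without the numerical-stability tricks a production kernel would need.

```python
import math

def token_shift(x_t, x_prev, mu):
    # RWKV "token shift": interpolate the current token's features with the
    # previous token's features before the key/value/receptance projections.
    return mu * x_t + (1.0 - mu) * x_prev

def wkv_step(a, b, k, v, w, u):
    # One step of the WKV recurrence for a single channel (naive form).
    #   a: running decayed sum of exp(k_i) * v_i over past tokens
    #   b: running decayed sum of exp(k_i)         over past tokens
    #   w: time decay (> 0), u: "bonus" weight applied to the current token
    out = (a + math.exp(u + k) * v) / (b + math.exp(u + k))
    a = math.exp(-w) * a + math.exp(k) * v
    b = math.exp(-w) * b + math.exp(k)
    return out, a, b

def run(ks, vs, w=0.5, u=0.0):
    # Process a sequence of (k, v) pairs with constant memory: this is why
    # RWKV inference is cheap -- the state is just (a, b) per channel.
    a, b = 0.0, 0.0
    outs = []
    for k, v in zip(ks, vs):
        o, a, b = wkv_step(a, b, k, v, w, u)
        outs.append(o)
    return outs
```

As a sanity check, with `w = 0`, `u = 0` and all keys equal to zero, the output at step t is simply the running mean of the values seen so far.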

Describe alternatives you've considered Using https://github.com/imxcstar/CSharp-RWKV for inference only.

zhongkaifu commented 1 week ago

Hi @TodayAI Thanks for your suggestions and sharing.

RWKV is closer to a "linear attention" version of the Transformer than to an LSTM, although it also uses gating. I tried an early version of RWKV, and its performance was not good in practice; in particular, it was very sensitive to prompts. The newer versions may have improved, but I haven't had a chance to try them out.
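The "linear attention" framing above can be made concrete: instead of recomputing attention over all past tokens at every step (quadratic in sequence length), a linear-attention layer keeps running sums and reads them out with the current query, so each step is O(1) in sequence length. This Python sketch is illustration only (not Seq2SeqSharp code); the feature map `phi` here is the common elu(x)+1 choice, which is an assumption, not what RWKV itself uses.

```python
import math

def phi(x):
    # Positive feature map, elu(x) + 1 -- one common choice in the
    # linear-attention literature to keep the denominator positive.
    return [math.exp(v) if v < 0 else v + 1.0 for v in x]

def linear_attention(qs, ks, vs):
    # Causal linear attention as a recurrence over two running sums:
    #   S[i][j] = sum over past tokens of phi(k)[i] * v[j]
    #   z[i]    = sum over past tokens of phi(k)[i]
    d, dv = len(qs[0]), len(vs[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = phi(q), phi(k)
        for i in range(d):
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
            z[i] += fk[i]
        denom = sum(fq[i] * z[i] for i in range(d))
        outs.append([sum(fq[i] * S[i][j] for i in range(d)) / denom
                     for j in range(dv)])
    return outs
```

At the first step the state contains only the current token, so the output equals its value vector exactly, which makes a convenient correctness check.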

Based on my knowledge, to implement RWKV it would be easier to start from the existing Transformer code rather than the existing LSTM code. For now, I don't have plans to do so, but anyone who would like to contribute is welcome.

Thanks, Zhongkai Fu