idiap / fast-transformers

PyTorch library for fast transformer implementations

Segment-Level Recurrence with State Reuse #27

Closed burcehan closed 4 years ago

burcehan commented 4 years ago

Hi, thanks for your great work! I have a question: if I want to use segment-level recurrence with state reuse, as in Transformer-XL, in a language model, how should I do it? Should I rewrite the code in causal_product_cuda.cu? Thanks for your help.

angeloskath commented 4 years ago

Hmm, I am not sure I follow. If you are using the causal-linear model, then why do you need the segment-level recurrence?

Regardless of the above, if you do want to add a segment-level recurrence, I would do it as a module that contains the segment transformer rather than editing the CUDA code. This should be implementable at a higher level.

Cheers, Angelos
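One way to read the reply above: causal linear attention already carries a fixed-size running state, so reusing state across segments can be done in a thin wrapper without touching the CUDA kernel. The following is a minimal NumPy sketch of that idea, not the library's implementation. The function names (`feature_map`, `causal_linear_attention`, `segmented_attention`), the `elu(x) + 1` feature map, and the `(s, z)` state layout are assumptions chosen for illustration.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: keeps every feature strictly positive (assumed feature map).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def causal_linear_attention(q, k, v, state=None, eps=1e-6):
    """Causal linear attention over one segment.

    q, k: (T, D); v: (T, M). `state` is the (s, z) pair returned by the
    previous segment, or None for the first segment.
    Returns the (T, M) output and the updated state.
    """
    q, k = feature_map(q), feature_map(k)
    if state is None:
        s = np.zeros((k.shape[1], v.shape[1]))  # running sum of outer(k_t, v_t)
        z = np.zeros(k.shape[1])                # running sum of k_t
    else:
        s, z = state
    out = np.empty((q.shape[0], v.shape[1]))
    for t in range(q.shape[0]):
        s = s + np.outer(k[t], v[t])
        z = z + k[t]
        out[t] = (q[t] @ s) / (q[t] @ z + eps)
    return out, (s, z)


def segmented_attention(q, k, v, segment_len):
    # Higher-level wrapper: process the sequence segment by segment and
    # hand the state from one segment to the next.
    state, outputs = None, []
    for start in range(0, q.shape[0], segment_len):
        seg = slice(start, start + segment_len)
        out, state = causal_linear_attention(q[seg], k[seg], v[seg], state)
        outputs.append(out)
    return np.concatenate(outputs, axis=0)


rng = np.random.default_rng(0)
q = rng.standard_normal((10, 4))
k = rng.standard_normal((10, 4))
v = rng.standard_normal((10, 3))
full, _ = causal_linear_attention(q, k, v)
chunked = segmented_attention(q, k, v, segment_len=3)
```

In this sketch, `chunked` matches `full`, because the carried state summarizes every earlier position. Unlike Transformer-XL, no cached hidden states from the previous segment need to be concatenated to the keys.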

burcehan commented 4 years ago

I am using the causal-linear model and trying to compute attention where Q has shape (L, D) and K and V both have shape (S, D). Here D is the feature dimension, and L and S are the sequence lengths, which are not equal. When I try this with the causal-linear model, an error is returned. It does not support such calculations.
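To make the shapes in this question concrete, here is a minimal NumPy sketch of causal linear attention with L queries and S keys/values. It assumes a Transformer-XL-style alignment in which the first S - L keys act as memory, so query i may attend to keys 0 through i + (S - L). That alignment, the `elu(x) + 1` feature map, and the function names are assumptions for illustration. The sketch uses an explicit mask for clarity and does not describe the library's kernel or how the library itself handles L != S.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: keeps every feature strictly positive (assumed feature map).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def rectangular_causal_linear_attention(q, k, v):
    """q: (L, D); k: (S, D); v: (S, M), with S >= L.

    Query i attends to keys 0 .. i + (S - L) (assumed alignment).
    """
    L, S = q.shape[0], k.shape[0]
    if S < L:
        raise ValueError("this sketch assumes S >= L")
    scores = feature_map(q) @ feature_map(k).T   # (L, S), all entries positive
    mask = np.tril(np.ones((L, S)), k=S - L)     # 1 where key j <= i + (S - L)
    scores = scores * mask
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
q = rng.standard_normal((3, 4))   # L = 3
k = rng.standard_normal((5, 4))   # S = 5
v = rng.standard_normal((5, 2))
out = rectangular_causal_linear_attention(q, k, v)  # shape (L, M) = (3, 2)
```

Under this convention the last query sees all S keys and the first query sees the S - L memory keys plus its own position.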

angeloskath commented 4 years ago

Hi,

f165966 should fix this issue (namely, L and S can now be different). If you have more questions, or if the issue should not be closed, feel free to reopen it or open another issue.

Cheers, Angelos