Closed: RayXu14 closed this issue 5 years ago
In class FullAttention in layers.py:
self.linear_final = Parameter(torch.ones(1, hidden_size), requires_grad = True)
I guess this is meant to be D, the diagonal matrix. But I can't find anywhere in the code that operates on a diagonal; instead, the parameter is just expanded to a matrix. Could you help me understand how the diagonal matrix is used?
I figured out why. Sorry for raising a silly issue T T
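For anyone who lands here with the same question: the thread does not state the reason, so the following is only a likely explanation. Multiplying a matrix by a diagonal matrix D on the right is the same as scaling each of its columns by the matching diagonal entry, so a (1, hidden_size) parameter that is expanded and multiplied elementwise can stand in for D without ever building the full matrix. A minimal plain-Python sketch of that identity, where diag, matmul, scale_columns, x and d are hypothetical names and values for illustration and are not taken from layers.py:

```python
# Sketch (assumption, not the repository's code): x @ diag(d) equals
# scaling column j of x by d[j], which is what an expanded (1, hidden_size)
# vector does under elementwise multiplication.

def diag(d):
    """Build a full square matrix with d on the diagonal and zeros elsewhere."""
    n = len(d)
    return [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Naive matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def scale_columns(x, d):
    """Elementwise product of every row of x with d (the broadcast form)."""
    return [[row[j] * d[j] for j in range(len(d))] for row in x]

# Hypothetical inputs: 2 positions, hidden_size = 3.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
d = [0.5, 2.0, -1.0]  # plays the role of the diagonal of D

via_diag = matmul(x, diag(d))          # explicit diagonal matrix
via_broadcast = scale_columns(x, d)    # vector expanded across rows

assert via_diag == via_broadcast
```

The broadcast form stores hidden_size values instead of hidden_size squared and skips all the multiplications by zero, which is presumably why no explicit diagonal appears in the code.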