Closed WithMeteor closed 1 year ago
I may have found the cause of the problem. When calculating the weights of the adjacency matrix, the dimension being sliced needs to be adjusted when obtaining `query_end` and `key_start`. Change `query_prime[end]` to `query_prime[:, end]` and `key_prime[start]` to `key_prime[:, start]` on lines 143 and 188, and change `attn_normalizer[end]` to `attn_normalizer[:, end]` on lines 147 and 192. That solves the problem, so this issue will be closed.
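For anyone hitting the same thing, here is a minimal sketch of why the slice has to move. It assumes a batch-first layout `[B, N, H, M]` (graphs, nodes, heads, random-feature dimension), uses NumPy as a stand-in for PyTorch because integer-array indexing behaves the same way in this case, and the toy sizes and the `end` index array are made up for illustration.

```python
import numpy as np

B, N, H, M = 1, 5, 2, 3  # toy sizes: graphs, nodes, heads, feature dim
rng = np.random.default_rng(0)
query_prime = rng.standard_normal((B, N, H, M))  # node axis is axis 1
end = np.array([4, 0, 2, 2])  # hypothetical edge endpoints, E = 4

# Node-first layout [N, B, H, M]: indexing axis 0 selects nodes.
node_first = query_prime.transpose(1, 0, 2, 3)
query_end_ref = node_first[end]  # shape [E, B, H, M]

# Batch-first layout [B, N, H, M]: index the node axis explicitly.
query_end = query_prime[:, end]  # shape [B, E, H, M]

# Both pick the same node rows; only the axis order differs.
same = np.allclose(query_end.transpose(1, 0, 2, 3), query_end_ref)

# Indexing axis 0 of the batch-first tensor fails here,
# because that axis only has size B = 1.
try:
    query_prime[end]
    raised = False
except IndexError:
    raised = True
```

With `B = 1` the plain `[end]` index runs out of bounds as soon as a node id is larger than 0, which is why the `[:, end]` form is needed once the node axis sits in the second position.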
While reading the source code of NodeFormer, I noticed that the first and second dimensions of query/key/value are swapped before the QKV attention is calculated (for example, lines 169-171 of nodeformer.py). After the attention is calculated, the first two dimensions are swapped back during normalization. At first I thought this step was unnecessary, but when I commented out the code the program ran into a memory overflow. So I am very curious about the reasoning behind this step. Does placing `node_number` in the second dimension affect the complexity of the matrix multiplication when calculating the dot product of key and value? Is that why `node_number` is moved to the first dimension in advance?
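One plausible explanation can be sketched as follows. It rests on an assumption that is not quoted from nodeformer.py: that the aggregation is written with einsum subscripts that label the first axis as the node axis (`"nbhm,nbhd->bhmd"` here is an illustrative choice). NumPy stands in for PyTorch, and all sizes are toy values.

```python
import numpy as np

B, N, H, M, D = 1, 6, 2, 4, 3  # graphs, nodes, heads, feature dim, hidden dim
rng = np.random.default_rng(0)
qs = rng.random((N, B, H, M))  # node-first layout
ks = rng.random((N, B, H, M))
vs = rng.random((N, B, H, D))

# Linear-cost order: sum over nodes first, leaving a small [B, H, M, D] summary.
kvs = np.einsum("nbhm,nbhd->bhmd", ks, vs)
out_linear = np.einsum("nbhm,bhmd->nbhd", qs, kvs)

# Quadratic-cost order: build all N x N scores, then aggregate values.
scores = np.einsum("nbhm,kbhm->bhnk", qs, ks)  # [B, H, N, N]
out_quadratic = np.einsum("bhnk,kbhd->nbhd", scores, vs)

# Feeding batch-first tensors to the same subscripts binds "n" to B and
# "b" to N, so nothing is summed over nodes and the summary grows with N.
ks_bf = ks.transpose(1, 0, 2, 3)  # [B, N, H, M]
vs_bf = vs.transpose(1, 0, 2, 3)  # [B, N, H, D]
kvs_swapped = np.einsum("nbhm,nbhd->bhmd", ks_bf, vs_bf)  # [N, H, M, D]
```

Under that assumption, the swap is not about the cost of a single matrix multiplication. It keeps the node axis where the subscripts expect it, so the key-value summary stays at `B * H * M * D` entries. Without the swap, the same call would allocate `N * H * M * D` entries and no longer sum over nodes, which could account for the memory blow-up on a large graph.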