Open AmariJane opened 11 months ago
I am very sorry, I made a mistake when organizing the original code into modules, and I apologize for the confusion. Before computing the global attention, the windows need to be partitioned. Here, the partitioning strategy is to reshape the input to (G×G, (H×W)/(G×G), C), so the correct code would be global_x = Rearrange('b d (w1 x) (w2 y) -> b x y w1 w2 d', w1=w, w2=w)(x).
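For illustration, here is a minimal sketch of what that line does on a dummy feature map (the shapes and variable names below are illustrative assumptions, not the repository's actual configuration):

import torch
from einops.layers.torch import Rearrange

b, d, H, W, w = 1, 3, 8, 8, 4          # toy sizes; H and W must be divisible by w
x = torch.randn(b, d, H, W)

# Grid partition: H -> (w1 x), W -> (w2 y), so for a fixed (x, y) the window over
# (w1, w2) gathers positions spread evenly across the whole feature map.
to_grid = Rearrange('b d (w1 x) (w2 y) -> b x y w1 w2 d', w1=w, w2=w)
global_x = to_grid(x)
print(global_x.shape)                   # torch.Size([1, 2, 2, 4, 4, 3])

# The inverse pattern used after attention restores the original layout.
from_grid = Rearrange('b x y w1 w2 d -> b d (w1 x) (w2 y)')
assert torch.equal(from_grid(global_x), x)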
Fortunately, the model parameters come from the previous code runs, so they are still usable.
Thank you for your interest in our project and for the timely feedback. If you have any other questions or find any other issues, please feel free to keep asking.
But your fix is the same as the source code. Can you explain how dividing the window into a G×G grid is reflected in the code? Thank you!
Hi, this is the efficient partitioning of tensor data through einops. For local attention, the tensor is divided into K×K windows, and the partitioning is similar to the Swin Transformer. For global attention, we instead divide the tensor into a G×G grid, similar to dilated convolution. You can use the following example to understand how einops is used to partition the G×G grid.
1. First, construct an example 8×8 tensor whose four 4×4 quadrants are filled with 1, 2, 3, and 4:

import torch
tensor = torch.zeros(8, 8)
tensor[:4, :4] = 1
tensor[:4, 4:] = 2
tensor[4:, :4] = 3
tensor[4:, 4:] = 4
the result tensor is

tensor([[1., 1., 1., 1., 2., 2., 2., 2.],
        [1., 1., 1., 1., 2., 2., 2., 2.],
        [1., 1., 1., 1., 2., 2., 2., 2.],
        [1., 1., 1., 1., 2., 2., 2., 2.],
        [3., 3., 3., 3., 4., 4., 4., 4.],
        [3., 3., 3., 3., 4., 4., 4., 4.],
        [3., 3., 3., 3., 4., 4., 4., 4.],
        [3., 3., 3., 3., 4., 4., 4., 4.]])
2. Then, einops is used to partition the window for the global grid attention mechanism:
from einops import rearrange

w = 4
rearrange(tensor, '(w1 x) (w2 y) -> x y w1 w2', w1=w, w2=w)
the result tensor is

tensor([[[[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]],

         [[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]]],


        [[[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]],

         [[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]]]])
3. Then, einops is used to partition the window for the local window attention mechanism:
w = 4
rearrange(tensor, '(x w1) (y w2) -> x y w1 w2', w1=w, w2=w)
the result tensor is

tensor([[[[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]],

         [[2., 2., 2., 2.],
          [2., 2., 2., 2.],
          [2., 2., 2., 2.],
          [2., 2., 2., 2.]]],


        [[[3., 3., 3., 3.],
          [3., 3., 3., 3.],
          [3., 3., 3., 3.],
          [3., 3., 3., 3.]],

         [[4., 4., 4., 4.],
          [4., 4., 4., 4.],
          [4., 4., 4., 4.],
          [4., 4., 4., 4.]]]])
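To make the contrast concrete, here is a small check on the same toy tensor (my own sketch, not code from the repository): with the grid pattern every window mixes values from all four quadrants of the image, while with the local pattern each window stays inside a single quadrant.

import torch
from einops import rearrange

tensor = torch.zeros(8, 8)
tensor[:4, :4] = 1
tensor[:4, 4:] = 2
tensor[4:, :4] = 3
tensor[4:, 4:] = 4

w = 4
grid_windows  = rearrange(tensor, '(w1 x) (w2 y) -> x y w1 w2', w1=w, w2=w)  # global / grid partition
local_windows = rearrange(tensor, '(x w1) (y w2) -> x y w1 w2', w1=w, w2=w)  # local / window partition

print(torch.unique(grid_windows[0, 0]))   # tensor([1., 2., 3., 4.])  -> one window samples the whole image
print(torch.unique(local_windows[0, 0]))  # tensor([1.])              -> one window stays in one local block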
Hope this helps.
Understood! Thank you very much.
Hi, first of all, thank you for your work. I have a question:
global_x = Rearrange('b d (x w1) (y w2) -> b x y w1 w2 d', w1=w, w2=w)(x)
global_x = self.grid_attn(global_x)
global_x = Rearrange('b x y w1 w2 d -> b d (w1 x) (w2 y)')(global_x)
res.append(global_x)
In the above code, I'm having a hard time seeing the difference between local attention and global attention. I would be grateful if you could answer my query.