First of all, thumbs up for your work. I have one question, though. The paper mentions decomposing cross attention into space and channel. What is the difference between the two, and why are they called "space" and "channel"? As far as I can tell from the code, the only difference is that the objects used for the self-attention calculation differ between the two, which does not seem to have anything to do with space or channels.