Hello author, thanks for your code. Do you calculate attention only within each 32x32 patch, so that attention between patches is not considered? Look at the following code:

```python
# begin patch-wise
for i in range(0, 4):
    for j in range(0, 4):
        x_p = xin[:, :, 32*i:32*(i+1), 32*j:32*(j+1)]  # 4x1x32x32
```

Hi, yes, we did not consider attention between patches. MedT is not a sequence-to-sequence architecture like ViT. We show that even without inter-patch attention we get good performance, without using any pretrained weights.
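For anyone skimming this thread, here is a minimal runnable sketch of what "attention only within patches" means in practice. The `proj_in`, `attn`, and `proj_out` modules below are hypothetical stand-ins (1x1 convolutions plus `nn.MultiheadAttention`), not MedT's actual gated axial attention layers; the point is only that each loop iteration sees its own 32x32 crop, so tokens from different patches can never attend to one another.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for illustration; MedT itself uses gated
# axial attention here, not nn.MultiheadAttention.
proj_in = nn.Conv2d(1, 64, kernel_size=1)   # lift 1 channel to a 64-dim embedding
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
proj_out = nn.Conv2d(64, 1, kernel_size=1)

xin = torch.randn(4, 1, 128, 128)           # batch of 4 single-channel images
y_out = torch.zeros_like(xin)

for i in range(0, 4):
    for j in range(0, 4):
        # Crop one 32x32 patch: 4x1x32x32, matching the quoted loop.
        x_p = xin[:, :, 32*i:32*(i+1), 32*j:32*(j+1)]
        feat = proj_in(x_p)                          # 4x64x32x32
        tokens = feat.flatten(2).transpose(1, 2)     # 4x1024x64: tokens from ONE patch only
        out, _ = attn(tokens, tokens, tokens)        # self-attention within the patch
        feat = out.transpose(1, 2).reshape(4, 64, 32, 32)
        # Write the processed patch back into its grid position.
        y_out[:, :, 32*i:32*(i+1), 32*j:32*(j+1)] = proj_out(feat)
```

Because the crop happens before the attention call, the attention matrix in each iteration spans only the 1024 positions of a single patch; there is no term linking positions in different patches.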