The position embedding in the ViT-B weight file is for 1024x1024 resolution images. How did the author input the 512x512 resolution image? After I created LoRA_Sam, I tried to input a 512x512 resolution image, but it was prompted that the dimensions of the position embedding did not match.
The position embedding in the ViT-B weight file is for 1024x1024 resolution images. How did the author input the 512x512 resolution image? After I created LoRA_Sam, I tried to input a 512x512 resolution image, but it was prompted that the dimensions of the position embedding did not match.