tianrun-chen / SAM-Adapter-PyTorch

Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts
MIT License

Questions about obtaining original SAM results and changing input image size #20

Closed: songsong695 closed this issue 1 year ago

songsong695 commented 1 year ago

Thanks for your excellent work. May I ask you two questions? (1) Did you use prompts to obtain the original SAM results in your paper? (2) What is the quickest way to change the input image size?

tianrun-chen commented 1 year ago

Greetings! We appreciate your interest in our work.

(1) Yes: to obtain the original SAM results we used SAM's own prompts, a box prompt covering the entire image and point sampling across the image.

(2) To resize the image, you can utilize OpenCV.
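For example, a minimal OpenCV sketch (the file name and the 1024x1024 target are placeholders matching the released configs):

```python
import cv2

# Read the image (OpenCV returns BGR), convert to RGB, and resize to the
# 1024x1024 resolution the released SAM checkpoints expect. Use
# cv2.INTER_NEAREST instead when resizing ground-truth masks.
img = cv2.imread("input.png")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (1024, 1024), interpolation=cv2.INTER_LINEAR)
```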

songsong695 commented 1 year ago

> Greetings! We appreciate your interest in our work.
>
> (1) Yes: to obtain the original SAM results we used SAM's own prompts, a box prompt covering the entire image and point sampling across the image.
>
> (2) To resize the image, you can utilize OpenCV.

Thanks for your reply. Regarding the second issue, I tried to adjust the input image size in the config file, but the training process failed and threw an error. Could you kindly inform me whether your network currently supports training with resolutions other than 1024? Thank you again for your kind assistance.

buriedms commented 1 year ago

I have the same question and look forward to your reply. Thanks!

jiachen0212 commented 1 year ago

I have the same question. Because SAM's pretrained weights assume a 1024 input (a 64x64 patch grid), some layers' input sizes cannot be changed. I still don't know how to reduce the input image size (I need it < 1024 because I only have a 3090 GPU...)

tianrun-chen commented 1 year ago

@jiachen0212 @buriedms @syp66 Our approach uses an adapter-based method to sidestep time-consuming fine-tuning of large models. We reuse the pre-trained weights of the original SAM model, which are designed for inputs at a resolution of 1024, so modifying the input size is challenging. We are still investigating a memory-efficient SAM model. At the current stage, we recommend upscaling the input image with PIL or OpenCV rather than tweaking the network input size. If you run into GPU memory constraints, you can try a smaller version of the SAM model (e.g. ViT-L) or switch to a different GPU.
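As an illustrative sketch of the smaller-backbone route, using Meta's upstream `segment_anything` package and its released ViT-L checkpoint file (this repo selects the backbone through its yaml configs, so adapt accordingly):

```python
from segment_anything import sam_model_registry

# ViT-B and ViT-L checkpoints are much smaller than ViT-H and fit more
# comfortably on a 24 GB card such as the RTX 3090.
sam = sam_model_registry["vit_l"](checkpoint="sam_vit_l_0b3195.pth").cuda()
```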

jithf commented 10 months ago

> I have the same question. Because SAM's pretrained weights assume a 1024 input (a 64x64 patch grid), some layers' input sizes cannot be changed. I still don't know how to reduce the input image size (I need it < 1024 because I only have a 3090 GPU...)

@jiachen0212 Have you solved this problem? I also only have one 3090 GPU...

ericzw commented 6 months ago

> Greetings! We appreciate your interest in our work. (1) Yes: to obtain the original SAM results we used SAM's own prompts, a box prompt covering the entire image and point sampling across the image. (2) To resize the image, you can utilize OpenCV.
>
> Thanks for your reply. Regarding the second issue, I tried to adjust the input image size in the config file, but the training process failed and threw an error. Could you kindly inform me whether your network currently supports training with resolutions other than 1024? Thank you again for your kind assistance.

After changing the input size, some parameters in the pretrained ViT no longer match the checkpoint, such as the relative position encodings. To address this, is it feasible to let the mismatched parameters participate in training?
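One workaround used in other ViT codebases is to interpolate the relative position tables to the new grid size before loading. A sketch, assuming Meta's `segment_anything` checkpoint key names, a 16-pixel patch size, and a 512x512 target input (not something this repo ships):

```python
import torch
import torch.nn.functional as F

def resize_rel_pos(rel_pos: torch.Tensor, new_len: int) -> torch.Tensor:
    # rel_pos has shape (2*L - 1, C); interpolate along the first axis.
    resized = F.interpolate(
        rel_pos.permute(1, 0).unsqueeze(0),  # (1, C, 2L-1)
        size=new_len,
        mode="linear",
        align_corners=False,
    )
    return resized.squeeze(0).permute(1, 0)

state = torch.load("sam_vit_b_01ec64.pth", map_location="cpu")
old_len = 2 * (1024 // 16) - 1  # 127: global-attention tables at 1024 input
new_len = 2 * (512 // 16) - 1   # 63: the same tables at a 512 input
for k, v in state.items():
    # Only the global-attention rel_pos tables depend on the input size;
    # the windowed blocks use a fixed 14x14 window and are left untouched.
    if "rel_pos" in k and v.shape[0] == old_len:
        state[k] = resize_rel_pos(v, new_len)
```

The absolute `pos_embed` (shape 1x64x64xC in these checkpoints) would need an analogous 2D interpolation, and the image encoder has to be built with the new `img_size` for the resized tables to load.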

ericzw commented 6 months ago

> @jiachen0212 @buriedms @syp66 Our approach uses an adapter-based method to sidestep time-consuming fine-tuning of large models. We reuse the pre-trained weights of the original SAM model, which are designed for inputs at a resolution of 1024, so modifying the input size is challenging. We are still investigating a memory-efficient SAM model. At the current stage, we recommend upscaling the input image with PIL or OpenCV rather than tweaking the network input size. If you run into GPU memory constraints, you can try a smaller version of the SAM model (e.g. ViT-L) or switch to a different GPU.

After changing the input size, some parameters in the pretrained ViT no longer match the checkpoint, such as the relative position encodings. To address this, is it feasible to let the mismatched parameters participate in training?

tianrun-chen commented 6 months ago

The ViT mismatch may be caused by an incorrect configuration. Please make sure you download the correct pre-trained ViT checkpoint (ViT-B and ViT-H are different).
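On the question of letting mismatched parameters train: a common pattern is to filter out shape mismatches and load with `strict=False`, so the skipped tensors keep their random initialization and are updated during training. A sketch against the upstream package (the backbone choice and file name are illustrative):

```python
import torch
from segment_anything import sam_model_registry

# Build the model without loading weights, then load only the tensors whose
# shapes match. Anything skipped (e.g. position tables after an input-size
# change) keeps its fresh initialization and simply trains from scratch.
model = sam_model_registry["vit_b"](checkpoint=None)
ckpt = torch.load("sam_vit_b_01ec64.pth", map_location="cpu")
model_sd = model.state_dict()
filtered = {k: v for k, v in ckpt.items()
            if k in model_sd and v.shape == model_sd[k].shape}
model.load_state_dict(filtered, strict=False)
print(f"loaded {len(filtered)}/{len(ckpt)} tensors from the checkpoint")
```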