vanvalenlab / cellSAM

Codebase for "A Foundation Model for Cell Segmentation"

Bounding box prediction using other backbones #20

Open carlosuc3m opened 4 months ago

carlosuc3m commented 4 months ago

Hello, first of all I would like to congratulate you on an awesome piece of work.

I am developing a series of Java-based plugins for several Java software platforms (Fiji, ImageJ, Icy) that use lighter variants of SAM to improve manual annotation.

We also want to include automatic segmentation, and we thought that providing CellSAM would be the best option, because SAM's "segment everything" mode does not really work on cells.

We want to use lighter variants of SAM because we want the plugins to run on any computer, and the faster they are, the better.

These are the models that we are using: EfficientSAM and EfficientViTSAM.

From what I have seen, you trained the AnchorDETR model on top of the SAM-b ViT encoder. SAM-b produces feature maps with a different number of channels than the models we use. Do you know of any way to adapt your AnchorDETR to these models? I am thinking about interpolating the output feature maps, along the lines of the sketch below.
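
For concreteness, here is roughly what I had in mind, a minimal PyTorch sketch. The channel counts and spatial sizes are hypothetical placeholders; I don't know the exact dimensions your AnchorDETR checkpoint expects, and the 1x1 projection layer would be new, so it would need at least a short fine-tuning run:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions: the lighter encoder emits (B, C_IN, H, W)
# feature maps, while the detector was trained on (B, C_OUT, H_OUT, W_OUT)
# maps from the SAM-b encoder. Replace with the real values.
C_IN, C_OUT = 192, 256
H_OUT, W_OUT = 64, 64

# 1x1 convolution to match the channel dimension; a newly initialized
# layer, so it has to be trained before the detector output is meaningful.
project = nn.Conv2d(C_IN, C_OUT, kernel_size=1)

def adapt_features(feats: torch.Tensor) -> torch.Tensor:
    """Resize and re-project encoder features to the detector's expected shape."""
    feats = F.interpolate(feats, size=(H_OUT, W_OUT),
                          mode="bilinear", align_corners=False)
    return project(feats)

# Example: fake feature map from a lighter encoder at 32x32 resolution.
dummy = torch.randn(1, C_IN, 32, 32)
print(adapt_features(dummy).shape)  # torch.Size([1, 256, 64, 64])
```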

If not, for how long did you train the model, and on how many GPUs?

Regards, Carlos

carlosuc3m commented 4 months ago

Also, after reading the paper and the training procedure, I wonder why you needed to train the ViT encoder for the CellFinder model. Weren't the SAM ViT feature maps good enough to feed to the decoder? Have you tried SAM-h?

Maybe the encoder had to be retrained because the SAM-b feature maps are not good enough. This is where EfficientSAM and EfficientViTSAM are interesting: their performance is quite good even compared to SAM-h, so maybe their ViT encoders could be frozen during the CellFinder training step, as in the sketch below.
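
In code, freezing the encoder during that step would look something like the following. This is only a sketch with placeholder modules standing in for the real backbone and AnchorDETR head, since I haven't looked at how your training loop is structured:

```python
import torch
import torch.nn as nn

# Placeholder stand-ins: `encoder` would be the frozen (Efficient)SAM ViT
# backbone, `detector` the AnchorDETR head trained on top of it.
encoder = nn.Conv2d(3, 256, kernel_size=16, stride=16)   # hypothetical
detector = nn.Conv2d(256, 4, kernel_size=1)              # hypothetical

# Freeze the backbone: no gradients, and keep norm/dropout in eval mode.
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

# Only the detector's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(detector.parameters(), lr=1e-4)

images = torch.randn(2, 3, 1024, 1024)   # dummy batch
targets = torch.randn(2, 4, 64, 64)      # dummy regression targets

with torch.no_grad():                    # frozen features, no encoder graph
    feats = encoder(images)
loss = nn.functional.mse_loss(detector(feats), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Besides saving memory and compute, this would let the plugins reuse the embeddings they already compute for interactive prompting.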

Sorry if any of these questions are stupid.