vanvalenlab / cellSAM

Codebase for "A Foundation Model for Cell Segmentation"

Bounding box prediction using other backbones #20

Open carlosuc3m opened 4 months ago

carlosuc3m commented 4 months ago

Hello, first of all I would like to congratulate you on an awesome piece of work.

I am developing a series of Java-based plugins for several Java software platforms (Fiji, ImageJ, Icy) that use lighter variants of SAM to improve manual annotation.

We also want to include automatic segmentation, and we thought that providing CellSAM would be the best option, because SAM's "segment everything" mode does not really work on cells.

We want to use lighter variants of SAM because we want the plugins to run on any computer, and the faster they are, the better.

These are the models that we are using: EfficientSAM and EfficientViTSAM.

From what I have seen, you trained the AnchorDETR model on top of the SAM-b ViT encoder. SAM-b produces feature maps with a different number of channels than the models we use. Do you know of any way to adapt your AnchorDETR to these models? I am thinking about interpolating the output feature maps, along the lines of the sketch below.
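
For concreteness, here is roughly what I had in mind, a minimal PyTorch sketch. The channel counts and spatial sizes are hypothetical placeholders; I don't know the exact dimensions your AnchorDETR checkpoint expects, and the 1x1 projection layer would be new, so it would need at least a short fine-tuning run:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions: the lighter encoder emits (B, C_IN, H, W)
# feature maps, while the detector was trained on (B, C_OUT, H_OUT, W_OUT)
# maps from the SAM-b encoder. Replace with the real values.
C_IN, C_OUT = 192, 256
H_OUT, W_OUT = 64, 64

# 1x1 convolution to match the channel dimension; a newly initialized
# layer, so it has to be trained before the detector output is meaningful.
project = nn.Conv2d(C_IN, C_OUT, kernel_size=1)

def adapt_features(feats: torch.Tensor) -> torch.Tensor:
    """Resize and re-project encoder features to the detector's expected shape."""
    feats = F.interpolate(feats, size=(H_OUT, W_OUT),
                          mode="bilinear", align_corners=False)
    return project(feats)

# Example: fake feature map from a lighter encoder at 32x32 resolution.
dummy = torch.randn(1, C_IN, 32, 32)
print(adapt_features(dummy).shape)  # torch.Size([1, 256, 64, 64])
```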

If not, for how long did you train the model, and on how many GPUs?

Regards, Carlos

carlosuc3m commented 4 months ago

Also, after reading the paper and the training procedure, I wonder why you needed to train the ViT encoder for the CellFinder model. Weren't the SAM ViT feature maps good enough to feed to the decoder? Have you tried SAM-h?

Maybe the encoder had to be retrained because the SAM-b feature maps are not good enough. This is where EfficientSAM and EfficientViTSAM are interesting: their performance is quite good even compared to SAM-h, so maybe their ViT encoders could be frozen during the CellFinder training step, as in the sketch below.
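
In code, freezing the encoder during that step would look something like the following. This is only a sketch with placeholder modules standing in for the real backbone and AnchorDETR head, since I haven't looked at how your training loop is structured:

```python
import torch
import torch.nn as nn

# Placeholder stand-ins: `encoder` would be the frozen (Efficient)SAM ViT
# backbone, `detector` the AnchorDETR head trained on top of it.
encoder = nn.Conv2d(3, 256, kernel_size=16, stride=16)   # hypothetical
detector = nn.Conv2d(256, 4, kernel_size=1)              # hypothetical

# Freeze the backbone: no gradients, and keep norm/dropout in eval mode.
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

# Only the detector's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(detector.parameters(), lr=1e-4)

images = torch.randn(2, 3, 1024, 1024)   # dummy batch
targets = torch.randn(2, 4, 64, 64)      # dummy regression targets

with torch.no_grad():                    # frozen features, no encoder graph
    feats = encoder(images)
loss = nn.functional.mse_loss(detector(feats), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Besides saving memory and compute, this would let the plugins reuse the embeddings they already compute for interactive prompting.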

Sorry if any of these questions are stupid.