halleewong / ScribblePrompt

[ECCV 2024] ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Medical Image
http://scribbleprompt.csail.mit.edu/
Apache License 2.0

dimension limit #4

Open sedghi opened 4 months ago

sedghi commented 4 months ago

Hi, great work indeed.

I noticed the following code:

For best results, image should have spatial dimensions of 128x128 and pixel values min-max normalized to the [0,1] range.
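The preprocessing described in that comment could look roughly like the following NumPy-only sketch. It is an illustration, not the repository's code. The helper names (`resize_bilinear`, `minmax_normalize`, `preprocess_slice`) are hypothetical. Bilinear interpolation with half-pixel sample centers is an assumed choice, and normalizing after resizing (so the output spans exactly [0, 1]) is also an assumption.

```python
import numpy as np


def resize_bilinear(img: np.ndarray, out_hw: tuple[int, int]) -> np.ndarray:
    """Bilinear resize of a 2D array using half-pixel sample centers (assumed scheme)."""
    in_h, in_w = img.shape
    out_h, out_w = out_hw
    # Map each output pixel center back to fractional input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def minmax_normalize(img: np.ndarray) -> np.ndarray:
    """Rescale intensities linearly to [0, 1]; a constant image maps to zeros."""
    img = img.astype(np.float32)
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


def preprocess_slice(img: np.ndarray, size: int = 128) -> np.ndarray:
    """Resize a 2D slice to size x size, then min-max normalize it to [0, 1]."""
    return minmax_normalize(resize_bilinear(img.astype(np.float32), (size, size)))
```

For a 512x512 CT slice, `preprocess_slice(slice_2d)` would return a 128x128 float32 array in [0, 1]. Any scribbles or clicks drawn on the original slice would presumably need to be mapped to the same 128x128 grid.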

I was wondering how much it would affect the model's performance if we used the native resolutions typical of CT and MR images, such as 512x512. Have you investigated this?

halleewong commented 2 months ago

Thanks for reaching out!

The model was trained on images resized to 128x128, so it performs best at that resolution. I haven't tried 512x512, but in the paper we evaluated one of the manual scribble datasets (ACDC) and ran the user study with images at 256x256 resolution, and the models still performed well.

I have noticed the CNN version of our model (ScribblePrompt-UNet) is more robust to changes in resolution than the SAM architecture version (ScribblePrompt-SAM). With 256x256 images, ScribblePrompt-UNet works best if you downsize the image to 128x128 for inference and then upsample the prediction to 256x256, as opposed to running inference on the 256x256 image directly.
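The downsize, predict, upsample strategy for ScribblePrompt-UNet can be sketched as follows. This is a hypothetical wrapper, not the repository's API. `predict_fn` stands in for whatever call runs the model on a 128x128 image and returns a 128x128 prediction. For simplicity, the sketch assumes a square image whose side is an integer multiple of 128, and it swaps in block averaging for downsizing and nearest-neighbour repetition for upsampling; the actual code may use a different interpolation.

```python
from typing import Callable

import numpy as np


def block_mean_downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks (integer factors only)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def nearest_upsample(pred: np.ndarray, factor: int) -> np.ndarray:
    """Repeat every pixel `factor` times along both axes."""
    return np.repeat(np.repeat(pred, factor, axis=0), factor, axis=1)


def predict_at_train_resolution(
    img: np.ndarray,
    predict_fn: Callable[[np.ndarray], np.ndarray],  # hypothetical model call
    train_size: int = 128,
) -> np.ndarray:
    """Downsize to train_size, run the model there, upsample the prediction back."""
    h, w = img.shape
    if h != w or h % train_size != 0:
        raise ValueError("expects a square image whose side is a multiple of train_size")
    factor = h // train_size
    small = block_mean_downsample(img, factor)   # e.g. 256x256 -> 128x128
    pred_small = predict_fn(small)               # model runs at its training size
    return nearest_upsample(pred_small, factor)  # e.g. 128x128 -> 256x256
```

Any prompts (scribble masks, clicks, boxes) would presumably have to be downsized the same way before being passed to the model. That step is omitted here.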

For ScribblePrompt-SAM, it's better to input the 256x256 image without downsizing, because the inference code upsamples the input image to 1024x1024 for the SAM encoder. The SAM decoder outputs predictions at 256x256 resolution, which are then resized to the input image size.
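The reasoning for ScribblePrompt-SAM reduces to per-side scale factors. The sketch below only does that arithmetic. It assumes a square input and the sizes stated above (a 1024x1024 encoder input and 256x256 decoder masks). `sam_resize_factors` is a hypothetical helper, not part of the ScribblePrompt code.

```python
def sam_resize_factors(
    input_size: int, encoder_size: int = 1024, mask_size: int = 256
) -> dict[str, float]:
    """Per-side scale factors for a square input under the pipeline described above.

    encoder_upscale: how much the input is enlarged before the SAM encoder.
    mask_resize: how the decoder mask is rescaled to the input size
                 (1.0 = no resize, < 1.0 = shrunk, > 1.0 = enlarged).
    """
    return {
        "encoder_upscale": encoder_size / input_size,
        "mask_resize": input_size / mask_size,
    }


for side in (128, 256, 512):
    print(side, sam_resize_factors(side))
# 128 {'encoder_upscale': 8.0, 'mask_resize': 0.5}
# 256 {'encoder_upscale': 4.0, 'mask_resize': 1.0}
# 512 {'encoder_upscale': 2.0, 'mask_resize': 2.0}
```

One reading of this: at 256x256 the decoder mask already matches the input size. Downsizing to 128x128 first only discards input detail, because the encoder enlarges the image anyway (8x instead of 4x), and the 256x256 mask is then shrunk by half.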

If you have some original-resolution CT/MR images handy, you could try them in our Hugging Face demo. The demo app automatically resizes images to 128x128 for inference. The code for the app is also available here if you would prefer to run the demo locally.