wenxi-yue / SurgicalSAM

[AAAI2024] Official implementation of SurgicalSAM
MIT License

Prompt in Prototype-based Class Prompt Encoder #9

Closed zzzyzh closed 8 months ago

zzzyzh commented 8 months ago

Hi, thank you for your excellent work!

I have a small question about (a) in Figure 3: is "Prompt: Class 4" a text prompt, i.e. the name of the surgical instrument?

zzzyzh commented 8 months ago

Another question: why do you define a new dataloader in each epoch? I'm looking forward to your reply!

wenxi-yue commented 8 months ago

> Hi, thank you for your excellent work!
>
> I have a small question about (a) in Figure 3: Is Prompt: Class 4 a text, i.e. the name of the surgical instrument?

Hi,

Thanks for your interest in our work.

In SurgicalSAM, prompts are class IDs and carry no text content. Each class ID is an integer that corresponds to a specific class. You may refer to the code here to see the input of our model.
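To illustrate what an integer class prompt looks like in practice, here is a minimal sketch in plain Python. It is not the repository's code: the names `class_prototypes` and `lookup_class_prompt`, the 1-based ID convention, the number of classes, and the tiny vector size are all assumptions made for illustration. The only point it shows is that the prompt is an integer used as an index, with no text or tokenisation involved.

```python
import random
from typing import List

NUM_CLASSES = 7  # assumption: number of instrument classes
EMBED_DIM = 4    # assumption: tiny dimension, for illustration only

random.seed(0)
# Hypothetical bank holding one vector per class; in a real model
# these vectors would be learned during training.
class_prototypes: List[List[float]] = [
    [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(NUM_CLASSES)
]


def lookup_class_prompt(class_id: int) -> List[float]:
    """Return the vector for an integer class ID (assumed 1-based here)."""
    if not 1 <= class_id <= NUM_CLASSES:
        raise ValueError(f"class_id must be in [1, {NUM_CLASSES}], got {class_id}")
    return class_prototypes[class_id - 1]


prompt = 4  # the prompt is just the integer 4, not an instrument name
prototype = lookup_class_prompt(prompt)
```

Under these assumptions, changing the prompt from `4` to another ID simply selects a different row of the bank.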

wenxi-yue commented 8 months ago

> Another question is why you define dataloader in each epoch?

During training, we leverage pre-computed offline SAM image embeddings. To achieve data augmentation in an offline manner, we apply diverse transformations to augment original images, compute the SAM image embeddings of the augmented images, and save them into different versions (each version is an augmented copy of the whole training set). Each epoch utilises the training data of a specific version, and so we define a new dataloader in each epoch.
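The per-epoch scheme described above can be sketched roughly as follows. This is a plain-Python stand-in, not the repository's implementation: the directory layout, the `VersionedEmbeddingDataset` class, the `version_for_epoch` helper, and the choice to cycle through versions with a modulo are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List

NUM_VERSIONS = 3                         # assumption: number of augmented copies
DATA_ROOT = Path("data/sam_embeddings")  # hypothetical directory layout


class VersionedEmbeddingDataset:
    """Hypothetical dataset pointing at one augmented version of the training set."""

    def __init__(self, root: Path, version: int, sample_names: List[str]):
        self.version = version
        # One file path per sample, inside this version's folder.
        self.paths = [root / f"version_{version}" / f"{name}.npy" for name in sample_names]

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Path:
        # A real dataset would load the saved embedding and its mask here.
        return self.paths[index]


def version_for_epoch(epoch: int, num_versions: int = NUM_VERSIONS) -> int:
    """Cycle through the augmented versions, one per epoch (an assumed policy)."""
    return epoch % num_versions


sample_names = ["frame_000", "frame_001"]  # hypothetical sample IDs
versions_used = []
for epoch in range(5):
    version = version_for_epoch(epoch)
    # Rebuilt every epoch because the files being read change with the version.
    dataset = VersionedEmbeddingDataset(DATA_ROOT, version, sample_names)
    versions_used.append(dataset.version)
```

In a PyTorch training loop, the freshly built dataset would typically be wrapped in a new `torch.utils.data.DataLoader` at the same point in the loop, which is why the dataloader is defined inside the epoch loop rather than once up front.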

You could also perform data augmentation and compute SAM image embeddings online during training, which could potentially give better results due to more diverse augmentations.

zzzyzh commented 8 months ago

Thank you for your reply!

zzzyzh commented 8 months ago

One more small request: I sent you an email asking for your pre-processed data. Could you share it, if that's convenient for you?

wenxi-yue commented 8 months ago

Hi,

I have uploaded our pre-processed data of EndoVis2018 to Google Drive here. Due to the storage limit, I have only put EndoVis2018 data here. Hope this helps!