med-air / Endo-FM

[MICCAI'23] Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
Apache License 2.0

Testing Endo-FM trained model on a new dataset [Classification] #12

Closed: ShreyasFadnavis closed this issue 4 months ago

ShreyasFadnavis commented 5 months ago

Hi all, thanks for this fantastic work!

I wanted to ask what would be the best way to apply the pretrained model to a new classification dataset: https://github.com/med-air/Endo-FM/blob/main/scripts/test_finetune_polypdiag.sh

I see the script there, but I am not sure whether it would work on a single video or a single image.

Is it possible to do this in Python rather than via the command line?

Thanks in advance :)

prabin333 commented 5 months ago

Hi, this model only outputs metrics and does not store any predicted images in an output directory, so how can we check the results on a video? If possible, could you share code for this? It would be helpful for us too.

Kyfafyd commented 5 months ago

Hi @ShreyasFadnavis, thanks for your interest!

I assume you want to fine-tune the pretrained Endo-FM model on your own dataset? You can refer to this script for fine-tuning: https://github.com/med-air/Endo-FM/blob/main/scripts/eval_finetune_polypdiag.sh and adapt it to your own data.

Moreover, if you want to apply Endo-FM to image tasks, you can unsqueeze the image input to treat it as a 1-frame video.
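For example, a minimal sketch of that idea in PyTorch (assuming the model consumes video tensors shaped (B, C, T, H, W), as in TimeSformer; the 224x224 size and random values are placeholders, not the repo's exact preprocessing):

import torch

# A single RGB image as a (C, H, W) tensor, already resized and normalized.
image = torch.randn(3, 224, 224)

# Add a batch axis and a time axis so the image becomes a 1-frame video:
# (C, H, W) -> (1, C, 1, H, W).
video = image.unsqueeze(0).unsqueeze(2)

print(video.shape)  # torch.Size([1, 3, 1, 224, 224])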

For inference on only 1 video/image, unfortunately that is currently not supported here; the easiest way is probably to use a sample list containing a single sample and run the code via https://github.com/med-air/Endo-FM/blob/main/scripts/test_finetune_polypdiag.sh

Alternatively, you can change the dataset loader, for example, to load the specified video/image as the only sample in the dataset, as in the sketch below. Hope this helps~
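As a rough illustration of the dataset-loader option, here is a minimal one-sample dataset sketch (the torchvision-based decoding, the uniform frame sampling, and the (C, T, H, W) layout are assumptions for illustration, not the repo's actual pipeline):

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision.io import read_video

class SingleVideoDataset(Dataset):
    # Wraps exactly one labeled video so an existing evaluation loop can run on it.
    def __init__(self, video_path, label, num_frames=8):
        self.video_path = video_path
        self.label = label
        self.num_frames = num_frames

    def __len__(self):
        return 1  # the dataset contains only this one sample

    def __getitem__(self, idx):
        # frames: (T, H, W, C) uint8; audio and metadata are ignored.
        frames, _, _ = read_video(self.video_path, pts_unit="sec")
        # Uniformly pick num_frames frames across the clip.
        indices = torch.linspace(0, frames.shape[0] - 1, self.num_frames).long()
        clip = frames[indices].permute(3, 0, 1, 2).float() / 255.0  # (C, T, H, W)
        return clip, self.label

# Hypothetical usage; the path and label are placeholders.
loader = DataLoader(SingleVideoDataset("my_case.mp4", label=1), batch_size=1)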

ShreyasFadnavis commented 5 months ago

Hi @Kyfafyd, this is very helpful! Lastly, is there a way to get frame-level embeddings out of Endo-FM? Something like the following, since you build on top of DINO:

Image -> Endo-FM -> Embedding

Let me know if this is not clear.

Thanks in advance😊

Kyfafyd commented 5 months ago

Hi @ShreyasFadnavis, it is clear. You can obtain the frame-level embeddings from Endo-FM after this line: https://github.com/med-air/Endo-FM/blob/c0979d2d235a61f85daf171a73a34f47683a67d9/models/timesformer.py#L339 with the code:

x = rearrange(x, '(b t) n m -> b t n m', b=B, t=T)
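To make the shapes concrete: after that rearrange, x holds the token embeddings grouped per frame, so one simple way to get a single embedding per frame is to pool over the token dimension. This is only a sketch of the shape manipulation implied by the pattern above (the B, T, N, M values are illustrative), not code from the repo:

import torch
from einops import rearrange

B, T, N, M = 2, 8, 197, 768  # batch, frames, tokens per frame, embedding dim (illustrative)
x = torch.randn(B * T, N, M)  # token features before the rearrange

x = rearrange(x, '(b t) n m -> b t n m', b=B, t=T)  # (B, T, N, M)

# One embedding per frame, e.g. by averaging each frame's tokens.
frame_embeddings = x.mean(dim=2)  # (B, T, M)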

ShreyasFadnavis commented 5 months ago

Thanks @Kyfafyd ! Closing this issue for now :)

ShreyasFadnavis commented 5 months ago

@Kyfafyd Quick question: is it possible to provide sample .txt files in the format the model expects for train, val, and test? I am confused about where the ground-truth label for each video should be provided.

Thanks!

Kyfafyd commented 5 months ago

Hi, you can refer to this txt file: https://mycuhk-my.sharepoint.com/:t:/g/personal/1155167044_link_cuhk_edu_hk/EXvfI1xbf2FAguh3t7pXr5IBKw98D3L9ZBMmFvKQ5A4x2w?e=abcEqp. Each line follows the format path,label.
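For example, a few lines in that format might look like this (the paths and labels below are made up, just to show the layout):

videos/train/case_0001.mp4,0
videos/train/case_0002.mp4,1
videos/train/case_0003.mp4,0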

ShreyasFadnavis commented 5 months ago

Thanks @Kyfafyd !