hustvl / EVF-SAM

Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Apache License 2.0

About training cost for SAM2 ft #22

Closed: cocoshe closed this issue 1 month ago

cocoshe commented 1 month ago

https://github.com/hustvl/EVF-SAM/issues/20 shows that SAM2 doesn't need its decoder fine-tuned, thanks to SAM2's powerful video comprehension. So I'm curious: is the training cost lower than fine-tuning with SAM1, given that the decoder in SAM1 is trainable?

Are there any training details for SAM2? For example, training devices, time cost, or anything special and different from fine-tuning with SAM1.

CoderZhangYx commented 1 month ago

Honestly, we use almost the same training settings as EVF-SAM with SAM1. The only difference is in preprocessing, where SAM1 uses resize-longest-side + padding and SAM2 uses a plain resize.

Besides, you are right that EVF-SAM with SAM2 is easier to train than EVF-SAM with SAM1.
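To illustrate the preprocessing difference mentioned above, here is a minimal sketch of how the two strategies change an image's shape. It is not the repository's actual code: the function names are hypothetical, and the 1024-pixel target size is an assumption (it is the input size SAM commonly uses). Only shapes are computed, so no image library is needed.

```python
def resize_longest_then_pad(h, w, target=1024):
    """SAM1-style sketch: keep the aspect ratio, then pad to a square.

    Returns the resized (height, width) and the (bottom, right) padding
    needed to reach target x target. `target` is an assumed value.
    """
    scale = target / max(h, w)
    # Round to the nearest integer pixel count.
    new_h = int(h * scale + 0.5)
    new_w = int(w * scale + 0.5)
    return (new_h, new_w), (target - new_h, target - new_w)


def resize_direct(h, w, target=1024):
    """SAM2-style sketch: stretch straight to a square, no padding.

    Returns the resized (height, width) and the per-axis scale factors,
    which differ whenever the input is not square.
    """
    return (target, target), (target / h, target / w)


if __name__ == "__main__":
    h, w = 480, 640  # example input size, chosen for illustration
    print("resize longest + pad:", resize_longest_then_pad(h, w))
    print("direct resize:       ", resize_direct(h, w))
```

For a 480x640 input, the first strategy yields a 768x1024 image padded with 256 rows at the bottom, while the second stretches the image to 1024x1024 with different scale factors per axis. Predicted masks therefore have to be mapped back to the original resolution differently in each case: by cropping the padding first, or by a plain inverse resize.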

cocoshe commented 1 month ago

OK, thanks for your reply~