Closed cocoshe closed 1 month ago
Honestly, we use almost the same training settings as for EVF-SAM with SAM1. The only difference is in preprocessing: SAM1 resizes the longest side and then pads, while SAM2 uses a plain resize.
Besides, you are right that EVF-SAM with SAM2 is easier to train than EVF-SAM with SAM1.
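For readers who want to see the preprocessing difference concretely, here is a rough sketch of the two strategies in terms of output shapes only. It is not the actual EVF-SAM code: the function names and the target size of 1024 are assumptions for illustration, and the real pipelines also handle interpolation and normalization.

```python
def resize_longest_then_pad(h, w, target=1024):
    """SAM1-style sketch: scale so the longest side equals `target`,
    then pad the shorter side so the result is target x target.
    `target=1024` is an assumed value, not taken from the thread."""
    scale = target / max(h, w)
    new_h = int(h * scale + 0.5)
    new_w = int(w * scale + 0.5)
    pad_h = target - new_h  # rows of padding added at the bottom
    pad_w = target - new_w  # columns of padding added at the right
    return (new_h, new_w), (pad_h, pad_w)


def direct_resize(h, w, target=1024):
    """SAM2-style sketch: resize straight to target x target.
    No padding is needed, but the aspect ratio is not preserved."""
    return (target, target), (0, 0)


if __name__ == "__main__":
    # A 480x640 image under both (hypothetical) preprocessing paths.
    print(resize_longest_then_pad(480, 640))
    print(direct_resize(480, 640))
```

With the first path the image keeps its aspect ratio and part of the square input is padding; with the second, every pixel of the input carries image content but the image is stretched.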
OK, thx for your reply~
https://github.com/hustvl/EVF-SAM/issues/20 shows that SAM2 doesn't need its decoder fine-tuned, thanks to SAM2's powerful video comprehension. So I am curious whether the training cost is lower than fine-tuning SAM1, since the decoder in SAM1 is trainable?
Are there any training details for SAM2? For example, training devices, time cost, or anything special and different from fine-tuning SAM1.