How do you train with a semantic segmentation dataset in the ablation study?

hustvl / EVF-SAM

Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"

Apache License 2.0

Closed wysnzzzz closed 1 month ago

wysnzzzz commented 1 month ago

Do you enter the category name as the text prompt?

wysnzzzz commented 1 month ago

And how long does inference take per example?

CoderZhangYx commented 1 month ago

Yes, we simply follow LISA and input the semantic category name as the text prompt. We also tried template construction strategies such as "all {object}", as mentioned in some other papers, but found that they decrease performance on RES tasks.
Inference should take about 1.4 s per example (including pre-processing and post-processing) on a T4 GPU for our released model scale (1.32B).
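
For readers who want to see what "category name as the text prompt" could look like in practice, here is a minimal sketch. It is not taken from the EVF-SAM or LISA code: the function `semantic_to_prompt_pairs`, the `id_to_name` mapping, the ignore id 255, and the toy label map are all hypothetical names and values chosen for illustration. It splits one semantic label map into (text prompt, binary mask) pairs, and the optional `template` argument shows where a variant like "all {object}" would plug in.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 2D grid of integer class ids (or 0/1 for binary masks)


def semantic_to_prompt_pairs(
    label_map: Mask,
    id_to_name: Dict[int, str],
    ignore_ids: Tuple[int, ...] = (255,),  # assumed "void" label, dataset-specific
    template: str = "{name}",  # plain category name by default
) -> List[Tuple[str, Mask]]:
    """Split one semantic label map into (text prompt, binary mask) pairs."""
    # Collect the class ids that actually occur in this image.
    present = sorted({value for row in label_map for value in row})
    pairs = []
    for class_id in present:
        # Skip ignored labels and ids without a known category name.
        if class_id in ignore_ids or class_id not in id_to_name:
            continue
        prompt = template.format(name=id_to_name[class_id])
        # Binary target: 1 where the pixel belongs to this category, else 0.
        binary = [[1 if value == class_id else 0 for value in row] for row in label_map]
        pairs.append((prompt, binary))
    return pairs


# Toy 3x3 label map with three categories and one ignored pixel (255).
label_map = [
    [0, 0, 1],
    [2, 2, 1],
    [255, 2, 1],
]
id_to_name = {0: "sky", 1: "tree", 2: "road"}

pairs = semantic_to_prompt_pairs(label_map, id_to_name)
for prompt, mask in pairs:
    print(prompt, mask)
```

Passing `template="all {name}"` would produce prompts such as "all sky" instead of "sky", which corresponds to the kind of template strategy described above as hurting RES performance.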

wysnzzzz commented 1 month ago

Thank you very much for your answer.