hustvl / EVF-SAM

Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Apache License 2.0

How do you train with a semantic segmentation dataset in ablation? #3

Closed: wysnzzzz closed this issue 1 month ago

wysnzzzz commented 1 month ago

Do you just enter the category name as the text prompt?

wysnzzzz commented 1 month ago

Also, how long does inference take for a single example?

CoderZhangYx commented 1 month ago

  1. Yes, we simply follow LISA and use the semantic category name as the text prompt. We also tried template construction strategies such as "all {object}", mentioned in some other papers, but found that they decrease performance on RES tasks.
  2. Inference should take about 1.4 s per example (including pre-processing and post-processing) on a T4 GPU for the model scale we released (1.32B parameters).

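For readers who want to try the setup from point 1, here is a minimal sketch of turning a semantic segmentation label map into (text prompt, binary mask) pairs, with the category name used directly as the prompt. It is not the EVF-SAM or LISA data loader: the function name `semantic_to_referring_samples`, the `template` parameter with its `{name}` placeholder, and the ignore index 255 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # H x W grid of integer class ids (or 0/1 for binary masks)


def semantic_to_referring_samples(
    label_map: Mask,
    id_to_name: Dict[int, str],
    template: str = "{name}",               # hypothetical: plain category name by default
    ignore_ids: Tuple[int, ...] = (255,),   # assumption: 255 marks unlabeled pixels
) -> List[Tuple[str, Mask]]:
    """Split one semantic label map into (text prompt, binary mask) pairs."""
    # Collect the class ids that actually occur in this image.
    present = sorted({cid for row in label_map for cid in row})
    samples: List[Tuple[str, Mask]] = []
    for cid in present:
        # Skip ignored pixels and ids without a known category name.
        if cid in ignore_ids or cid not in id_to_name:
            continue
        prompt = template.format(name=id_to_name[cid])
        # Binary mask: 1 where the pixel belongs to this class, 0 elsewhere.
        binary = [[1 if v == cid else 0 for v in row] for row in label_map]
        samples.append((prompt, binary))
    return samples


if __name__ == "__main__":
    toy_labels = [[0, 0, 1], [255, 1, 1]]
    toy_names = {0: "sky", 1: "tree"}
    for prompt, mask in semantic_to_referring_samples(toy_labels, toy_names):
        print(prompt, mask)
```

Passing `template="all {name}"` gives the templated variant mentioned above, which makes it easy to compare the two prompt styles on the same data.
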
wysnzzzz commented 1 month ago

Thank you very much for your answer.