ymy-k / Hi-SAM

[arXiv preprint] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Apache License 2.0
193 stars, 10 forks

How is the output token in Fig. 3 obtained? #6

Closed by Sen-Ran 6 months ago

Sen-Ran commented 6 months ago

Thanks for your splendid work. However, I couldn't find a description of how the output token fed to the S-Decoder and the output tokens fed to the H-Decoder are obtained. The paper says "Let t^s_out ∈ R^{1×256} denote the inherited output token, which is the first slice of SAM's output tokens," but this is still a little ambiguous.

ymy-k commented 6 months ago

They are initialized with the output tokens of SAM's mask decoder.
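
As a rough illustration of what such an initialization could look like, here is a minimal PyTorch sketch. It is not taken from the Hi-SAM code. The names `sam_mask_tokens`, `s_output_token`, and `h_output_tokens` are hypothetical, the table size of 4 assumes SAM's default of three multimask outputs plus one, and a randomly initialized embedding stands in for weights that would normally be loaded from a SAM checkpoint. Whether "SAM's output tokens" means only the mask-token table or also includes the IoU token is likewise an assumption here; the sketch uses the mask-token table alone.

```python
import torch
import torch.nn as nn

EMBED_DIM = 256        # token width from the quoted sentence (R^{1x256})
NUM_MASK_TOKENS = 4    # assumption: SAM default, 3 multimask outputs + 1

# Stand-in for the pretrained token table of SAM's mask decoder.
# In practice these weights would come from a loaded SAM checkpoint.
sam_mask_tokens = nn.Embedding(NUM_MASK_TOKENS, EMBED_DIM)

# Inherited output token: a trainable copy of the first slice, shape (1, 256).
s_output_token = nn.Parameter(sam_mask_tokens.weight[0:1].detach().clone())

# Hypothetical H-Decoder tokens: a trainable copy of the whole table.
h_output_tokens = nn.Parameter(sam_mask_tokens.weight.detach().clone())

print(s_output_token.shape, h_output_tokens.shape)
```

Copying with `detach().clone()` means the new tokens start at SAM's values but are then updated independently of the original table during fine-tuning.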