Thanks for your splendid work. However, I couldn't find any description of how to obtain the output token before the S-Decoder and the output tokens before the H-Decoder. The paper says, "Let t^s_out ∈ R^{1×256} denote the inherited output token, which is the first slice of SAM's output tokens," but this is still a little ambiguous.
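To make the ambiguity concrete, here is a minimal sketch of the two readings of "first slice" that I can think of. It assumes the token layout of the original SAM mask decoder (one IoU token followed by four mask tokens, each 256-dimensional). The variable names and placeholder values are hypothetical, and neither reading is confirmed by the paper.

```python
# Hypothetical illustration only: placeholder values, not real SAM weights.
EMBED_DIM = 256
NUM_MASK_TOKENS = 4  # as in the original SAM mask decoder

# Assumed layout: [iou_token, mask_token_0, ..., mask_token_3] -> 5 x 256.
# Row i is filled with the value i so the rows can be told apart.
output_tokens = [[float(i)] * EMBED_DIM for i in range(1 + NUM_MASK_TOKENS)]

# Reading A: "first slice" = first row of all output tokens (the IoU token).
reading_a = output_tokens[0:1]

# Reading B: "first slice" = first mask token (row 1, after the IoU token).
reading_b = output_tokens[1:2]


def shape(rows):
    """Return (number of rows, row length) for a list of equal-length rows."""
    return (len(rows), len(rows[0]))


print(shape(reading_a), shape(reading_b))
```

Both readings give a 1×256 token, so the shape stated in the paper does not tell them apart. Could you clarify which one is meant, or whether it is something else?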