zamling / PSALM

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
Apache License 2.0

Some questions about paper and code #14

Open josephzpng opened 3 months ago

josephzpng commented 3 months ago

Great job! Regarding the Ref task, the paper says that the Sentence Condition is extracted from the special [REF] token, but in the code the Sentence Condition embedding is obtained by avg_pooling over the entire sentence. Should I follow the code?

zamling commented 3 months ago

Good question!

You are right! Thank you for pointing out this mismatch. We tried many designs for generating the condition embeddings, as shown in Table 5 of the paper. The mismatch was probably introduced when I refactored the code and chose the avg_pooling version together with its corresponding ckpt. I will update the paper ASAP. Until then, if you are finetuning from the current PSALM ckpt, you can follow the current code (both are avg_pooling), since this version also performed well when I evaluated it :)

Thanks for your interest in our work!
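
For readers comparing the two designs discussed in this thread, here is a minimal sketch of the difference between reading the condition from a single [REF] token and average-pooling over the sentence. It is not taken from the PSALM code: the function names, `ref_token_id`, and `sentence_mask` are hypothetical, and NumPy arrays stand in for the model's hidden states.

```python
import numpy as np

def ref_token_condition(hidden_states, input_ids, ref_token_id):
    """Hypothetical variant: use the hidden state at the last [REF] token.

    hidden_states: (B, T, D) array, input_ids: (B, T) integer array.
    """
    batch = []
    for states, ids in zip(hidden_states, input_ids):
        positions = np.flatnonzero(ids == ref_token_id)
        if positions.size == 0:
            raise ValueError("sequence contains no [REF] token")
        batch.append(states[positions[-1]])
    return np.stack(batch)  # (B, D)

def avg_pool_condition(hidden_states, sentence_mask):
    """Hypothetical variant: mean of hidden states over the sentence tokens.

    sentence_mask: (B, T) array, 1 for tokens of the referring sentence.
    """
    mask = sentence_mask.astype(hidden_states.dtype)[..., None]  # (B, T, 1)
    summed = (hidden_states * mask).sum(axis=1)                  # (B, D)
    counts = np.clip(mask.sum(axis=1), 1.0, None)                # avoid /0
    return summed / counts

# Toy input: three sentence tokens followed by one [REF] token (id 99, assumed).
hidden = np.array([[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [9.0, 9.0]]])
ids = np.array([[5, 6, 7, 99]])
sent_mask = np.array([[1, 1, 1, 0]])

ref_cond = ref_token_condition(hidden, ids, ref_token_id=99)
avg_cond = avg_pool_condition(hidden, sent_mask)
```

The two variants generally produce different embeddings for the same input, so a checkpoint trained with one is not expected to work unchanged with the other; this is why following the released code matters when finetuning from the released ckpt.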