zqhang / AnomalyCLIP

Official implementation for AnomalyCLIP (ICLR 2024)
MIT License

about prompt #12

Closed nhw649 closed 6 months ago

nhw649 commented 6 months ago

I don't understand the following code:

self.register_buffer("token_prefix_pos", embedding_pos[:, :, :1, :])
self.register_buffer("token_suffix_pos", embedding_pos[:, :, 1 + n_ctx_pos:, :])
self.register_buffer("token_prefix_neg", embedding_neg[:, :, :1, :])
self.register_buffer("token_suffix_neg", embedding_neg[:, :, 1 + n_ctx_neg:, :])

I think the positive prompt should be ['X X X X X X X X X X X X object.'], so the prefix should be 'X X X X X X X X X X X X ' and the suffix should be '.'. Is my understanding wrong? Could you help me with this?

zqhang commented 6 months ago

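CLIP encodes text at a fixed length. token_prefix_pos and token_suffix_pos are the fixed (non-learnable) embeddings that come before and after the learnable text embeddings. So the prefix is only the first token position (the start-of-text token), and the suffix is everything after the n_ctx_pos placeholder positions ('object.', the end-of-text token, and padding).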

The 'X X X ... X' string is only a placeholder. It holds the positions of the learnable text embeddings so that the full prompt can be fed to the text encoder to obtain the whole text embedding. After that, the embeddings at those positions are replaced with the learnable text embeddings.
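
To make the slicing concrete, here is a minimal toy sketch that uses token labels instead of real embeddings. It is not the repository's code. It assumes CLIP's context length of 77, n_ctx = 12, and one label per token (the "<SOS>", "<EOS>", and "<PAD>" labels and the variable names are illustrative only). In the actual code the same slices are taken on the token axis (dim 2) of a 4-D embedding tensor.

```python
# Toy illustration with token labels instead of real embedding vectors.
# Assumptions: context length 77, 12 learnable tokens, one label per token.

CONTEXT_LENGTH = 77  # CLIP pads or truncates every prompt to this many tokens
n_ctx = 12           # number of learnable context tokens (the "X" placeholders)

# Roughly what tokenizing "X X X X X X X X X X X X object." yields.
placeholders = ["X"] * n_ctx
tokens = ["<SOS>"] + placeholders + ["object", ".", "<EOS>"]
tokens += ["<PAD>"] * (CONTEXT_LENGTH - len(tokens))

# Same index ranges as the registered buffers, applied on the token axis.
token_prefix = tokens[:1]          # only the start-of-text position
token_suffix = tokens[1 + n_ctx:]  # "object", ".", "<EOS>", then padding

# The learnable vectors take over the placeholder positions.
learnable_ctx = [f"ctx_{i}" for i in range(n_ctx)]
prompt = token_prefix + learnable_ctx + token_suffix

print(token_prefix)
print(token_suffix[:3])
print(len(prompt))
```

The placeholders themselves never appear in the final prompt. They only reserve n_ctx positions so that the prefix and suffix end up at the right indices and the total length stays fixed.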

nhw649 commented 6 months ago

ok, thanks.