Closed tonydavis629 closed 2 months ago
Hi.
In code, C in isolation denotes some channel size; the exact meaning is context-dependent. In the paper, C is a shared channel size for most of the operations, except the key tensor (which is C^k). See https://github.com/hkchengrex/Cutie/blob/2ac7ac21d048e7ff8b2b033a084e0e4ea7b1216c/cutie/config/model/base.yaml#L4-L8 where C^k is 64, and all the other 256 jointly refer to C. We experimented with different values before (and thus allowed the config to set them differently) but found that it's easier to tie them to a single value.
P is a value inherited from XMem. It denotes the number of prototypes (Section 3.3 of XMem). Semantically, [HW/P] denotes the total number of query elements: during memory reading, it is the number of pixels HW, and during memory potentiation, it is the number of prototypes P.
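To make the two readings of [HW/P] concrete, here is a minimal sketch of the resulting similarity shapes. It assumes the similarity matrix is laid out as B x (memory elements) x (query elements); the helper name and the sizes used (a 30 x 30 feature map, 5 memory frames, 128 prototypes) are hypothetical values chosen for illustration, not taken from the Cutie config.

```python
def affinity_shape(batch, num_memory, num_query):
    """Shape of a similarity matrix between memory keys and query keys.

    Assumed layout (illustrative): B x num_memory x num_query.
    """
    return (batch, num_memory, num_query)


# Hypothetical sizes, for illustration only.
H, W = 30, 30   # feature-map height and width
T = 5           # number of memory frames
P = 128         # number of prototypes

# Memory reading: the query elements are the H*W pixels of the query frame.
reading = affinity_shape(1, T * H * W, H * W)

# Memory potentiation: the query elements are the P prototypes.
potentiation = affinity_shape(1, T * H * W, P)

print(reading)       # (1, 4500, 900)
print(potentiation)  # (1, 4500, 128)
```

In both cases the last dimension is the number of query elements, which is why the comment writes it as [HW/P].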
Ah, so [HW/P] means HW or P, not HW divided by P. I see, thank you.
Rex, in your paper you refer to the C (or C^k) dimension, but I can't find a reference as to what this C is. Is it the embedding dimension?
Also, the code refers to a value P, as in "B x CK x [HW/P] - Query keys". I'm assuming HW is image height and width, but what is P?
I'm working on strategies to reduce Cutie's memory requirements for high resolution images, but the dimensionality of the similarity/affinity matrix is really severe, so I'm looking for any opportunities to reduce this.
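For a rough sense of how the affinity matrix grows with resolution, here is a back-of-the-envelope estimate. It is only a sketch under stated assumptions: one float32 similarity value per (memory element, query pixel) pair, a hypothetical 68 x 120 feature map (roughly a 1080p frame at stride 16), and 5 memory frames. The function name and all numbers are illustrative, not measured from Cutie.

```python
def affinity_bytes(mem_frames, feat_h, feat_w, bytes_per_value=4):
    """Estimate the size of a dense (T*H*W) x (H*W) affinity matrix.

    Assumes one value per memory-element / query-pixel pair,
    stored as float32 by default (4 bytes per value).
    """
    num_query = feat_h * feat_w            # H*W query pixels
    num_memory = mem_frames * num_query    # T*H*W memory elements
    return num_memory * num_query * bytes_per_value


# Hypothetical high-resolution case: 68 x 120 features, 5 memory frames.
size = affinity_bytes(mem_frames=5, feat_h=68, feat_w=120)
print(size)                      # 1331712000 bytes
print(round(size / 2**30, 2))    # 1.24 (GiB)

# Doubling both spatial dimensions multiplies the cost by 16,
# because the matrix is quadratic in H*W.
print(affinity_bytes(5, 136, 240) // size)  # 16
```

The quadratic dependence on H*W is the reason the matrix dominates memory at high resolution.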