Whisht opened 1 year ago
I also could not find the KL-divergence loss in the code. Do you have any idea?
Add the negative entropy of $q = \mathop{\text{softmax}}(\texttt{sim\_i2t\_m})$
to the total loss, multiplied by the coefficient $\alpha$; this recovers the KL divergence between $q$ and $p$. For reference:
$$ KL(q\|p) = E_q \log \frac{q}{p} = -E_q \log p - (-E_q \log q) = CE(q,p) - H(q) $$
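A quick numerical check of this identity (a minimal sketch with random logits standing in for `sim_i2t` and `sim_i2t_m`; the tensor shapes here are illustrative, not ALBEF's actual ones):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical logits: online similarities -> p, momentum similarities -> q.
logits_p = torch.randn(4, 6)
logits_q = torch.randn(4, 6)

p = F.softmax(logits_p, dim=1)
q = F.softmax(logits_q, dim=1)

# Cross entropy CE(q, p) = -E_q log p and entropy H(q) = -E_q log q, per row.
ce_qp = -(q * p.log()).sum(dim=1)
h_q = -(q * q.log()).sum(dim=1)

# KL(q || p) = E_q log(q / p), per row.
kl_qp = (q * (q / p).log()).sum(dim=1)

# The identity KL(q||p) = CE(q,p) - H(q) holds row-wise.
print(torch.allclose(kl_qp, ce_qp - h_q, atol=1e-6))  # True
```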
In the loss computation in
[ALBEF](https://github.com/salesforce/ALBEF/blob/b9727e43c3040491774d1b22cc27718aa7772fac/models/model_pretrain.py#L103C3-L103C3),
the computation differs a little from the original paper. Take `loss_i2t` as an example. According to the paper, this loss should be

$$ \mathcal{L}_{\text{i2t}} = \alpha \, KL\!\left(q^{\text{i2t}} \,\|\, p^{\text{i2t}}\right) + (1-\alpha)\, H\!\left(y,\, p^{\text{i2t}}\right). $$

In the code cited above, `image_feat_m` is $I_m \in R^{n\times d}$, `text_feat_all` is $T_a \in R^{d\times(n+n_q)}$, and `sim_targets` is $y \in R^{n\times(n+n_q)}$, with $p^{\text{i2t}}(I)=\mathop{\text{softmax}}(S(I,T_a))$ and $q^{\text{i2t}}(I)=\mathop{\text{softmax}}(S(I_m,T_a))$, where the subscript $m$ denotes momentum. Suppose `batch_size = 2` and `queue_size = 2`, so $n = 2$ and $n_q = 2 \times 2 = 4$. What the code actually minimizes is

$$ H\!\left(\alpha\, q^{\text{i2t}} + (1-\alpha)\, y,\; p^{\text{i2t}}\right) = \alpha\, CE\!\left(q^{\text{i2t}}, p^{\text{i2t}}\right) + (1-\alpha)\, H\!\left(y,\, p^{\text{i2t}}\right). $$

The first term is not a KL divergence between $q$ and $p$: a self-entropy term $\alpha H(q^{\text{i2t}})$ is lost. So, does this affect the performance of
ALBEF? I think it would be a good regularization term.
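To make the comparison concrete, here is a small sketch (random tensors rather than real ALBEF features; shapes follow the `batch_size = 2`, $n_q = 4$ example above) showing that the coded loss equals $\alpha\,CE(q,p) + (1-\alpha)\,H(y,p)$, and that adding $\alpha$ times the negative entropy of $q$ turns the first term into the paper's KL divergence:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
alpha = 0.4
n, n_all = 2, 6                      # batch of 2; 2 in-batch + 4 queued candidates

sim_i2t   = torch.randn(n, n_all)    # S(I, T_a): online similarities
sim_i2t_m = torch.randn(n, n_all)    # S(I_m, T_a): momentum similarities

# One-hot targets y: the i-th image matches the i-th text.
y = torch.zeros(n, n_all)
y[torch.arange(n), torch.arange(n)] = 1.0

q = F.softmax(sim_i2t_m, dim=1)      # momentum (soft) targets
log_p = F.log_softmax(sim_i2t, dim=1)

# Loss as computed in the ALBEF code: cross entropy against mixed targets.
sim_targets = alpha * q + (1 - alpha) * y
loss_code = -(sim_targets * log_p).sum(dim=1).mean()

# It decomposes as alpha * CE(q, p) + (1 - alpha) * H(y, p).
ce_qp = -(q * log_p).sum(dim=1).mean()
ce_yp = -(y * log_p).sum(dim=1).mean()
assert torch.allclose(loss_code, alpha * ce_qp + (1 - alpha) * ce_yp, atol=1e-6)

# Adding alpha * (negative entropy of q) turns CE(q, p) into KL(q || p),
# i.e. recovers the loss as written in the paper.
neg_h_q = (q * q.log()).sum(dim=1).mean()
kl_qp = (q * (q.log() - log_p)).sum(dim=1).mean()
loss_paper = alpha * kl_qp + (1 - alpha) * ce_yp
assert torch.allclose(loss_code + alpha * neg_h_q, loss_paper, atol=1e-6)
```

Note that $q$ comes from the momentum encoder, so whether the extra $\alpha H(q)$ term changes optimization in practice is exactly the question raised above.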