Difference between the PD model and the BPRMF model

Hi, thanks for the great paper and for sharing the code!

I have a question about the model. Sec 3.2, Step-2 covers estimating $\sum_z P(C|U,I,z)p(z)$. The last sentence of that paragraph says we can use $ELU'(f_{\theta}(u,i))$ to estimate $P(C|do(U,I))$. Here $f_{\theta}(u,i)$ denotes any user-item matching model, and the paper chooses the MF model (last paragraph on Page-14). As I understand it, the PD model wraps the matching score in an ELU' activation function, and that is the main difference between the BPRMF model and the PD model. (I guess I am missing something here.)
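To make my understanding concrete, here is a minimal sketch of how I read the PD scoring. It is not taken from the repository: the function names `elu_prime`, `mf_score`, and `pd_score` are hypothetical, the toy embeddings are made up, and I am assuming ELU'(x) is exp(x) for x <= 0 and x + 1 otherwise.

```python
import math

def elu_prime(x):
    # Assumed definition of ELU': exp(x) for x <= 0, x + 1 otherwise.
    # The output is always positive and strictly increasing in x.
    return math.exp(x) if x <= 0 else x + 1.0

def mf_score(user_vec, item_vec):
    # Plain MF matching score f_theta(u, i): dot product of the embeddings.
    return sum(p * q for p, q in zip(user_vec, item_vec))

def pd_score(user_vec, item_vec):
    # My reading of the PD score: ELU' applied on top of the MF score.
    return elu_prime(mf_score(user_vec, item_vec))

# Toy (made-up) embeddings, only to compare the two scores.
user = [0.5, -1.0]
item_a = [1.0, 0.5]   # dot product: 0.5 - 0.5 = 0.0
item_b = [2.0, 0.0]   # dot product: 1.0

mf_order = mf_score(user, item_a) < mf_score(user, item_b)
pd_order = pd_score(user, item_a) < pd_score(user, item_b)
```

Under this reading ELU' is strictly increasing, so for a fixed set of embeddings it ranks items in the same order as the raw MF score. Any difference would then have to come from how the embeddings are trained.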

According to Algorithm-1, the PD model does not seem to use popularity information during either training or inference. (I understand that the PDA model injects the popularity bias.)

However, according to Table-1, the PD model performs much better than the BPRMF model, and I am confused about why.

Any help would be appreciated. Thanks for your time.

zyang1580 / PDA
