Closed lizhenstat closed 1 year ago
Hi~ The difference lies in what each model estimates: our model estimates $P(C|do(U,I)) = \sum_{z} P(C|U,I,z)P(z)$, while the BPR model estimates $P(C|U,I) = \sum_{z} P(C|U,I,z)P(z|I)$. In the implementation, the differences lie in the model used to predict C and in the training process. PD does use popularity information during training. Please reread Algorithm-1, noting which model is used during training.
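To make the training/inference distinction concrete, here is a minimal NumPy sketch. It rests on assumptions taken from my reading of the paper rather than from the released code: that $ELU'(x)$ is $e^x$ for $x \le 0$ and $x + 1$ otherwise, and that the training-time click model multiplies $ELU'(f_{\theta}(u,i))$ by item popularity raised to a power $\gamma$. The function names, the example scores, and the value of `gamma` are hypothetical.

```python
import numpy as np

def elu_prime(x):
    """Assumed ELU' variant: exp(x) for x <= 0, x + 1 otherwise (always positive)."""
    x = np.asarray(x, dtype=float)
    # Clamp the exp argument so the unused branch cannot overflow for large x.
    return np.where(x <= 0, np.exp(np.minimum(x, 0.0)), x + 1.0)

def training_score(match, popularity, gamma):
    """Score fitted during training: the matching term times popularity ** gamma (assumed form)."""
    return elu_prime(match) * np.power(popularity, gamma)

def inference_score(match):
    """Score used for ranking at inference: the popularity factor is dropped."""
    return elu_prime(match)

# Hypothetical matching scores f_theta(u, i) for three items and their popularities.
match = np.array([0.5, -1.0, 2.0])
popularity = np.array([0.30, 0.05, 0.01])
gamma = 0.1  # hypothetical smoothing exponent

train_scores = training_score(match, popularity, gamma)
infer_scores = inference_score(match)
```

Under these assumptions, popularity shapes the learned parameters through the training objective but does not appear in the ranking score, whereas a plain BPRMF model fits and ranks with the same score throughout.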
Hi, thanks for your great paper and for sharing the code!
I have a question about the model. Sec 3.2, Step 2 covers estimating $\sum_{z} P(C|U,I,z)P(z)$. The last sentence of that paragraph says we can use $ELU'(f_{\theta}(u,i))$ to estimate $P(C|do(U,I))$, where $f_{\theta}(u,i)$ denotes any user-item matching model, and the paper chooses the MF model (last paragraph on Page 14). As I understand it, the PD model adds an ELU' activation function, and that is the main difference between the BPRMF model and the PD model. (I guess I am missing something here.)
According to Algorithm-1, the PD model does not use popularity information during training or inference. (I understand that the PDA model injects the popularity bias.)
But according to Table-1, the PD model performs much better than the BPRMF model, and I am confused about this.
Any help would be appreciated and thanks for your time.