zyang1580 / PDA

This is an implementation for our SIGIR 2021 paper "Causal Intervention for Leveraging Popularity Bias in Recommendation" based on tensorflow.

Difference between the PD model and the BPRMF model #11

Closed lizhenstat closed 1 year ago

lizhenstat commented 1 year ago

Hi, thanks for your great paper and for sharing the code!

I have a question about the model. In Sec. 3.2, Step 2 (estimating $\sum_z P(C|U,I,z)P(z)$), the last sentence of the paragraph says that we can use $ELU'(f_{\theta}(u,i))$ to estimate $P(C|do(U,I))$. Here $f_{\theta}(u,i)$ denotes any user-item matching model, and the paper chooses the MF model (last paragraph on Page-14). As I understand it, the PD model applies an ELU' activation function to the matching score, and that is the main difference between the BPRMF model and the PD model. (I guess I am missing something here.)

According to Algorithm-1, the PD model does not use popularity information during either training or inference. (I understand that the PDA model injects the popularity bias.)

But according to Table-1, the PD model performs much better than the BPRMF model, and I am confused about this.

Any help would be appreciated and thanks for your time.

zyang1580 commented 1 year ago

Hi~ The difference is that our model estimates $P(C|do(U,I)) = \sum_{z} P(C|U,I,z)P(z)$, while the BPR model estimates $P(C|U,I) = \sum_{z} P(C|U,I,z)P(z|I)$. In the implementation, the two differ in the model used to predict C and in the training process. PD does use popularity information during training. Please re-read Algorithm-1 and note which model is used during training.
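
To make the point about training versus inference concrete, here is a minimal sketch in plain Python. It is not the repository's TensorFlow code. It assumes, following the paper's description, that the training-time click model multiplies $ELU'(f_{\theta}(u,i))$ by the item popularity raised to a power $\gamma$, and that PD drops this popularity factor when ranking at inference. All function names and toy values below are hypothetical and chosen only for illustration.

```python
import math


def elu_prime(x):
    """ELU'(x): exp(x) for x <= 0, x + 1 otherwise, so the output is always positive."""
    return math.exp(x) if x <= 0 else x + 1.0


def mf_score(user_vec, item_vec):
    """f_theta(u, i): a plain matrix-factorization dot product."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


def pd_training_score(user_vec, item_vec, popularity, gamma):
    """Score fitted to clicks during training: ELU'(f) times popularity**gamma."""
    return elu_prime(mf_score(user_vec, item_vec)) * popularity ** gamma


def pd_inference_score(user_vec, item_vec):
    """Score used for ranking at inference: the popularity factor is dropped."""
    return elu_prime(mf_score(user_vec, item_vec))


def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss, -log(sigmoid(pos_score - neg_score))."""
    return math.log1p(math.exp(-(pos_score - neg_score)))


# Toy values, made up for illustration only.
user = [0.5, -0.2]
popular_item, niche_item = [0.4, 0.1], [0.6, 0.3]
m_popular, m_niche, gamma = 0.05, 0.001, 0.2

train_popular = pd_training_score(user, popular_item, m_popular, gamma)
train_niche = pd_training_score(user, niche_item, m_niche, gamma)
rank_popular = pd_inference_score(user, popular_item)
rank_niche = pd_inference_score(user, niche_item)

# PD trains with the BPR loss on the popularity-aware scores.
loss = bpr_loss(train_popular, train_niche)
```

With these toy numbers the popular item gets the higher training score, because the popularity factor explains part of its clicks, while the niche item ranks higher at inference once that factor is removed. Under the same assumptions, BPRMF would correspond to training and ranking with `mf_score` alone, so popularity is absorbed into the embeddings instead of being modeled separately.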