Why is P(C | do(U, I)) used rather than P(C | do(U, I), Z)?

zyang1580 / PDA

This is an implementation of our SIGIR 2021 paper "Causal Intervention for Leveraging Popularity Bias in Recommendation", based on TensorFlow.

Issue #18 (Open), opened by Lucky168-1222 6 months ago

Lucky168-1222 commented 6 months ago

Thank you for sharing the code! I'm confused about a probability question.

In the causal graph shown in Figure 1(b), the parents of C are U, I, and Z. So when evaluating probabilities on this causal graph, why not condition on all three variables, i.e., use P(C | do(U, I), Z)?

I look forward to your reply!

zyang1580 commented 6 months ago

We have two versions of our method: PD and PDA. In PD, we use P(C|do(U, I)), while in PDA, we use P(C|do(U, I), do(Z)).
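A short sketch of why the PDA target needs no adjustment sum, assuming the graph from Figure 1(b) with edges U → C, I → C, Z → I, Z → C, and writing \tilde{z} for an assumed (predicted) popularity value. Because every parent of C is set by intervention, the truncated factorization leaves only C's own conditional:

```latex
P\bigl(C \mid do(U=u, I=i), do(Z=\tilde{z})\bigr) = P\bigl(C \mid u, i, \tilde{z}\bigr)
```

PD, by contrast, leaves Z un-intervened, so Z has to be averaged out over its distribution.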

Lucky168-1222 commented 6 months ago

Regarding the derivation introduced by "Performing do-calculus on G leads to:" (the formula was shared as an image, which is not preserved here):

1. Is the Z that appears in the second step of that formula related to the Z → I → C path? When we intervene on I, the edge Z → I is no longer present.
2. Does P(C | U, I, z) in the final step represent the U → C, I → C, and Z → C paths?
3. What does P(z) (item popularity) represent once there is no Z → I edge?
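The figure referred to above is not preserved. Below is a reconstruction of the standard derivation on G (edges U → C, I → C, Z → I, Z → C); we assume it matches the figure, though the step numbering there may differ. It shows that z enters by summing over Z, not through the Z → I → C path, and that the interventional weight on z is the marginal P(z):

```latex
\begin{aligned}
P(C \mid do(U,I))
  &= \sum_{z} P(C \mid do(U,I), z)\, P(z \mid do(U,I)) && \text{(sum over the values of } Z\text{)} \\
  &= \sum_{z} P(C \mid do(U,I), z)\, P(z) && \text{(} Z \text{ is not a descendant of } U \text{ or } I\text{)} \\
  &= \sum_{z} P(C \mid U, I, z)\, P(z) && \text{(given } Z\text{, no back-door path } I \leftarrow Z \rightarrow C \text{ remains)}
\end{aligned}
```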

zyang1580 commented 6 months ago

A1: Step (2) follows from Bayes' theorem, and the Z there is not related to that path.
A2 & A3: P(z) is the marginal distribution of Z, and P(C | U, I, z) is a conditional distribution that is related to the paths you mentioned.
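A toy numeric illustration of A2/A3, with made-up probabilities and U held fixed so that it drops out. The interventional quantity weights each z by the marginal P(z), whereas the ordinary conditional P(C | I) weights it by the posterior P(z | I), which is exactly where the Z → I edge leaks in. All table values and function names here are hypothetical.

```python
# Toy confounded model (all numbers are made up for illustration):
#   Z -> I, Z -> C, I -> C   (U is held fixed and omitted)
P_Z = {0: 0.7, 1: 0.3}                      # marginal P(z)
P_I1_GIVEN_Z = {0: 0.2, 1: 0.9}             # P(I=1 | z)
P_C1_GIVEN_IZ = {(0, 0): 0.1, (0, 1): 0.3,  # P(C=1 | i, z)
                 (1, 0): 0.4, (1, 1): 0.8}


def p_i_given_z(i: int, z: int) -> float:
    """P(I=i | z) for binary I."""
    p1 = P_I1_GIVEN_Z[z]
    return p1 if i == 1 else 1.0 - p1


def observational(i: int) -> float:
    """P(C=1 | I=i): weights z by the posterior P(z | i), so Z -> I leaks in."""
    p_i = sum(p_i_given_z(i, z) * P_Z[z] for z in P_Z)
    return sum(P_C1_GIVEN_IZ[(i, z)] * p_i_given_z(i, z) * P_Z[z] / p_i for z in P_Z)


def interventional(i: int) -> float:
    """P(C=1 | do(I=i)) = sum_z P(C=1 | i, z) P(z): Z keeps its marginal."""
    return sum(P_C1_GIVEN_IZ[(i, z)] * P_Z[z] for z in P_Z)


p_obs = observational(1)
p_do = interventional(1)
print(f"P(C=1 | I=1)     = {p_obs:.4f}")
print(f"P(C=1 | do(I=1)) = {p_do:.4f}")
```

With these numbers, P(C=1 | I=1) ≈ 0.6634 but P(C=1 | do(I=1)) = 0.52: plain conditioning inflates the estimate because popular items (Z=1) are both exposed more often and clicked more often.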

Lucky168-1222 commented 6 months ago

Thanks for the clear answer! One point still confuses me. The aim of the training stage is to eliminate the harmful effect of popularity bias, i.e., to cut off Z → I by intervening on the exposure mechanism. However, P(C | U, I, z) represents the U → C, I → C, and Z → C paths, all of which are useful for recommendation. The inference stage then uses these edges again, this time with the opposite aim of exploiting popularity bias. It seems that all the targets were already achieved during the training stage. Could you explain this?

zyang1580 commented 6 months ago

As mentioned above, there are two options: PD, which can be thought of as a domain generalization method, and PDA, which can be thought of as a domain adaptation method. PD aims for better generalization without predicting future information (that is, it cuts off the edge Z → I and does not use Z → C). In contrast, PDA aims for better adaptation under an assumption or prediction about the future, leveraging the edge Z → C.
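To make the PD/PDA distinction concrete, here is a minimal Python sketch of the two inference rules. It assumes (this is not stated in the thread) a multiplicative click model P(C | U, I, z) ∝ (ELU(f(U, I)) + 1) · z^γ; we believe this resembles the form in the paper, but check the repository before relying on it. All scores, popularity values, the exponent, and the function names are hypothetical illustrations, not the repo's code. Under that assumption the PD sum over P(z) multiplies every item by the same constant, so PD ranks purely by the user-item factor (Z → C unused), while PDA plugs in a predicted future popularity z̃ and so does use Z → C.

```python
import math

# Hypothetical raw matching scores f(u, i) for one user and three items.
raw_scores = {"a": 0.5, "b": 0.4, "c": -0.2}
# Hypothetical predicted future popularity z~ per item (used by PDA only).
predicted_pop = {"a": 0.01, "b": 0.2, "c": 0.05}
# Hypothetical distribution P(z) over popularity values (used by PD only).
pop_marginal = {0.01: 0.5, 0.05: 0.3, 0.2: 0.2}
GAMMA = 0.5  # assumed popularity exponent


def elu_plus_one(x: float) -> float:
    """ELU(x) + 1: keeps the user-item factor strictly positive."""
    return x + 1.0 if x > 0 else math.exp(x)


def pd_scores(scores, p_z, gamma):
    """PD: sum_z P(C | u, i, z) P(z) under the assumed form g(u, i) * z**gamma."""
    expected = sum(p * z ** gamma for z, p in p_z.items())  # same for every item
    return {i: elu_plus_one(s) * expected for i, s in scores.items()}


def pda_scores(scores, z_pred, gamma):
    """PDA: P(C | u, i, z~) with z~ fixed to each item's predicted popularity."""
    return {i: elu_plus_one(s) * z_pred[i] ** gamma for i, s in scores.items()}


def rank(scores):
    """Item ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


print("PD  ranking:", rank(pd_scores(raw_scores, pop_marginal, GAMMA)))
print("PDA ranking:", rank(pda_scores(raw_scores, predicted_pop, GAMMA)))
```

With these numbers PD ranks a, b, c while PDA ranks b, c, a: the predicted popularity reorders the list only in PDA, which is the adaptation step the answer above describes.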