Hi, thanks for your questions!
- This is only needed when the regime $R^t$ also influences the image. In the paper's experiments, this was only the case for CausalWorld, where $R^t$ is the arm positions/motor angles of the tri-finger robot, and the robot itself is visible in the image. For the other datasets, no action was used in the decoder.
- Yes, the action variable is one example of a regime. More settings are possible, e.g. environment properties or agent states.
- We ended up not using it, so you can ignore it. It was a potential way of separating the robot position in CausalWorld further from the latents $z^t$, although it turned out to be neither needed nor an improvement.
Thanks for your reply! I still have some confusion about question 1. You said "This is only needed when the regime, $R^t$, also influences the image." Suppose I have an image $x^{t-1}$ and take an action $R^{t-1}$ (e.g., moving the robot arm in CausalWorld or picking up an object in Ai2thor); then I get the new image $x^t$. In this situation, the regime influences the next image, so when we train the autoencoder, we have $z^{t-1} = \text{encoder}(x^{t-1})$, but does the reconstructed image ${x'}^{t-1}$ from the decoder need to consider $R^{t-1}$?
The core of my confusion is that $R^t$ affects the generation of the image at time $t$, so the $z^t$ obtained by the encoder already contains the $R^t$ information. Why, then, does $R^t$ need to be considered in the decoder?
Sorry for the confusion; I distinguish here between an implicit/indirect and an explicit/direct influence on the image. Most environments have only an 'implicit'/'indirect' effect, meaning that the action impacts how the causal factors change, but only the causal factors are visualized. In other words, the image only shows the causal variables at a time point. In CausalWorld, the action/regime has an 'explicit', or 'direct', effect on the image: the causal variables alone can no longer explain the image, because the robotic state itself is visible in it. Hence, a plain autoencoder would place $R^t$ in the latent space as well. However, we do not need $R^t$ in the latent space since (a) it is observed, and (b) it is not a causal variable of interest. Thus, we give it explicitly to the decoder to remove the need for placing $R^t$ in the latent space.
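Not the repository's actual code, just a minimal sketch of this design under assumed names and shapes: the encoder maps the image to latents $z^t$, while the decoder receives the observed regime $R^t$ concatenated to $z^t$, so the network has no incentive to store the robot state in the latent space.

```python
# Minimal sketch (hypothetical names/shapes, not the paper's code): an
# autoencoder whose decoder is conditioned on the observed regime R^t,
# so R^t does not need to be encoded into the latents z^t.
import torch
import torch.nn as nn

class RegimeConditionedAE(nn.Module):
    def __init__(self, img_dim=3 * 64 * 64, latent_dim=32, regime_dim=9):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(img_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # The decoder sees z^t *and* R^t, so the latents only have to
        # capture the causal variables, not the (observed) robot state.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + regime_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim),
        )

    def forward(self, x, regime):
        z = self.encoder(x)  # latents for the causal variables only
        x_rec = self.decoder(torch.cat([z, regime], dim=-1))
        return x_rec.view(x.shape), z

# Usage: plain reconstruction loss on (x^t, R^t) pairs.
model = RegimeConditionedAE()
x = torch.randn(8, 3, 64, 64)  # batch of images
r = torch.randn(8, 9)          # observed regime, e.g. motor angles
x_rec, z = model(x, r)
loss = nn.functional.mse_loss(x_rec, x)
```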
Thank you very much for resolving my confusion! I have one last question. Assuming I have finished training the model, how can I select the parts of the latent variable $z$ that relate to the causal variables $C$ and display them through the decoder? (For example, one part of $z$ might learn the causal variable representing the background, while another part of $z$ might represent an object.) Could you provide a demo? Thank you very much!
Depending on the data you have available, you can either
Looking forward to your reply!!
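One hypothetical sketch of what such a demo could look like (the `encoder`/`decoder` names and the dimension indices are assumptions, not the repo's API): assuming you know which latent dimensions are assigned to a causal variable, e.g. from the model's learned latent-to-causal assignment or by correlating latents with ground-truth labels, you can swap exactly those dimensions between two encoded images and decode the result; whatever changes in the decoded image is what that latent group controls.

```python
# Hypothetical demo (illustrative names, not the repo's API): visualize what
# a group of latent dimensions encodes by swapping those dimensions between
# two encoded images and decoding the result.
import torch

@torch.no_grad()
def visualize_latent_group(encoder, decoder, x_a, x_b, dims):
    """Swap the latent dims assigned to one causal variable from x_b into x_a.

    encoder, decoder: trained modules mapping image <-> latents z
    x_a, x_b:         two input images of shape (1, C, H, W)
    dims:             indices of z assigned to the causal variable of interest
    """
    z_a = encoder(x_a)
    z_b = encoder(x_b)
    z_swapped = z_a.clone()
    z_swapped[:, dims] = z_b[:, dims]  # intervene only on the selected dims
    # If `dims` encodes e.g. the background, the decoded image should show
    # x_a's objects in front of x_b's background.
    return decoder(z_swapped)

# Example call, assuming dims 0-3 were assigned to the 'background' variable:
# img = visualize_latent_group(model.encoder, model.decoder, x_a, x_b, [0, 1, 2, 3])
```

For CausalWorld, the decoder would additionally need the regime $R^t$ as input, as discussed above.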