One-sentence
The authors propose an architecture that can explain why an agent prefers one action over another by using integrated gradients to attribute the value function's output to its GVF input features.
Full
This work proposes an architecture that can explain why an agent prefers one action over another. First, they use general value functions (GVFs) to learn accumulated, hand-designed features that are easier for humans to understand. Second, the value function is computed not from the raw state representation but from the GVF features, so when we want to understand why Q(a) is higher than Q(b), we can associate Q(a) - Q(b) with the GVF features. Third, they use integrated gradients to measure how much each GVF feature contributes to the difference between Q(a) and Q(b). Briefly, integrated gradients is a local explanation method that approximates a non-linear function with a linear one; the weights of the approximating linear function directly indicate the influence or importance of the input features. Finally, they use a minimal sufficient explanation (MSX) to cope with the large number of GVF features.
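The attribution step can be sketched as follows. This is a hypothetical toy example, not the authors' code: the value head `q_head`, the feature values, and the choice of action b's features as the IG baseline are all my assumptions. The point is just how integrated gradients attributes Q(a) - Q(b) to individual GVF features, and how an MSX keeps only the smallest set of positive reasons whose total outweighs the negative ones.

```python
import numpy as np

def q_head(gvf_features, w, v):
    # Toy non-linear value head over GVF features (hypothetical, not the paper's network)
    return v @ np.tanh(w @ gvf_features)

def numerical_grad(f, x, eps=1e-5):
    # Central finite-difference gradient of a scalar function f at x
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def integrated_gradients(f, x, baseline, steps=64):
    # IG: integrate the gradient along the straight line from baseline to x
    # (midpoint Riemann sum), then scale by the input difference.
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += numerical_grad(f, baseline + a * (x - baseline))
    return (x - baseline) * total / steps

def msx_plus(attributions):
    # Minimal sufficient explanation: smallest set of positive attributions
    # whose sum outweighs the total magnitude of the negative ones.
    neg = -attributions[attributions < 0].sum()
    chosen, acc = [], 0.0
    for i in np.argsort(-attributions):
        if acc > neg or attributions[i] <= 0:
            break
        chosen.append(i)
        acc += attributions[i]
    return chosen

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))
v = rng.normal(size=4)
f_a = np.array([1.0, 0.2, -0.5])  # GVF features under action a (made up)
f_b = np.array([0.3, 0.8, 0.1])   # GVF features under action b, used as IG baseline

# Explain the preference gap Q(a) - Q(b) in terms of GVF features.
diff = lambda x: q_head(x, w, v) - q_head(f_b, w, v)
attr = integrated_gradients(diff, f_a, f_b)
# IG's completeness property: the attributions sum to diff(f_a) - diff(f_b) = diff(f_a),
# up to integration error.
print(attr, msx_plus(attr))
```

The useful sanity check is completeness: `attr.sum()` should match `diff(f_a)` up to the Riemann-sum error, which is what makes "these features explain the gap" a well-defined statement.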
Comparison with previous research. What are the novelties/good points?
No benchmark comparison, I think.
Key points
GVFs, integrated gradients, RL explainability
How did the authors prove the effectiveness of the proposal?
They demonstrate how their approach helps people understand agent behavior on several simple but interesting tasks. Good experiments and demonstrations. It's understandable that the tasks are relatively simple.
Any discussions?
It's a very interesting method, but it may be hard to make it succeed on more complicated tasks. One thing we can learn is that they provide some proofs for the tabular case. They are not hard, but math is a trick for top conferences.
What should I read next?
I will probably not do extended reading, since I don't see much potential for future work or for applying it to more complex tasks. But I'm happy to see they use integrated gradients. I might also read about MSX if I get a chance to use it.
I didn't know some of the key existing methods referenced in the paper, such as IG and GVF (so it was good to have a chance to skim this paper).
The state-action feature function F(s, a) doesn't sound scalable; in fact, this paper assumes hand-engineered features.
Although I agree with the claim that "for many applications that can benefit from informative explanations, the utility will outweigh the cost" in practical situations, this dependence on hand-designed GVF features makes the work hard to extend.
Summary
Link
Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions
Author/Institution
Zhengxian Lin, Kin-Ho Lam, Alan Fern (Oregon State University)