
Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions #37

Open KarlXing opened 3 years ago

KarlXing commented 3 years ago

Summary

Link

Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions

Author/Institution

Zhengxian Lin, Kin-Ho Lam, Alan Fern (Oregon State University)

What is this

One-sentence: The authors propose an architecture that can explain why an agent prefers one action over another by using integrated gradients to attribute the value function's output to its input GVF features.

Full: This work proposes an architecture that can explain why an agent prefers one action over another. First, they use general value functions (GVFs) to learn accumulated, manually designed features that are easier for humans to understand. Second, the value function is computed not from the raw state representation but from these GVF features, so when we want to understand why Q(a) is higher than Q(b), we can associate the difference Q(a) - Q(b) with the GVF features. Third, they use integrated gradients to measure the importance of each GVF feature to the difference Q(a) - Q(b). Briefly, integrated gradients is a local explanation method that approximates a non-linear function with a linear one; the weights of that linear approximation directly indicate the influence or importance of the input features. Finally, they use minimal sufficient explanations (MSX) to handle the problem of a large number of GVF features.
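Below is a minimal sketch (not the authors' code) of how integrated gradients could attribute the contrastive difference Q(s, a) - Q(s, b) to the GVF feature vector, plus a rough greedy selection of an MSX-style subset. Names such as `delta_q_fn`, `q_head`, and `msx_plus` are hypothetical placeholders, and the MSX criterion used here (smallest set of positive attributions whose sum outweighs the total negative attribution) is my paraphrase of the paper's idea.

```python
# A minimal sketch, assuming a PyTorch model in which a Q-head takes a vector of
# GVF features phi(s) as input. `q_head` and the feature names are hypothetical.
import torch

def integrated_gradients(delta_q_fn, phi, baseline=None, steps=50):
    """Approximate integrated gradients of a scalar function w.r.t. `phi`.

    delta_q_fn: maps a (D,) feature tensor to a scalar, e.g.
                lambda f: q_head(f)[a] - q_head(f)[b]
    phi:        (D,) tensor of GVF features for the current state
    baseline:   (D,) reference vector (defaults to zeros)
    """
    if baseline is None:
        baseline = torch.zeros_like(phi)
    total_grad = torch.zeros_like(phi)
    # Riemann-sum approximation of the path integral from baseline to phi.
    for k in range(1, steps + 1):
        point = (baseline + (k / steps) * (phi - baseline)).detach().requires_grad_(True)
        out = delta_q_fn(point)
        grad, = torch.autograd.grad(out, point)
        total_grad += grad
    # IG_i = (phi_i - baseline_i) * average gradient along the path.
    return (phi - baseline) * total_grad / steps

def msx_plus(attributions):
    """Greedy minimal sufficient explanation: smallest set of positively
    attributed features whose total outweighs the total negative attribution."""
    neg_mass = -attributions[attributions < 0].sum()
    chosen, acc = [], 0.0
    for idx in torch.argsort(attributions, descending=True):
        if acc > neg_mass or attributions[idx] <= 0:
            break
        chosen.append(int(idx))
        acc += float(attributions[idx])
    return chosen
```

In use, one would call `integrated_gradients` once per contrastive query (action a vs. action b) and report only the `msx_plus` subset of features when the number of GVF features is large.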

Comparison with previous research. What are the novelties/good points?

No benchmark comparison, I think.

Key points

GVF, integrated gradients, RL explainability

How did the authors prove the effectiveness of the proposal?

They demonstrate how their approach helps people understand agent behavior on several simple but interesting tasks. Good experiments and demonstrations. It is understandable that the tasks are relatively simple.

Any discussions?

It's a very interesting method, but it may be hard to make it succeed on more complicated tasks. One thing we can learn from this paper is that they provide some proofs for the tabular case. The proofs are not hard, but you know, math is a trick for top conferences.

What should I read next?

I will probably not do extended reading, since I don't see much potential for follow-up work or for applying it to more complex tasks. But I'm happy to see that they use integrated gradients. I might also read the MSX paper if I get a chance to use it.

nagataka commented 3 years ago