tensorflow / tcav

Code for the TCAV ML interpretability project
Apache License 2.0

Few conceptual questions around TCAV #83

Closed: drishyamlabs closed this issue 4 years ago

drishyamlabs commented 4 years ago

Hi,

Thanks a lot for sharing this valuable code. I have a few basic questions:

As per the research paper, the concept vector is orthogonal to the decision boundary. Can you please point us to where in the code that happens? In the implementation (https://github.com/tensorflow/tcav/blob/master/tcav/tcav.py), line 86, the TCAV score is defined as "TCAV score (i.e., ratio of pictures that returns negative dot product wrt loss)." Can you please tell us why a negative dot product is treated as positive influence? It would really help resolve my confusion. Looking forward to your response.

BeenKim commented 4 years ago

Thanks for this question!

In the paper, we describe our method as d logit / d cav, but in the code, for convenience, we compute d (loss) / d cav. Since it's the loss, its sign is the opposite of the logit's: the lower the loss, the higher the logit and the more likely p(x). That's why we take the negative dot product; it just flips the sign back to the right direction. Hope this helps!

Been
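
For readers who want to see the sign flip concretely, here is a minimal sketch in plain Python. It is not the code from tcav.py. It assumes a hypothetical toy binary head (logit z = w2 . tanh(W1 a)), the loss -log(sigmoid(z)), and a random vector standing in for the CAV; all names (`logit_and_grad`, `loss_grad`, `W1`, `w2`, `cav`, `acts`) are made up for illustration. Under those assumptions, counting negative dot products with the loss gradient should give the same score as counting positive dot products with the logit gradient.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def logit_and_grad(a, W1, w2):
    """Hypothetical toy head: z = w2 . tanh(W1 a). Returns (z, dz/da)."""
    h = [math.tanh(dot(row, a)) for row in W1]
    z = dot(w2, h)
    # Chain rule: dz/da_j = sum_i w2_i * (1 - h_i^2) * W1[i][j]
    grad = [
        sum(w2[i] * (1.0 - h[i] ** 2) * W1[i][j] for i in range(len(W1)))
        for j in range(len(a))
    ]
    return z, grad


def loss_grad(a, W1, w2):
    """Gradient of loss = -log(sigmoid(z)) with respect to the activations a."""
    z, dz_da = logit_and_grad(a, W1, w2)
    # d(loss)/dz = -(1 - sigmoid(z)), which is negative for any finite z,
    # so the loss gradient points opposite to the logit gradient.
    scale = -(1.0 - sigmoid(z))
    return [scale * g for g in dz_da]


def score_from_logit(acts, cav, W1, w2):
    # Paper-style convention: fraction with POSITIVE d(logit)/d(cav).
    hits = sum(dot(logit_and_grad(a, W1, w2)[1], cav) > 0 for a in acts)
    return hits / len(acts)


def score_from_loss(acts, cav, W1, w2):
    # Loss-based convention: fraction with NEGATIVE d(loss)/d(cav).
    hits = sum(dot(loss_grad(a, W1, w2), cav) < 0 for a in acts)
    return hits / len(acts)


# Random stand-ins for real weights, activations, and a trained CAV.
random.seed(0)
dim, hidden, n = 4, 3, 50
W1 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
w2 = [random.gauss(0, 1) for _ in range(hidden)]
cav = [random.gauss(0, 1) for _ in range(dim)]
acts = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

score_logit = score_from_logit(acts, cav, W1, w2)
score_loss = score_from_loss(acts, cav, W1, w2)
print(score_logit, score_loss)
```

In this binary toy case the loss gradient is the logit gradient times a negative scalar, so the two counts match exactly. The sketch makes no claim about other losses or about how the repository computes its gradients.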