Technical mistake in the paper

Hi,

Regarding the computation of p(x|T), the formula currently given in the paper is incorrect:

It's clear that you are trying to normalize all scores in a batch using a softmax, where the score function is the sum over the masked activations, i.e.,

However, the trace operation is not correct, since it would result in

Please consider correcting this. It may be a minor mistake, but it causes unnecessary confusion.
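
To make the difference concrete, here is a small sketch in plain Python. It assumes the intended score is the element-wise sum of x masked by T, and that the paper's expression is the trace of the matrix product of x and T. The matrices, the function names, and the exact form of the paper's formula are my own assumptions for illustration, not taken from the paper or the code.

```python
import math

def elementwise_score(x, T):
    """Sum over masked activations: sum_ij x[i][j] * T[i][j]."""
    return sum(xv * tv for x_row, t_row in zip(x, T)
               for xv, tv in zip(x_row, t_row))

def trace_of_product(x, T):
    """tr(x T) for square matrices: sum_ij x[i][j] * T[j][i]."""
    n = len(x)
    return sum(x[i][j] * T[j][i] for i in range(n) for j in range(n))

def trace_of_transposed_product(x, T):
    """tr(x^T T) for square matrices: sum_ij x[j][i] * T[j][i]."""
    n = len(x)
    return sum(x[j][i] * T[j][i] for i in range(n) for j in range(n))

def softmax(scores):
    """Normalize a list of scores (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical 2x2 feature map and an asymmetric mask (illustration only).
x = [[1.0, 2.0], [3.0, 4.0]]
T = [[1.0, 0.0], [1.0, 0.0]]

s_masked = elementwise_score(x, T)            # 1*1 + 3*1 = 4.0
s_trace = trace_of_product(x, T)              # 1*1 + 2*1 = 3.0
s_trace_t = trace_of_transposed_product(x, T) # 4.0, same as s_masked

# Softmax over a small "batch" of masked scores.
probs = softmax([s_masked, 0.0, 1.0])
```

With an asymmetric mask the trace of the plain product differs from the masked sum, while the trace with one factor transposed agrees with it. So if a trace is kept in the formula, writing it as tr(x^T T) would match the element-wise sum.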

zqs1022 / interpretableCNN