sebastian-lapuschkin / lrp_toolbox

The LRP Toolbox provides simple and accessible stand-alone implementations of LRP for artificial neural networks supporting Matlab and Python. The Toolbox realizes LRP functionality for the Caffe Deep Learning Framework as an extension of Caffe source code published in 10/2015.

LRP for 1. Regression 2. In Keras #12

Closed vedhas closed 6 years ago

vedhas commented 6 years ago

Question 1

To use this toolbox for a regression task, I changed the `# predict and perform LRP for the 10 first samples` part of lrp_demo.py to the following:

for cId, ixs in enumerate(xs):
    # forward pass and prediction
    ypred = nn.forward(ixs)
    print 'True Value:     ', ys[cId]
    print 'Predicted Value:', ypred, '\n'

    R = nn.lrp(ypred)                    # as Eq(56) from DOI: 10.1371/journal.pone.0130140
    # R = nn.lrp(ypred, 'alphabeta', 2.) # as Eq(60) from DOI: 10.1371/journal.pone.0130140
    # R = nn.lrp(ypred, 'epsilon', 1.)   # as Eq(58) from DOI: 10.1371/journal.pone.0130140

    plt.figure()
    # heatmap(attributions[:1], annot=1, annot_kws=np.expand_dims(Vocab, 0))
    plt.plot(np.sort(R))
    plt.xticks(np.arange(Vocab.shape[0]),
               Vocab[np.argsort(R)],
               rotation='vertical')
    plt.title('Ep' + str(cId))
    plt.show()

However, I get the following error. Kindly advise!

Traceback (most recent call last):
  File "/home/panditve/workspace/CurWorkDir/LrpExplainClose2BL.py", line 546, in <module>
    R = nn.lrp(ypred, 'alphabeta', 2.)  # as Eq(58) from DOI: 10.1371/journal.pone.0130140
  File "/home/panditve/workspace/CurWorkDir/modules/sequential.py", line 316, in lrp
    R = m.lrp(R,lrp_var,param)
  File "/home/panditve/workspace/CurWorkDir/modules/module.py", line 120, in lrp
    return self._alphabeta_lrp(R,param)
  File "/home/panditve/workspace/CurWorkDir/modules/linear.py", line 152, in _alphabeta_lrp
    Z = self.W[na,:,:]*self.X[:,:,na] # localized preactivations
IndexError: too many indices for array

OR

R = nn.lrp(ypred)  # as Eq(58) from DOI: 10.1371/journal.pone.0130140
  File "/home/panditve/workspace/CurWorkDir/modules/sequential.py", line 316, in lrp
    R = m.lrp(R,lrp_var,param)
  File "/home/panditve/workspace/CurWorkDir/modules/module.py", line 112, in lrp
    return self._simple_lrp(R)
  File "/home/panditve/workspace/CurWorkDir/modules/linear.py", line 114, in _simple_lrp
    Z = self.W[na,:,:]*self.X[:,:,na] #localized preactivations

xs and ys are both 2-dimensional: [number of samples, number of features (or number of predictions for ys)]. For the sake of simplicity, I am predicting only 1 dimension at the moment, i.e. `ys.shape` is `[N, 1]`, and I got the error above.

Question 2:

In the code above, I rebuilt my network model using your modules classes. However, most of my work so far involves numerous Keras models I have already trained; rewriting them all is hard, and a retrained network may end up with very different weights. Do you plan to release Keras-compatible code anytime soon? Or have I missed something, and I can already use lrp_toolbox for my Keras models somehow?

Question 3:

As I understand it, LRP (deep Taylor decomposition) is powerful enough to tell me which features contributed most to my model's continuous-valued output, per https://www.sciencedirect.com/science/article/pii/S0031320316303582 and https://www.youtube.com/watch?v=gy_Cb4Do_YE. Kindly let me know if I am wrong in my understanding anywhere. :)

christiantinauer commented 6 years ago

@Question 2:

I have implemented LRP and DTD as a Keras layer, which can be added as the final layer. The implementation travels through the preceding layers and computes the relevances. Let me know if you are interested.

vedhas commented 6 years ago

As a layer? How so?? Interesting!!! Yes, interested of course... :)

christiantinauer commented 6 years ago

It is part of my research. I can make a simple repo and add the implementation there. Just a few minutes.

sebastian-lapuschkin commented 6 years ago

Hi Vedhas,

Q1:

In order to fully answer your question, it would be good to know the architecture of your model. Passing your inputs and outputs as 2d arrays looks good, assuming you use a model comprised of dense layers, which require inputs (and produce outputs) of the shape [batch size x sample dims].

Q2:

We actually have a full toolbox based on Keras in the works, supporting almost all Keras layers and several analysis methods besides LRP. We expect the software to be released mid/end of June. The interface will be simpler: you pass your model to an analyzer, which then builds your backward analysis pipeline for [method].

Q3:

LRP and DTD do support regression out of the box. What changes slightly is the interpretation: instead of asking "which components speak for/against this and that class", as in classification, you ask "how does the net arrive at this outcome", generally speaking.

vedhas commented 6 years ago

Q1:

Yes, inputs and outputs (= xs and ys) are both 2-dimensional: `[number of samples, number of features (or number of predictions for ys)]`.

The model and the prediction computation look like this:

nn = modules.Sequential([
         modules.Linear(ipFeatCount, 256), modules.Tanh(),
         modules.Linear(256, 1), modules.Tanh()
         ])
nn.train(Train, Train_L[:, curAnno:curAnno+1],
         Xval=Devel, Yval=Devel_L[:, curAnno:curAnno+1],
         batchsize=32)
pred = nn.forward(Devel)

Kindly advise on what I should do to resolve the `Z = self.W[na,:,:]*self.X[:,:,na] # localized preactivations` error.

Q2:

Looking forward to it! :)

Q3:

Yes exactly! Good to know that my understanding is correct.

sebastian-lapuschkin commented 6 years ago

Does the line pred=nn.forward(Devel) in your previous post succeed, or does it fail with said index error? From what I see, I can only assume that there is something wrong with the inputs you give to the model, which are stored as self.X in .forward(), such that np.dot() works out during the forward pass but broadcasting for the computation of Z fails.
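As a rough illustration of what such a broadcasting failure looks like (just a numpy sketch with example shapes, not toolbox code): the localized preactivations need self.X to carry a batch dimension, so a flat 1d input raises exactly this IndexError, while a [1 x features] input broadcasts fine.

import numpy as np
na = np.newaxis

W = np.random.randn(521, 256)    # weights of a Linear layer (example shape)
x_flat = np.random.randn(521)    # a single sample WITHOUT batch dimension
x_batch = x_flat[na, :]          # the same sample as a (1, 521) batch

Z = W[na, :, :] * x_batch[:, :, na]   # broadcasts to shape (1, 521, 256) -- works
Z = W[na, :, :] * x_flat[:, :, na]    # raises IndexError: too many indices for array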

Can you please add the statement `print self.X.shape, self.W.shape` to the forward or LRP backward pass of the linear layer for debugging this issue, and post your full command line output?

As a side note: a terminal tanh layer might obfuscate your output info somewhat if your target is regression.

vedhas commented 6 years ago

Conceptual Qs

  1. Why would a terminal tanh layer "obfuscate the output info" for a regression?
  2. Is LRP not suitable for other activation functions such as selu/elu, sigmoid, or linear?

Btw, I solved the problem, thanks to your suggestion :)

I added:

  1. print "Xshape fwd", self.X.shape, "Wshape fwd", self.W.shape right after self.X = X in forward(self,X)
  2. print " Xshape bwd", self.X.shape, " Wshape bwd", self.W.shape; print "dWshape bwd", self.dW.shape,"dBshape bwd", self.dB.shape right after self.dB = DY.sum(axis=0) in backward(self,DY)
  3. print "Xshape lrp", self.X.shape, "Wshape lrp", self.W.shape right before Z = self.W[na,:,:]*self.X[:,:,na] in _simple_lrp(self,R) & _epsilon_lrp(self,R,epsilon)

During Training,

Num. samples * {
Xshape fwd (32, 521) Wshape fwd (521, 256)       #<--- output of print added to forward(self,X)
Xshape fwd (32, 256) Wshape fwd (256, 1)           #<--- output of print added to forward(self,X)

 Xshape bwd (32, 256)  Wshape bwd (256, 1)       #<--- output of print added to backward(self,X)
dWshape bwd (256, 1) dBshape bwd (1,)              #<--- output of print added to backward(self,X)
 Xshape bwd (32, 521)  Wshape bwd (521, 256)   #<--- output of print added to backward(self,X)
dWshape bwd (521, 256) dBshape bwd (256,)       #<--- output of print added to backward(self,X)
}
Xshape fwd (22727, 521) Wshape fwd (521, 256)  #<--- output of print added to forward(self,X)
Xshape fwd (22727, 256) Wshape fwd (256, 1)      #<--- output of print added to forward(self,X)
Accuracy after 10000 iterations on validation set: 100.0% (l1-loss: 0.0977)
    Estimate time until current training ends : 0d 0h 0m 0s (100.00% done)
Setting network parameters to best encountered network state with 100.0% accuracy and a loss of 0.0942968439006 from iteration 1499.

Forward run for devel + score computations :

Xshape fwd (22727, 521) Wshape fwd (521, 256)   #<--- output of print added to forward(self,X)
Xshape fwd (22727, 256) Wshape fwd (256, 1)       #<--- output of print added to forward(self,X)
[0.34944037 0.4224545  0.13539175]                     # my scores

Forward run for xs=train+devel:

Xshape fwd (78819, 521) Wshape fwd (521, 256)   #<--- output of print added to forward(self,X)
Xshape fwd (78819, 256) Wshape fwd (256, 1)       #<--- output of print added to forward(self,X)

forward run for ixs =1 sample of xs: (Ref: Original question)

Xshape fwd (521,) Wshape fwd (521, 256)            #<--- output of print added to forward(self,X)
Xshape fwd (256,) Wshape fwd (256, 1)                #<--- output of print added to forward(self,X)
True Value:      [0.]
Predicted Value: [0.02465081] 

Xshape lrp (256,) Wshape lrp (256, 1)                   #<--- output of print added to _simple_lrp(self,R)
Traceback (most recent call last):
  File "/home/panditve/workspace/CurWorkDir/LrpExplainClose2BL.py", line 528, in <module>
    R = nn.lrp(ypred, 'epsilon', 1.)  # as Eq(58) from DOI: 10.1371/journal.pone.0130140
  File "/home/panditve/workspace/CurWorkDir/modules/sequential.py", line 316, in lrp
    R = m.lrp(R,lrp_var,param)
  File "/home/panditve/workspace/CurWorkDir/modules/module.py", line 118, in lrp
    return self._epsilon_lrp(R,param)
  File "/home/panditve/workspace/CurWorkDir/modules/linear.py", line 144, in _epsilon_lrp
    Z = self.W[na,:,:]*self.X[:,:,na] # localized preactivations
IndexError: too many indices for array

I noticed that the X and W shapes are now missing one dimension (the batch dimension). So I modified the forward call to `ypred = nn.forward(xs[cId:cId+1])` and my program no longer returns an error. Thanks for your help.
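For reference, the relevant part of my loop now looks roughly like this (just a sketch of my local code; I also index into the relevances with [0] since they now come back with the batch dimension):

for cId in range(xs.shape[0]):
    # keep the batch dimension: xs[cId:cId+1] has shape (1, num_features)
    ypred = nn.forward(xs[cId:cId+1])
    print 'True Value:     ', ys[cId]
    print 'Predicted Value:', ypred, '\n'

    R = nn.lrp(ypred)[0]   # simple LRP, Eq(56); drop the batch dimension for plotting
    plt.plot(np.sort(R))
    plt.xticks(np.arange(Vocab.shape[0]), Vocab[np.argsort(R)], rotation='vertical')
    plt.title('Ep' + str(cId))
    plt.show()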

sebastian-lapuschkin commented 6 years ago

CQ1:

tanh non-linearly pushes its inputs away from zero and saturates at almost constant positive and negative values once the input is large enough in magnitude. Compared to the immediate outputs of the previous linear layer, I would say you lose "resolution" in your outputs, which end up being almost binary. This might be undesirable for regression tasks.
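Just to illustrate the saturation numerically (a quick numpy sketch):

import numpy as np

z = np.array([0.5, 1.0, 2.0, 3.0, 5.0])   # example pre-activations from the last linear layer
print np.tanh(z)   # [0.462  0.762  0.964  0.995  0.9999] -- inputs beyond ~3 become nearly indistinguishable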

CQ2:

elu/selu/linear -- everything which is centered at zero (f(0) = 0, f(x<0) < 0, f(x>0) > 0) should be fine. Sigmoids, which (depending on the type) map f(0) = 0.5, can switch the sign of the relevance in the backward pass because of that. In that case it would be better to use DTD (compute the gradient of the layer, then approximate linearly). This is also the reason we advise ignoring the softmax output: softmax does not change the output ranking, but transfers the outputs to probability values. A negative logit output of an immediately preceding conv or linear layer would be rendered positive. Using LRP "as is" in that case would result in inverted signs for the relevances in every input (or hidden unit) preceding the softmax.
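A small numerical sketch of the softmax point (two made-up logits, nothing toolbox-specific):

import numpy as np

logits = np.array([-2.0, 1.0])                  # the logit for class 0 is negative
probs = np.exp(logits) / np.exp(logits).sum()   # softmax
print logits[0], probs[0]                       # -2.0  ->  ~0.047, i.e. positive
# Decomposing probs[0] instead of logits[0] would start the backward pass from a
# positive quantity and flip the sign of all relevances propagated down from it.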

Your Problem

Good to know you solved your problem!

Advice

The training code of the LRP toolbox is only rudimentary and implemented as a proof of concept for classification tasks, so people can see there is no magic involved in the model training required to get LRP to run.

If you want to stick to models of limited complexity, I suggest you use the neural network implementations of scikit-learn, which also support regression, different loss functions, and efficient multithreading. Then, for analysis, you can easily copy the learned weights and rebuild the model using the LRP toolbox.
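As a rough sketch of what that could look like (assuming Sequential keeps its layers in a modules list and Linear exposes W and B attributes, and with hypothetical training arrays X_train, y_train):

from sklearn.neural_network import MLPRegressor
import modules   # the LRP toolbox python modules

# train a small regressor with scikit-learn
mlp = MLPRegressor(hidden_layer_sizes=(256,), activation='tanh', max_iter=1000)
mlp.fit(X_train, y_train.ravel())

# rebuild the same architecture with the toolbox (no terminal tanh, matching MLPRegressor)
nn = modules.Sequential([
    modules.Linear(X_train.shape[1], 256), modules.Tanh(),
    modules.Linear(256, 1)
])

# copy the learned parameters into the toolbox layers
nn.modules[0].W = mlp.coefs_[0];  nn.modules[0].B = mlp.intercepts_[0]
nn.modules[2].W = mlp.coefs_[1];  nn.modules[2].B = mlp.intercepts_[1]

R = nn.lrp(nn.forward(X_train[:1]))   # relevance scores for one sample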

Feel free to close the issue.

vedhas commented 6 years ago

Awesome! :1st_place_medal: Great help! :+1: :)

sebastian-lapuschkin commented 6 years ago

FYI:

I am glad to inform you that the public alpha of our new analysis toolbox is now online. Based on Keras, the new implementation is at least 10 times more efficient (on CPU) than our previous Caffe equivalent, with up to a measured 520-fold speedup on GPU!

https://github.com/albermax/innvestigate
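Usage follows the analyzer interface described above; roughly like this (where model is your trained Keras model and x an input batch; exact analyzer names may differ slightly in the alpha):

import innvestigate

# for classification models you would strip the softmax first (see the discussion above);
# for a regression model this step is not needed
analyzer = innvestigate.create_analyzer("lrp.epsilon", model)
analysis = analyzer.analyze(x)   # relevance scores with the same shape as the input batch x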

Best,