joaopauloschuler / neural-api

CAI NEURAL API - Pascal based deep learning neural network API optimized for AVX, AVX2 and AVX512 instruction sets plus OpenCL capable devices including AMD, Intel and NVIDIA.
GNU Lesser General Public License v2.1

How can we create activation maps #94

Open mikerabat opened 1 year ago

mikerabat commented 1 year ago

I managed to create a small project that does gradient ascent and shows some images, but for learning purposes, would it be possible to see the activations like in kEffNetV1?

The activations would be calculated on a 1D ECG strip.

joaopauloschuler commented 1 year ago

Yes. I'll probably work on this over the next year.

joaopauloschuler commented 1 year ago

@mikerabat, there are some comments about this at https://forum.lazarus.freepascal.org/index.php/topic,60843.0.html .

mikerabat commented 1 year ago

Thanks for the hint. I actually managed to create something like this a while ago, though I'm not sure whether I did it correctly. I now also use a global average pooling layer as the final feature layer, which seems to work fine. In addition, all the examples I found use this kind of pooling at the end of the pipeline.

Here is what I did (please note that this is just the routine that creates the activation map...):

procedure TfrmGradAscent.btnSegmentClickGlobalAveragePool(Sender: TObject);
var ecgSeg : TDoubleDynArray;
    activation : TDblDynArrArr;
    seg : integer;
    inputSize : integer;
    sampSize : integer;
    i: Integer;
    idx : integer;
    mVal : double;
    pts : Array of TPoint;
    scaleY : double;
    offsetY : double;
    m1, m2 : double;
    j: Integer;
    globAvgLayer : TNNetAvgPool;
    nextLayer : TNNetLayer;
    vInput : TNNetVolume;
    w : TNNetVolume;
    img : TBitmap;
  function ActivationToColor(act : double) : TColor;
  begin
       act := Max(0, min(1, act));
       Result := RGB( Trunc( 255*act ), Trunc(255*act), Trunc(255*act));
  end;
begin
     if not Assigned(fNN) or not Assigned(fChannels) then
     begin
          MessageDlg('Before evaluation you need to open a classifier file and open a recording', mtInformation, [mbOk], -1);
          exit;
     end;

     if not TryStrToInt( edSegNr.Text, seg ) or (seg < 1) or (seg > fNumSeg) then
     begin
          MessageDlg('Invalid number', mtError, [mbOK], -1);
          exit;
     end;

     dec(seg);

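     // find the last average pooling layer in the network and the (classifier) layer that follows it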
     globAvgLayer := nil;
     nextLayer := nil;
     for j := 0 to fNN.Layers.Count - 2 do
     begin
          if fNN.Layers[j] is TNNetAvgPool then
          begin
               globAvgLayer := fNN.Layers[j] as TNNetAvgPool;
               nextLayer := fNN.Layers[j + 1];
          end;
     end;

     if not Assigned(globAvgLayer) then
     begin
          MessageDlg('No global average pool found', mtError, [mbOk], -1);
          exit;
     end;

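     // read and resample the selected ECG segment to the network input length, then normalize it to zero mean / unit variance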
     inputSize := FNN.Layers[0].Output.Size;

     sampSize := MulDiv( inputSize, fSigChan.SampleRate, cDestSampRate );
     ecgSeg := fSigChan.ReadDblDataResamp( seg*fSigChan.SampleRate, sampSize, cDestSampRate );

     MatrixNormalizeMeanVar( @ecgseg[0], Length(ecgSeg)*sizeof(double), Length(ecgSeg), 1, True );

     SetLength(activation, nextLayer.Neurons.Count);
     for i := 0 to Length(activation) - 1 do
         SetLength( activation[i], globAvgLayer.Output.Size );

     InitImg(1 + Length(activation), Length(ecgSeg), 100);

     SetLength(pts, Length(ecgSeg) );
     m1 := MaxValue(ecgSeg);
     m2 := MinValue(ecgSeg);
     scaleY := 100/(m1 - m2);
     offsetY := (m1 + m2)/2;

     for i := 0 to Length(pts) - 1 do
     begin
          pts[i].X := i;
          pts[i].Y := Round(50 - (ecgSeg[i] - offsetY)*scaleY);
     end;

     fImg[0].Canvas.Polyline(pts);

     // ###########################################
     // #### Now calculate the output
     vInput := TNNetVolume.Create(FNN.Layers[0].Output);
     FillChar(vInput.FData[0], Length(vInput.FData)*sizeof(single), 0);

     for i := 0 to Length(ecgSeg) - 1 do
         vInput.FData[i] := ecgSeg[i];

     fnn.Compute(vInput);

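     // for each output neuron: weight the pooled feature values by that neuron's weights (CAM style), then apply ReLU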
     for i := 0 to Length(activation) - 1 do
     begin
          w := nextLayer.Neurons[i].Weights;
          for idx := 0 to Length(activation[i]) - 1 do
              activation[i][idx] := activation[i][idx] + globAvgLayer.Output.FData[idx]*w.FData[idx];

          // relu
          for idx := 0 to Length(activation[i]) - 1 do
              activation[i][idx] := Max(0, activation[i][idx]);
     end;
     vInput.Free;

     // normalize to 0 - 1
     for i := 0 to Length(activation) - 1 do
     begin
          mVal := MaxValue(activation[i]);
          if mVal > 1e-10 then
             MatrixScaleAndAdd(@activation[i][0], Length(activation[0])*sizeof(double), Length(activation[0]), 1, 0, 1/mVal);
     end;

     // ###########################################
     // #### Create the output bitmaps (aka stretch the activation map)
     img := TBitmap.Create;
     img.SetSize(Length(activation[0]), 1);
     img.PixelFormat := pf24bit;

     for i := 0 to Length(activation) - 1 do
     begin
          for j := 0 to Length(activation[i]) - 1 do
              img.Canvas.Pixels[j, 0] := ActivationToColor( activation[i][j] );

          fImg[i + 1].Canvas.StretchDraw( Rect(0, 0, fImg[i + 1].Width - 1, fImg[i + 1].Height - 1), img );
     end;
     img.Free;

     // ###########################################
     // #### Update the view
     if Length(fImg) > 0
     then
         pbImages.Height := Length(fImg)*(fImg[0].Height + 3)
     else
         pbImages.Height := 0;

     chkStretchClick(nil);

     pbImages.Invalidate;
end;

The other idea I had was to send "Dirac" pulses into the network and see how that influences/activates the output. This is like looping through all pixels (or, in my case, all samples of the 1D ECG) and seeing how each one influences the output... Is that actually a valid idea?
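Roughly, the idea would look like the sketch below: perturb one sample at a time and measure how much the class score moves. The procedure name, classIdx and the use of GetLastLayer to read the score are just assumptions for illustration; the other calls mirror the routine above, so please check them against your version of the library.

procedure TfrmGradAscent.SensitivityByPulse(classIdx : integer; const ecgSeg : TDoubleDynArray;
  var sensitivity : TDoubleDynArray);
var vInput : TNNetVolume;
    baseScore : single;
    orig : single;
    i : integer;
begin
     // baseline: class score of the unmodified segment
     vInput := TNNetVolume.Create(fNN.Layers[0].Output);
     FillChar(vInput.FData[0], Length(vInput.FData)*sizeof(single), 0);
     for i := 0 to Length(ecgSeg) - 1 do
         vInput.FData[i] := ecgSeg[i];
     fNN.Compute(vInput);
     baseScore := fNN.GetLastLayer.Output.FData[classIdx];

     // perturb one sample at a time and record how much the score changes
     SetLength(sensitivity, Length(ecgSeg));
     for i := 0 to Length(ecgSeg) - 1 do
     begin
          orig := vInput.FData[i];
          vInput.FData[i] := 0;    // or add a spike instead of zeroing the sample
          fNN.Compute(vInput);
          sensitivity[i] := Abs(baseScore - fNN.GetLastLayer.Output.FData[classIdx]);
          vInput.FData[i] := orig; // restore before moving to the next sample
     end;
     vInput.Free;
end;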

joaopauloschuler commented 1 year ago

@mikerabat, AH! I've just realized that you are looking for a Class Activation Map (CAM): https://towardsdatascience.com/class-activation-mapping-using-transfer-learning-of-resnet50-e8ca7cfd657e
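For a 1D signal, CAM is the weighted sum of the feature maps of the layer just before the global pooling, using the classifier weights of the chosen class: cam(t) = sum over channels of w[ch] * featureMap[ch](t), followed by ReLU and normalization. A minimal sketch is below, assuming a true global pooling (one value per channel, so each classifier neuron holds exactly one weight per channel) and assuming TNNetNeuron plus the SizeX, Depth and Get(x, y, d) members of TNNetVolume as found in neuralnetwork.pas / neuralvolume.pas; please verify against your version.

procedure ComputeCAM(featureMaps : TNNetVolume; classifierNeuron : TNNetNeuron;
  var cam : TDoubleDynArray);
var t, ch : integer;
    s : double;
begin
     // featureMaps: output of the layer *before* the global pooling,
     // read as SizeX time steps x Depth channels (SizeY = 1 for a 1D signal)
     SetLength(cam, featureMaps.SizeX);
     for t := 0 to featureMaps.SizeX - 1 do
     begin
          s := 0;
          // weighted sum over channels with the chosen class' classifier weights
          for ch := 0 to featureMaps.Depth - 1 do
              s := s + classifierNeuron.Weights.Get(0, 0, ch)*featureMaps.Get(t, 0, ch);
          cam[t] := Max(0, s);  // ReLU; normalize to 0..1 afterwards for display
     end;
end;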

Regarding the idea of sending "Dirac" pulses into the network and looping through all input samples (or pixels) to see how each one influences the output: this is a valid idea. There is actually a proper name for this method, but I can't remember it.

Note: when I use global avg pooling, I usually also use a higher learning rate.