Open kweonwooj opened 6 years ago
@kweonwooj
Hi Kweon Woo,
This paper, "SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability", explains how we can compare two representations in a way that is both invariant to affine transforms and fast to compute, by combining the SVD and CCA analysis methods. The paper explains the steps involved in comparing two representations nicely on page 3 and in the Appendix.
[SVCCA](https://arxiv.org/pdf/1706.05806.pdf)
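For concreteness, the SVD-then-CCA pipeline described on page 3 can be sketched in NumPy as follows. This is a minimal sketch under my own assumptions (function names, the variance threshold, and the regularization are mine, not from the authors' reference code):

```python
import numpy as np

def svcca(acts1, acts2, keep=0.99):
    """SVCCA sketch: SVD to prune low-variance directions, then CCA.
    acts1, acts2: (neurons, datapoints) activation matrices.
    Returns the mean canonical correlation (SVCCA similarity)."""

    def svd_reduce(X, keep):
        X = X - X.mean(axis=1, keepdims=True)        # center each neuron
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        # keep enough singular directions to explain `keep` of the variance
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep) + 1
        return np.diag(s[:k]) @ Vt[:k]               # (k, datapoints)

    X, Y = svd_reduce(acts1, keep), svd_reduce(acts2, keep)
    n = X.shape[1]
    # regularized covariances for numerical stability
    Sxx = X @ X.T / n + 1e-8 * np.eye(X.shape[0])
    Syy = Y @ Y.T / n + 1e-8 * np.eye(Y.shape[0])
    Sxy = X @ Y.T / n
    # canonical correlations = singular values of the whitened
    # cross-covariance; Cholesky factors act as whitening maps
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    T = np.linalg.solve(Ly, np.linalg.solve(Lx, Sxy).T).T
    rho = np.clip(np.linalg.svd(T, compute_uv=False), 0.0, 1.0)
    return rho.mean()
```

By construction the result is invariant to invertible linear maps of either representation, which is the affine-invariance property the paper highlights.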
I'm not able to understand the first plotted figure (Figure 1 of the paper) on the toy regression data. The description says the x-axis runs over the dataset, but what goes on the y-axis? For example, the first plot says "Neurons with highest activation". I tried to replicate it but could not produce a similar plot. My best guess is that each curve is a single neuron's activation over the dataset.
Can you please explain what the y-axis is in all the plots of Figure 1? Thank you!!
Abstract
Details
Analyzing value of single neuron over all train/valid dataset
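This "value of a single neuron over the whole dataset" view can be made concrete: a neuron's representation is its vector of activations over the datapoints, and each curve in Figure 1 is one such vector. A minimal sketch (the shapes and names here are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.standard_normal((100, 5))     # (datapoints, neurons)
neuron_vectors = acts.T                  # one row = one neuron's
                                         # activations over the dataset
# rank neurons by their peak activation, as in the
# "neurons with highest activation" panel
order = np.argsort(neuron_vectors.max(axis=1))[::-1]
top_curve = neuron_vectors[order[0]]     # one curve: a single neuron
                                         # plotted over the dataset
```

Under this reading, the x-axis indexes datapoints and the y-axis is the activation value of each plotted neuron.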
Four main contributions
SVCCA
Result
SVCCA for Conv Layers
Applications
SVCCA similarity is computed from the canonical correlations (singular values of the whitened cross-covariance), analogous to a multidimensional Pearson correlation
Freeze Training : progressively freezing lower layers during training reduces training cost and improves generalization, motivated by the image below, where lower layers become similar to the fully trained model early in training
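As a toy illustration of freezing (not the paper's training setup), here is a two-layer linear model where the lower layer receives no updates, so no gradient has to be propagated through it:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))    # lower layer: frozen
W2 = rng.standard_normal((1, 8))    # upper layer: trained
X = rng.standard_normal((4, 64))
y = rng.standard_normal((1, 64))

W1_frozen = W1.copy()
loss_start = np.mean((W2 @ (W1 @ X) - y) ** 2)

lr = 0.01
for _ in range(200):
    h = W1 @ X                      # forward through the frozen layer
    err = W2 @ h - y
    # only W2 gets a gradient; backprop stops here, saving the
    # backward pass and weight update for the frozen lower layer
    W2 -= lr * (err @ h.T) / X.shape[1]

loss_end = np.mean((W2 @ (W1 @ X) - y) ** 2)
```

The cost saving is exactly the skipped backward computation through the frozen layers; the frozen weights stay byte-identical throughout.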
When are Classes learned?
compare the SVCCA similarity between the logits and every layer to observe when class-specific information is obtained
easier tasks are learned in early stages
Model Compression
Replacing the usual `W x X` with `(W x P_x^T) x (P_x x X)`, where `P_x` is the SVCCA projection matrix, reduces the number of FLOPs while retaining 99% of the information.
Appendix
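A toy NumPy check of this compression. As a stand-in for the actual SVCCA directions, the projection here is built from the top singular vectors of the (approximately low-rank) activations; the sizes and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_inputs, n_data, k = 256, 512, 1000, 64
W = rng.standard_normal((n_neurons, n_inputs))

# approximately low-rank activations (rank 32 signal + small noise)
X = (rng.standard_normal((n_inputs, 32)) @ rng.standard_normal((32, n_data))
     + 0.01 * rng.standard_normal((n_inputs, n_data)))

# stand-in for the SVCCA projection: top-k left singular vectors of X
U, _, _ = np.linalg.svd(X, full_matrices=False)
P = U[:, :k].T                        # (k, n_inputs)

full = W @ X                          # usual W x X
compressed = (W @ P.T) @ (P @ X)      # (W P^T)(P X); W P^T is precomputed

# per-example FLOPs: full costs n_neurons * n_inputs multiplies;
# compressed costs n_inputs * k (project) + n_neurons * k (apply)
flops_full = n_neurons * n_inputs
flops_comp = n_inputs * k + n_neurons * k
ratio = flops_comp / flops_full       # fraction of the original cost
```

Because the activations are nearly low-rank, the projected product matches the full one closely while doing a fraction of the multiplies.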
Personal Thoughts
Link: https://arxiv.org/pdf/1706.05806.pdf
Authors: Raghu et al., 2017