Open jbloomAus opened 1 year ago
Would storing/calculating the mean kurtosis of activations be interesting? See https://transformer-circuits.pub/2023/privileged-basis/index.html
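A rough sketch of what "mean kurtosis of activations" could look like, in case it helps the discussion. It assumes activations are available as a list of vectors (one per token position), computes kurtosis across the hidden dimension of each vector, and averages over positions. The function names are made up for illustration and this is plain Python rather than whatever tensor ops the repo would actually use. It uses non-excess (Pearson) kurtosis, so Gaussian-like activations should land near 3 and vectors dominated by a few dimensions should land well above that.

```python
from statistics import fmean


def kurtosis(values):
    """Pearson (non-excess) kurtosis: fourth central moment / variance squared."""
    mu = fmean(values)
    centered = [v - mu for v in values]
    variance = fmean(c ** 2 for c in centered)
    if variance == 0.0:
        raise ValueError("kurtosis is undefined for a constant vector")
    fourth_moment = fmean(c ** 4 for c in centered)
    return fourth_moment / variance ** 2


def mean_kurtosis(activations):
    """Average the per-vector kurtosis over all positions (hypothetical helper)."""
    return fmean(kurtosis(vec) for vec in activations)


# Toy activations: one vector with evenly spread values, one dominated by a single dimension.
dense = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
spiky = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0]

print("dense:", kurtosis(dense))
print("spiky:", kurtosis(spiky))
print("mean :", mean_kurtosis([dense, spiky]))
```

The spiky vector scores much higher than the dense one, which is the kind of signal that would be worth logging per layer if this turns out to be interesting.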
On a whim I added basic history visualization. Main issues are:
I also started a time-embedding dot-product visualization but didn't finish it. I'll leave it in, though it didn't seem super interesting.
Plot the L2 norm of the residual stream at each layer (this gives a sense of how much information a layer carries compared to the amount going into the logits).
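A minimal sketch of the norm calculation behind that plot, assuming the residual stream has been cached as `residual_streams[layer][position]` (a hypothetical layout, not necessarily how the cache is stored here). It averages the L2 norm over positions for each layer and normalizes by the final layer, which is the one feeding the logits. The text bars stand in for a real plotting call.

```python
import math


def l2_norm(vector):
    """Euclidean norm of a single residual stream vector."""
    return math.sqrt(sum(x * x for x in vector))


def residual_norms_by_layer(residual_streams):
    """Mean L2 norm per layer; residual_streams[layer][position] is a vector."""
    return [
        sum(l2_norm(vec) for vec in layer) / len(layer)
        for layer in residual_streams
    ]


# Toy cache: 3 layers, 2 positions, d_model = 2.
residual_streams = [
    [[3.0, 4.0], [0.0, 5.0]],
    [[6.0, 8.0], [0.0, 10.0]],
    [[5.0, 12.0], [0.0, 13.0]],
]

norms = residual_norms_by_layer(residual_streams)
# Express each layer relative to the final layer, which goes into the logits.
relative = [n / norms[-1] for n in norms]

for layer, (norm, rel) in enumerate(zip(norms, relative)):
    bar = "#" * round(rel * 20)
    print(f"layer {layer}: norm={norm:5.2f} rel={rel:4.2f} {bar}")
```

Normalizing by the last layer is only one choice; plotting the raw norms per layer would work just as well if absolute scale matters.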
Analysis features
Static
Composition
Dynamic
Logit Lens
Attention Maps:
Causal
Activation Patching (features)
RTG Scan
Congruence -> If features aren't in superposition, what effect do they have on the predictions?
Renew old features:
SVD Decomp / Explore ways to use dimensionality reduction to quickly understand what heads are doing.
Cache Characterization?
Advanced
Implement Path Patching
Implement AVEC
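For the Logit Lens item in the list above, here is a toy sketch of the basic idea: take the residual stream after each layer and project it straight through the unembedding to see which output it already favors. It assumes an unembedding matrix of shape `(d_model, n_actions)` and skips the final layer norm, which is a simplification. All names and values are hypothetical and not taken from the repo.

```python
def project(vector, matrix):
    """Compute vector @ matrix for a (d_model,) vector and a (d_model, n_out) matrix."""
    n_out = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(n_out)]


def logit_lens(residual_by_layer, unembed):
    """Logits implied by the residual stream after each layer (no final layer norm)."""
    return [project(resid, unembed) for resid in residual_by_layer]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy setup: d_model = 2, three possible outputs (e.g. actions).
unembed = [
    [1.0, 0.0, -1.0],
    [0.0, 2.0, 1.0],
]
# Residual stream at one position after each of three layers.
residual_by_layer = [[1.0, 0.0], [1.0, 1.0], [0.5, 2.0]]

lens = logit_lens(residual_by_layer, unembed)
for layer, logits in enumerate(lens):
    print(f"layer {layer}: logits={logits} top={argmax(logits)}")
```

In this toy example the preferred output switches after the first layer and then stays put, which is the sort of pattern the lens is meant to surface.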
Several things I feel are missing which are required for exploratory analysis to be more complete:
Several things I feel will be required for falsifying predictions of how the model is working: