Functionally analyze the heads that receive many of the top composition weights in Neo125M
Generalize parameter matrix functions to directly work on different devices (e.g., GPUs)
Finish running composition values for larger models on larger machines
Functionally analyze as many interesting heads as possible before conference
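For the device-generalization item above, one minimal sketch (assuming PyTorch; the function name and the Frobenius-norm composition measure are illustrative, not necessarily the project's actual API) is to write the matrix functions with no hard-coded `.cpu()`/`.cuda()` calls, so they inherit whatever device the weights already live on:

```python
import torch

def composition_weight(w_out: torch.Tensor, w_in: torch.Tensor) -> torch.Tensor:
    """Frobenius-norm composition score between an output matrix and an
    input matrix, computed on whatever device the inputs live on.
    (Hypothetical helper; the project's real function may differ.)"""
    # No explicit device transfers: torch ops run on the inputs' device,
    # so moving the weights once is enough to run everything on GPU.
    prod = w_out @ w_in
    return prod.norm() / (w_out.norm() * w_in.norm())

# Usage: move the data, not the function.
device = "cuda" if torch.cuda.is_available() else "cpu"
w_out = torch.randn(768, 64, device=device)
w_in = torch.randn(64, 768, device=device)
score = composition_weight(w_out, w_in)  # result stays on `device`
```

The design choice is that device placement becomes a property of the tensors alone, so the same code path covers CPU and GPU runs without branching.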
OUTLINE
Figure 1: Contributions to the residual stream and the datasets
Architecture schematic with the residual stream
Subspace vector diagram (contributions to the stream add within any given unit subspace)
Table of the number of input/output pairs of different types across models
Basic value distributions
Figure: Breakdown by type (QKV) and layer
Percentile vs type histogram (w/ and w/o baselines)
Percentile vs layer histogram (<- this needs to be normalized)
Where do the top values point?
Biases vs. attention-head layer distances
Figure: Can we extract interesting functional properties?
Can we extract induction heads?
What do the heads with many top values do?
Figure: Higher order terms
Input path complexity cartoon (w/ and w/o baselines)
Input path complexity plots vs. random
Supplementary Figures
Basic value distributions with the original denominator
Head-by-head term value plots (one w/ high values and one without)
Scatterplot of old denominator vs. new denominator
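The percentile-vs-layer histogram above is flagged as needing normalization. A minimal sketch of one way to do it (numpy; the `counts` array and its shape are hypothetical stand-ins for the real per-layer bin counts) is to rescale each layer's row to a distribution, so layers with different numbers of heads or terms become comparable:

```python
import numpy as np

# Hypothetical counts: rows = layers, columns = percentile bins.
# Raw counts are not comparable across rows if layers contribute
# different numbers of terms.
rng = np.random.default_rng(0)
counts = rng.integers(0, 50, size=(12, 10)).astype(float)

# Normalize each layer's row to sum to 1, so the histogram compares
# the *shape* of each layer's percentile distribution, not its size.
row_sums = counts.sum(axis=1, keepdims=True)
normalized = np.divide(counts, row_sums,
                       out=np.zeros_like(counts),
                       where=row_sums > 0)
```

The `where` guard keeps empty layers at zero instead of producing NaNs.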
NICE TO HAVE
Measuring composition with the embedding & unembedding weights
Think about fusing LayerNorm weights with the input/output matrices to speed things up
Include MLPs in path analysis
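On the LayerNorm-fusing item: since LayerNorm ends with an elementwise scale, that scale can be folded into any matrix that reads the normalized stream, turning a per-call elementwise multiply into a one-time precomputation. A minimal numpy sketch, with illustrative names and shapes (not the project's actual API):

```python
import numpy as np

def fold_ln_into_reader(w_read: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Fold a LayerNorm scale gamma (d_model,) into a reading matrix
    w_read (d_head, d_model), using W @ (gamma * x) == (W * gamma) @ x.
    Hypothetical helper for illustration."""
    return w_read * gamma  # broadcasts gamma across w_read's columns

# Check the identity on random data.
d_model, d_head = 768, 64
rng = np.random.default_rng(0)
x = rng.standard_normal(d_model)      # a (post-normalization) stream vector
gamma = rng.standard_normal(d_model)  # LN scale parameters
w = rng.standard_normal((d_head, d_model))

fused = fold_ln_into_reader(w, gamma)
# fused @ x equals w @ (gamma * x), so the fused matrix can be
# precomputed once per head instead of scaling x on every call.
```

The same trick applies symmetrically on the output side by folding the scale into the rows of a writing matrix.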
FUTURE WORK
See if paths functionally maintain signals by injecting noise at an early point and measuring downstream effects
Look at the maximum value of the reverse edges and see where it pops up
Measure network performance before & after knocking out low composition reads and high composition reads
More meta: do a deeper dive into orthogonal vectors, bases, and subspaces; investigate which mental tools might be useful (almost-orthogonal vectors, etc.)
Reserve some portion of the residual stream for embeddings, and then measure how "inflated" the remaining subspaces of the attention head are
Major singular values and where they point
More work on baselines
Rank of input and output weights (by head and layer)
Think about the baseline more
Figure: The baseline
Computing reverse edges cartoon
"95% confidence" and "non-random" thresholds
Heatmap of the number of sent edges (by attention head)
Heatmap of the number of received edges (by attention head)
Figure: Individual singular values
Do large bandwidth terms mean one large value? Or several?
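For the "95% confidence" / "non-random" thresholds above, one common construction (sketched here under assumptions; the score formula and shapes are illustrative, not necessarily the project's actual baseline) is to take the 95th percentile of composition scores between random Gaussian matrices of matching shape:

```python
import numpy as np

def composition_score(w1: np.ndarray, w2: np.ndarray) -> float:
    # ||W1 W2||_F / (||W1||_F * ||W2||_F), a standard composition measure.
    return np.linalg.norm(w1 @ w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))

def random_threshold(shape1, shape2, n_samples=200, pct=95, seed=0):
    """pct-th percentile of composition scores between random Gaussian
    matrices -- a candidate 'non-random' cutoff. Illustrative only."""
    rng = np.random.default_rng(seed)
    samples = [
        composition_score(rng.standard_normal(shape1),
                          rng.standard_normal(shape2))
        for _ in range(n_samples)
    ]
    return float(np.percentile(samples, pct))

# Example: threshold for a (d_model, d_head) output composing with a
# (d_head, d_model) input, at toy sizes for speed.
thresh = random_threshold((128, 32), (32, 128))
```

An observed composition value above `thresh` would then sit outside the bulk of what random matrices of the same shape produce, which is one concrete reading of "non-random".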