This process leads to V logit differences per ablation and affected token, where V is the size of the token vocabulary. Because a constant difference at every logit does not affect the post-softmax probabilities, we subtract at each token the median logit difference value. Finally, we concatenate these vectors together across some set of T future tokens (at the ablated index or later) to obtain a vector of V · T total numbers.
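To make sure I'm reading the procedure right, here is a minimal numpy sketch of how I understand a single ablation's vector being built (the function and argument names are placeholders of mine, not from the paper):

```python
import numpy as np

def logit_diff_vector(clean_logits: np.ndarray, ablated_logits: np.ndarray) -> np.ndarray:
    """Median-centered, concatenated logit differences for one ablation.

    clean_logits, ablated_logits: shape (T, V) -- logits at the T future token
    positions (ablated index or later), over a vocabulary of size V.
    Returns a vector of V * T numbers.
    """
    diffs = ablated_logits - clean_logits                      # (T, V) logit differences
    # A constant shift of every logit at a token leaves the softmax unchanged,
    # so subtract the per-token median difference.
    diffs = diffs - np.median(diffs, axis=1, keepdims=True)
    return diffs.reshape(-1)                                   # concatenate across the T tokens
```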
Since this gives V logit diffs per ablation and affected token, shouldn't the aggregated tensor in section 4.5 have shape (V, n_ablations, T) rather than (V, T)? Or is there an implicit aggregation over ablations, e.g. a mean or max? Thanks!