Closed: joelburget closed this issue 7 months ago
The way to decide which tokens to extract activations over is necessarily sensitive to the model/tokenizer class. The attention mask is being used here to filter out padding tokens.
I'm guessing that in your setting there are no padding tokens and you want the activations over every token. If so, the way to handle this is to flatten over the batch and sequence dimensions. I'll implement that once you confirm.
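For reference, here is a minimal sketch of the two cases (not the actual dictionary_learning code; the `gather_activations` helper and the shapes are my own assumptions, using PyTorch activations of shape `(batch, seq, d_model)` and a HuggingFace-style `attention_mask`):

```python
from typing import Optional
import torch

def gather_activations(hidden_states: torch.Tensor,
                       attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Collect per-token activations into a (n_tokens, d_model) tensor.

    hidden_states: (batch, seq, d_model) activations from the model.
    attention_mask: optional (batch, seq) tensor of 1s (real tokens) and 0s (padding).
    """
    if attention_mask is not None:
        # Keep only the activations at non-padding positions.
        return hidden_states[attention_mask.bool()]
    # No padding tokens: keep every position by flattening batch and sequence dims.
    return hidden_states.flatten(0, 1)
```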
@joelburget could you give quick feedback on @saprmarks's comment above?
Apologies for the slow response. I've been busy and have mostly stopped working on my project, so I'd say don't worry about it. I'll close this and the other issue!
https://github.com/saprmarks/dictionary_learning/commit/b2bd5f09cc7f657b7121f0659514d81336903bba added a requirement to have `attention_mask` (in the dict returned from the tokenizer, if I understand correctly). My tokenizer / model doesn't have this. Should it be necessary?