saprmarks / dictionary_learning


`attention_mask` is required #8

Closed by joelburget 7 months ago

joelburget commented 8 months ago

https://github.com/saprmarks/dictionary_learning/commit/b2bd5f09cc7f657b7121f0659514d81336903bba added a requirement for an `attention_mask` (in the dict returned from the tokenizer, if I understand correctly). My tokenizer / model doesn't produce one. Should it be necessary?

saprmarks commented 8 months ago

The way to decide which tokens to extract activations over is necessarily sensitive to the model/tokenizer class. The attention mask is being used here to filter out padding tokens.
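For illustration, here is a minimal sketch of what that filtering looks like (purely illustrative, not the actual code in this repo), assuming activations shaped `[batch, seq, d_model]` and a Hugging Face-style `attention_mask` of shape `[batch, seq]`:

```python
import torch

def filter_padding(activations: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Keep only activations at positions where attention_mask == 1 (real tokens),
    # dropping padding positions entirely.
    mask = attention_mask.bool()   # [batch, seq]
    return activations[mask]       # [n_real_tokens, d_model]
```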

I'm guessing that in your setting there are no padding tokens and you want the activations over every token. If so, the way to handle this is to flatten over the batch and sequence dimensions; I'll implement that once you confirm. A rough sketch of that case (again just illustrative):
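```python
import torch

def flatten_all_tokens(activations: torch.Tensor) -> torch.Tensor:
    # No padding: every position is a real token, so just merge
    # the batch and sequence dimensions into one token axis.
    return activations.flatten(0, 1)   # [batch * seq, d_model]
```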

canrager commented 7 months ago

@joelburget could you give quick feedback on @saprmarks's comment above?

joelburget commented 7 months ago

Apologies for the slow response. I've been busy and have mostly stopped working on my project, so I'd say don't worry about it. I'll close this and the other issue!