
language-brainscore / langbrainscore

[Marked for Deprecation. Please visit https://github.com/brain-score/language for the migrated project.] Benchmarking of Language Models using Human Neural and Behavioral experiment data

https://language-brainscore.github.io/langbrainscore/

MIT License


fix case of special tokens in encoder #19

Closed: benlipkin closed this issue 2 years ago

benlipkin commented 2 years ago

Special tokens added by the tokenizer cause off-by-one errors when indices are used to extract sentence representations from the context.
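A minimal sketch of how the off-by-one arises and one way to avoid it. It assumes a hypothetical whitespace "tokenizer" that wraps the input in BERT-style `[CLS]` ... `[SEP]` tokens and returns a special-tokens mask (1 = special). `toy_tokenize`, `naive_spans` and `mask_aware_spans` are illustrative names, not langbrainscore's actual code, and real subword tokenizers can split one word into several tokens.

```python
from typing import List, Sequence, Tuple

# Hypothetical BERT-style special tokens.
CLS, SEP = "[CLS]", "[SEP]"


def toy_tokenize(text: str) -> Tuple[List[str], List[int]]:
    """Whitespace-split text, wrap it in [CLS]/[SEP], return tokens and a special-tokens mask."""
    words = text.split()
    tokens = [CLS] + words + [SEP]
    mask = [1] + [0] * len(words) + [1]  # 1 marks a special token
    return tokens, mask


def naive_spans(stimuli: Sequence[str]) -> List[Tuple[int, int]]:
    """Buggy: cumulative per-stimulus token counts, ignoring the leading [CLS]."""
    spans, start = [], 0
    for s in stimuli:
        n = len(s.split())
        spans.append((start, start + n))
        start += n
    return spans


def mask_aware_spans(stimuli: Sequence[str], mask: Sequence[int]) -> List[Tuple[int, int]]:
    """Map each stimulus onto the positions of non-special tokens in the context."""
    content_positions = [i for i, m in enumerate(mask) if m == 0]
    spans, start = [], 0
    for s in stimuli:
        n = len(s.split())
        pos = content_positions[start:start + n]
        spans.append((pos[0], pos[-1] + 1))
        start += n
    return spans


# Two stimuli encoded together as one context.
stimuli = ["the cat sat", "on the mat"]
tokens, mask = toy_tokenize(" ".join(stimuli))

print([tokens[a:b] for a, b in naive_spans(stimuli)])             # shifted by one
print([tokens[a:b] for a, b in mask_aware_spans(stimuli, mask)])  # aligned
```

With the naive spans, the first stimulus picks up `[CLS]` and loses its last word. Using the mask keeps indices aligned regardless of how many special tokens the tokenizer adds.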

aalok-sathe commented 2 years ago

Here is what happens now: when stimulus-level representations are extracted within a context, the special tokens are chopped off each stimulus. What remains is the ability to extract the first-token, last-token, or special-token representation for a single stimulus. Special tokens are now chopped off by default because, in context, they represent the whole context rather than any individual stimulus.
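A rough sketch of the behavior described above, under stated assumptions. Hidden states are modeled as plain lists of floats, one row per token, with a special-tokens mask (1 = special). Special tokens are dropped before slicing a stimulus out of the context. A separate helper reads a special-token row, which is mainly meaningful when a single stimulus is encoded on its own. All function names here (`strip_special`, `pool`, `stimulus_repr`, `special_token_repr`) are hypothetical and do not come from langbrainscore.

```python
from typing import List, Sequence

Vector = List[float]


def strip_special(hidden: Sequence[Vector], mask: Sequence[int]) -> List[Vector]:
    """Drop the rows whose special-tokens mask entry is 1."""
    return [h for h, m in zip(hidden, mask) if m == 0]


def pool(rows: Sequence[Vector], mode: str = "mean") -> Vector:
    """Collapse token rows into a single stimulus vector."""
    if not rows:
        raise ValueError("no tokens to pool")
    if mode == "first":
        return list(rows[0])
    if mode == "last":
        return list(rows[-1])
    if mode == "mean":
        return [sum(col) / len(rows) for col in zip(*rows)]
    raise ValueError(f"unknown mode: {mode!r}")


def stimulus_repr(hidden: Sequence[Vector], mask: Sequence[int],
                  start: int, end: int, mode: str = "mean") -> Vector:
    """One stimulus within a context, special tokens excluded.

    start/end index the content (non-special) tokens, so cumulative
    per-stimulus token counts can be used directly.
    """
    content = strip_special(hidden, mask)
    return pool(content[start:end], mode)


def special_token_repr(hidden: Sequence[Vector], mask: Sequence[int], which: int = 0) -> Vector:
    """Row of the `which`-th special token (e.g. 0 = leading, -1 = trailing)."""
    specials = [h for h, m in zip(hidden, mask) if m == 1]
    return list(specials[which])


# Toy hidden states for "[CLS] the cat sat [SEP]" (2-dim vectors).
hidden = [[9, 9], [1, 2], [3, 4], [5, 6], [0, 0]]
mask = [1, 0, 0, 0, 1]

print(stimulus_repr(hidden, mask, 0, 3, "mean"))   # mean of the three word rows
print(stimulus_repr(hidden, mask, 0, 3, "last"))   # last-token representation
print(special_token_repr(hidden, mask, 0))         # leading special token
```

Stripping special tokens first means in-context slicing never sees them. First-, last-, or special-token readouts are handled by separate, explicit options rather than by index arithmetic.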

aalok-sathe commented 2 years ago

Whoops, that was an incorrect reference to this issue. It should have been #18 instead.