language-brainscore / langbrainscore

[Marked for deprecation. Please visit https://github.com/brain-score/language for the migrated project.] Benchmarking of language models using human neural and behavioral experiment data
https://language-brainscore.github.io/langbrainscore/
MIT License

Currently, bidirectional encoding makes it impossible to extract the representation of a specific stimulus #8

Closed: aalok-sathe closed this issue 2 years ago

aalok-sathe commented 2 years ago

https://github.com/language-brainscore/langbrainscore/blob/a2ad6bd81ac08350aaf2e21488290522ede57dd4/langbrainscore/encoder/ann.py#L193-L194

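We rely on the 'tokenized length' of the context to detect the boundary between the previous stimuli and the current one. This assumption only holds when stimuli are added to the context group one at a time, as in the unidirectional case, where the growth in tokenized length after each addition marks where the new stimulus begins. In the bidirectional case, the entire context group is encoded at once, so the tokenized length is always the total length of the group and no longer tells us where any individual stimulus begins or ends.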
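The following sketch illustrates the problem and one possible workaround. It does not reproduce the code in `ann.py`. Everything in it is hypothetical: a whitespace `tokenize` stands in for a real subword tokenizer, and the names `context_end_indices` and `stimulus_spans` are invented for illustration. Measuring the tokenized length of the context gives distinct end indices in the unidirectional case but the same total for every stimulus in the bidirectional case. Computing offsets from cumulative per-stimulus token counts recovers each stimulus's span no matter how much context the model sees.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a real tokenizer: splits on whitespace."""
    return text.split()


def context_end_indices(stimuli: List[str], bidirectional: bool) -> List[int]:
    """Boundary detection via tokenized context length (illustrative only).

    Unidirectional: the context grows one stimulus at a time, so its length
    marks where each stimulus ends. Bidirectional: every stimulus is encoded
    with the whole group, so the length is always the total.
    """
    ends = []
    for i in range(len(stimuli)):
        context = stimuli if bidirectional else stimuli[: i + 1]
        ends.append(len(tokenize(" ".join(context))))
    return ends


def stimulus_spans(stimuli: List[str]) -> List[Tuple[int, int]]:
    """One possible workaround (an assumption, not the project's fix):
    derive each stimulus's [start, end) token span from cumulative
    per-stimulus token counts, independent of the encoding direction."""
    spans, start = [], 0
    for s in stimuli:
        n = len(tokenize(s))
        spans.append((start, start + n))
        start += n
    return spans


stimuli = ["The cat", "sat on the", "mat ."]
print(context_end_indices(stimuli, bidirectional=False))  # [2, 5, 7]
print(context_end_indices(stimuli, bidirectional=True))   # [7, 7, 7]

# Slice the full-context token sequence to recover the second stimulus.
tokens = tokenize(" ".join(stimuli))
start, end = stimulus_spans(stimuli)[1]
print(tokens[start:end])  # ['sat', 'on', 'the']
```

With a real subword tokenizer, tokenizing a stimulus on its own may not yield the same tokens as tokenizing it inside its context (for example, leading-space handling in BPE vocabularies), and special tokens such as `[CLS]` shift every offset. A more robust variant would map character offsets of each stimulus onto token positions, for example using a fast tokenizer's offset mapping.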