kanishkamisra / minicons

Utility for behavioral and representational analyses of Language Models
https://minicons.kanishka.website
MIT License

GPT2 minicons surprisal: IndexError: index out of range in self #50

Closed: joyce9936 closed this issue 9 months ago

joyce9936 commented 9 months ago

I am trying to calculate surprisal values by feeding in a .txt file with about 5,000 sentences, but I get this error: IndexError: index out of range in self. Can anyone help with this issue?

Here is the code:

[Screenshot of the code, 2024-02-01 at 7:09:03 AM]

Here is the error message:

[Screenshot of the error message, 2024-02-01 at 7:08:39 AM]

Expected behavior: I would like to get the surprisal value of every word in the text file.

Thank you!

kanishkamisra commented 9 months ago

GPT-2 has a fixed context limit of 1,024 tokens, so passing text that contains 5,000 sentences in one go will error out. I recommend reading the file line by line and then passing the sentences in batches. You can find an example here.
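The error itself most likely comes from GPT-2's learned position-embedding table, which has 1,024 rows: a longer input asks for a position index past the end of that table, and PyTorch reports the failed lookup as "index out of range in self". Below is a minimal sketch of the line-by-line, batched approach, not necessarily the example linked above. It assumes minicons's `scorer.IncrementalLMScorer` and its `token_score(..., surprisal=True, base_two=True)` call for token-level surprisal; check both against your installed version. The helper names (`read_sentences`, `batched`, `surprisals_for_file`) and the default batch size are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_sentences(path: str) -> List[str]:
    """Read one sentence per line, dropping blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def surprisals_for_file(path: str, model_name: str = "gpt2",
                        device: str = "cpu", batch_size: int = 16):
    """Score each line as its own sequence, so no input spans the whole file."""
    # Imported lazily so the helpers above work without minicons installed.
    # Assumption: minicons exposes scorer.IncrementalLMScorer(model, device).
    from minicons import scorer

    model = scorer.IncrementalLMScorer(model_name, device)
    results = []
    for batch in batched(read_sentences(path), batch_size):
        # Assumption: token_score returns one list of (token, score) pairs per
        # sentence; surprisal=True gives -log p and base_two=True reports bits.
        results.extend(model.token_score(batch, surprisal=True, base_two=True))
    return results
```

For example, `surprisals_for_file("sentences.txt")` (a hypothetical file name) should return one list of (token, surprisal) pairs per non-empty line. Batching keeps memory bounded, while splitting by line keeps each sequence short; a single line that is itself longer than 1,024 GPT-2 tokens would still fail and would need to be split first.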

joyce9936 commented 9 months ago

Thank you!