rungjoo / CoMPM

Context Modeling with Speaker's Pre-trained Memory Tracking for Emotion Recognition in Conversation (NAACL 2022)

RoBERTa large max input token size issue #9

Closed jinmyeongAN closed 1 year ago

jinmyeongAN commented 1 year ago

Question

What if all the previous utterances together exceed 512 tokens?

I know the max input token size of RoBERTa large is 512. [link]
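
For reference, a quick way to see this limit and check whether a concatenated dialogue history overflows it (an illustration with the Hugging Face tokenizer, not code from the CoMPM repository; the example history is made up):

```python
# Illustration only: RoBERTa-large's input limit vs. a long dialogue history.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large")
print(tok.model_max_length)  # 512

# hypothetical conversation history
history = ["I feel great today.", "Why is that?", "I got the job!"] * 100
n_tokens = sum(len(tok.encode(u, add_special_tokens=False)) for u in history)
print(n_tokens, n_tokens > tok.model_max_length)  # overflows once the history is long enough
```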

And I found this in your paper:

context embedding module (CoM) reflects all previous utterances as context.
...
We use a Transformer encoder as a context model (such as RoBERTa).

If you ran into that kind of problem, did you use a sliding window or something similar?

Thank you

rungjoo commented 1 year ago

As I recall, there are cases in the IEMOCAP dataset that exceed the max length; in those cases, the old tokens that push the input over the max length are discarded.

Please refer to the following code. https://github.com/rungjoo/CoMPM/blob/master/utils.py#L76 https://github.com/rungjoo/CoMPM/blob/master/utils.py#L16
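
In other words, the concatenated context is left-truncated: when the token ids exceed the model's budget, the oldest tokens are dropped and the most recent ones are kept. Below is a minimal sketch of that behavior, assuming a RoBERTa tokenizer; the 511-token budget and the separator handling here are assumptions for illustration, not the repository's exact code (see the linked `utils.py` for the author's implementation):

```python
# Minimal sketch of left-truncating a concatenated dialogue context.
# Assumptions: RoBERTa tokenizer, budget of 512 minus one slot for the leading <s>.
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
MAX_CONTEXT = 511  # assumed budget, leaving room for the <s> token

def build_context_ids(utterances, max_context=MAX_CONTEXT):
    """Concatenate previous utterances; if the result is too long,
    discard the oldest tokens and keep the most recent ones."""
    ids = []
    for utt in utterances:
        ids += tokenizer.encode(utt, add_special_tokens=False) + [tokenizer.sep_token_id]
    if len(ids) > max_context:
        ids = ids[-max_context:]  # drop old tokens that push past the max length
    return [tokenizer.cls_token_id] + ids
```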

jinmyeongAN commented 1 year ago

I understand now! Thank you for your answer :)