rungjoo / CoMPM

Context Modeling with Speaker's Pre-trained Memory Tracking for Emotion Recognition in Conversation (NAACL 2022)

RoBERTa large max input token size issue #9

Closed jinmyeongAN closed 1 year ago

jinmyeongAN commented 1 year ago

Question

What if all the previous utterances together exceed 512 tokens?

I know the max input token size of RoBERTa large is 512. [link]
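
For reference, a quick way to see this limit and check whether a concatenated dialogue history overflows it (an illustration with the Hugging Face tokenizer, not code from the CoMPM repository; the example history is made up):

```python
# Illustration only: RoBERTa-large's input limit vs. a long dialogue history.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large")
print(tok.model_max_length)  # 512

# hypothetical conversation history
history = ["I feel great today.", "Why is that?", "I got the job!"] * 100
n_tokens = sum(len(tok.encode(u, add_special_tokens=False)) for u in history)
print(n_tokens, n_tokens > tok.model_max_length)  # overflows once the history is long enough
```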

And I found this in your paper:

context embedding module (CoM) reflects all previous utterances as context.
...
We use a Transformer encoder as a context model (such as RoBERTa).

If you ran into that kind of problem, did you use a sliding window or something similar?

Thank you

rungjoo commented 1 year ago

As I recall, there are cases in the IEMOCAP dataset that exceed the max length; in those cases, the old tokens that push the input over the max length are discarded.

Please refer to the following code. https://github.com/rungjoo/CoMPM/blob/master/utils.py#L76 https://github.com/rungjoo/CoMPM/blob/master/utils.py#L16
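
In other words, the concatenated context is left-truncated: when the token ids exceed the model's budget, the oldest tokens are dropped and the most recent ones are kept. Below is a minimal sketch of that behavior, assuming a RoBERTa tokenizer; the 511-token budget and the separator handling here are assumptions for illustration, not the repository's exact code (see the linked `utils.py` for the author's implementation):

```python
# Minimal sketch of left-truncating a concatenated dialogue context.
# Assumptions: RoBERTa tokenizer, budget of 512 minus one slot for the leading <s>.
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
MAX_CONTEXT = 511  # assumed budget, leaving room for the <s> token

def build_context_ids(utterances, max_context=MAX_CONTEXT):
    """Concatenate previous utterances; if the result is too long,
    discard the oldest tokens and keep the most recent ones."""
    ids = []
    for utt in utterances:
        ids += tokenizer.encode(utt, add_special_tokens=False) + [tokenizer.sep_token_id]
    if len(ids) > max_context:
        ids = ids[-max_context:]  # drop old tokens that push past the max length
    return [tokenizer.cls_token_id] + ids
```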

jinmyeongAN commented 1 year ago

I understand now! Thank you for your answer :)