The length of tokens in data-clipped

yuhui-zh15 / FactCCX

The length of tokens in data-clipped #3

Open kumori123 opened 2 years ago

kumori123 commented 2 years ago

Hi, I'm trying to use the training data in generated_data/data-clipped for fine-tuning with RoBERTa, but I found that many sentence pairs still exceed the 512-token limit. I currently format each sentence pair like this: ~~text~~claim. Am I doing something wrong? Thank you in advance!

kumori123 commented 2 years ago

Sorry, I mean `<s>text</s></s>claim</s>`.
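
For reference, the pair format above spends four positions on special tokens, so only 508 of the 512 positions are left for the text and the claim together. A minimal sketch of building such a pair and trimming it to fit, assuming the inputs are already lists of subword tokens from whatever tokenizer is in use (the helper names `truncate_pair` and `build_pair` are hypothetical and not part of this repository):

```python
from typing import List, Tuple

MAX_LEN = 512      # input limit mentioned in this thread
NUM_SPECIAL = 4    # <s> text </s></s> claim </s>


def truncate_pair(text_tokens: List[str], claim_tokens: List[str],
                  max_len: int = MAX_LEN) -> Tuple[List[str], List[str]]:
    """Drop tokens from the end of the longer sequence until the pair fits."""
    if max_len < NUM_SPECIAL:
        raise ValueError("max_len leaves no room for the special tokens")
    budget = max_len - NUM_SPECIAL
    text, claim = list(text_tokens), list(claim_tokens)
    while len(text) + len(claim) > budget:
        # Trim whichever side is currently longer (ties trim the text).
        if len(text) >= len(claim):
            text.pop()
        else:
            claim.pop()
    return text, claim


def build_pair(text_tokens: List[str], claim_tokens: List[str],
               max_len: int = MAX_LEN) -> List[str]:
    """Assemble <s> text </s></s> claim </s> with at most max_len tokens."""
    text, claim = truncate_pair(text_tokens, claim_tokens, max_len)
    return ["<s>"] + text + ["</s>", "</s>"] + claim + ["</s>"]


# Toy example: a 600-token text and a 20-token claim.
pair = build_pair(["t"] * 600, ["c"] * 20)
print(len(pair))
```

Token counts depend on the tokenizer, so a pair that fits under one tokenizer's count can still exceed 512 under another. With the Hugging Face `transformers` tokenizers, passing both sequences together with `truncation="longest_first"` and `max_length=512` performs a similar trim.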