smallporridge / Socialformer

The implementation of Socialformer
MIT License

psg tokenizer problem #5

Open tangjiawei777 opened 2 years ago

tangjiawei777 commented 2 years ago

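After processing the MS MARCO document ranking dataset into a training set and a test set, I found that the psg length is far greater than 512. In that case, it seems that the tokenizer from `from transformers import BertTokenizer` cannot tokenize the psg, right?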

smallporridge commented 2 years ago

Give it a try.

tangjiawei777 commented 2 years ago

Hello, when the length of a psg exceeds 512, does it need to be truncated before the tokens are extracted?

smallporridge commented 2 years ago

No, that is not needed.
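
For readers following this thread: in the `transformers` library, `BertTokenizer.tokenize` accepts text of any length. The 512 limit comes from BERT's position embeddings and only matters once token ids are fed to the model. The thread does not say how Socialformer handles long passages internally, so the sketch below is only an illustration of one common approach: cutting a token list into consecutive windows of at most 512 tokens. `toy_tokenize` and `split_into_windows` are hypothetical helpers, and a plain whitespace split stands in for the real tokenizer so the example runs without downloading a vocabulary.

```python
from typing import List

MAX_LEN = 512  # the length limit discussed in this thread


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for BertTokenizer.tokenize: lowercase, split on whitespace."""
    return text.lower().split()


def split_into_windows(tokens: List[str], window: int = MAX_LEN) -> List[List[str]]:
    """Cut a token list into consecutive windows of at most `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


# A fake long passage of 1300 "words", well past the 512 limit.
long_psg = " ".join(f"w{i}" for i in range(1300))

# Tokenizing itself imposes no length limit; only the windows are bounded.
tokens = toy_tokenize(long_psg)
windows = split_into_windows(tokens)

print(len(tokens), [len(w) for w in windows])  # 1300 [512, 512, 276]
```

With a real BERT input, special tokens such as `[CLS]` and `[SEP]` also count toward the limit, so a slightly smaller window (for example 510) would probably be needed.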