netease-youdao / BCEmbedding

Netease Youdao's open-source embedding and reranker models for RAG products.
Apache License 2.0
1.41k stars, 93 forks

How do I resolve "Token indices sequence length is longer than the specified maximum sequence length for this model (781 > 512)"? #62

Open starxuh opened 3 months ago

starxuh commented 3 months ago

I'm getting the following messages:

Token indices sequence length is longer than the specified maximum sequence length for this model (781 > 512). Running this sequence through the model will result in indexing errors

You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.

How can I resolve this? Is it caused by the model's token limit?

shenlei1020 commented 3 months ago

It's fine. This is only a warning and does not affect the results; the BCEmbedding Python package already handles this case.
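
For anyone wondering what "handling" an over-length input can look like in general: one common approach is to split the token IDs into overlapping windows that each fit the model's limit, so nothing longer than 512 tokens reaches the model. The sketch below is illustrative only. The function name `split_into_windows` and the `overlap` value of 80 are assumptions made for this example, not BCEmbedding's actual code, and a real pipeline would also have to reserve room for special tokens (and, for a reranker, the query).

```python
from typing import List


def split_into_windows(
    token_ids: List[int], max_len: int = 512, overlap: int = 80
) -> List[List[int]]:
    """Split token IDs into overlapping windows of at most max_len tokens.

    Hypothetical helper for illustration; not part of BCEmbedding or transformers.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")

    # Short inputs already fit, so they are returned as a single window.
    if len(token_ids) <= max_len:
        return [list(token_ids)]

    step = max_len - overlap  # how far each window advances
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        # Stop once a window has reached the end of the sequence.
        if start + max_len >= len(token_ids):
            break
    return windows


# Stand-in for the 781 token IDs mentioned in the warning.
ids = list(range(781))
windows = split_into_windows(ids)
print([len(w) for w in windows])
```

If only the beginning of the text matters, plain truncation (`token_ids[:max_len]`) is the simpler alternative; windowing keeps the tail of the document at the cost of extra forward passes.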