How do I resolve this warning: "Token indices sequence length is longer than the specified maximum sequence length for this model (781 > 512)"?

netease-youdao / BCEmbedding

Netease Youdao's open-source embedding and reranker models for RAG products.

Apache License 2.0


How do I resolve this warning: "Token indices sequence length is longer than the specified maximum sequence length for this model (781 > 512)"? #62

Open starxuh opened 3 months ago

starxuh commented 3 months ago

I'm seeing the following messages: "Token indices sequence length is longer than the specified maximum sequence length for this model (781 > 512). Running this sequence through the model will result in indexing errors" and "You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding." How can I resolve this? Is it caused by the model's token limit?

shenlei1020 commented 3 months ago

It's fine and doesn't affect the results; it's only a warning. The BCEmbedding Python package already handles this case.
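For readers wondering what "handling" an over-long input can look like, here is a minimal sketch of two common generic strategies: plain truncation to the model limit, and splitting into overlapping windows. This is an illustration only, not BCEmbedding's actual code. The function names, the `overlap` value of 64, and the dummy token ids are assumptions made for the example; only the numbers 781 and 512 come from the warning above.

```python
from typing import List

MAX_LEN = 512  # model limit reported in the warning


def truncate(token_ids: List[int], max_len: int = MAX_LEN) -> List[int]:
    """Keep only the first max_len token ids (hypothetical helper)."""
    return token_ids[:max_len]


def sliding_windows(
    token_ids: List[int], max_len: int = MAX_LEN, overlap: int = 64
) -> List[List[int]]:
    """Split token ids into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that text near a
    boundary appears in both. The overlap size is an assumed value.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows


# Dummy ids standing in for the 781-token input from the warning.
ids = list(range(781))
short = truncate(ids)
chunks = sliding_windows(ids)
print(len(short), [len(c) for c in chunks])
```

When calling a Hugging Face tokenizer directly, rather than through the BCEmbedding package, passing `truncation=True, max_length=512` to the tokenizer call is the usual way to get the first behavior.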