shibing624 / text2vec

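text2vec, text to vector. A text embedding toolkit that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and is ready to use out of the box.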
https://pypi.org/project/text2vec/
Apache License 2.0
4.39k stars, 392 forks

What is the embedding length limit? #105

Closed: sdhjl2000 closed this issue 1 year ago

sdhjl2000 commented 1 year ago

The maximum token count for text-embedding-ada-002 is 2048. What is the limit for text2vec, and is input that exceeds it truncated?

sdhjl2000 commented 1 year ago

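The model card at https://huggingface.co/shibing624/text2vec-base-chinese says max_seq_length is 128. Does 128 mean the number of Chinese characters, or is it counted with a tokenizer, similar to how tiktoken counts tokens?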

shibing624 commented 1 year ago

It is the number of tokens.
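To get a feel for what a 128-token limit means for Chinese text, here is a rough, stdlib-only estimator. It assumes text2vec-base-chinese uses a bert-base-chinese style WordPiece tokenizer, which splits every CJK character into its own token, and that the special tokens [CLS] and [SEP] count toward max_seq_length, leaving about 126 characters of pure Chinese text. It also assumes overlong input is cut from the right, as Hugging Face tokenizers do when called with `truncation=True`. The helpers `estimate_tokens` and `truncate_to_limit` are hypothetical illustrations, not text2vec APIs. For an exact count, load the model's own tokenizer with `transformers.AutoTokenizer.from_pretrained("shibing624/text2vec-base-chinese")` and take `len(tokenizer(text)["input_ids"])`.

```python
import re

# Assumption: a BERT Chinese tokenizer emits one token per CJK character,
# one token per punctuation mark, and at least one token per run of Latin
# letters/digits (WordPiece may split such runs further, so mixed-language
# text gets a lower-bound estimate here).
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+|[^\sA-Za-z0-9\u4e00-\u9fff]")


def estimate_tokens(text: str, add_special_tokens: bool = True) -> int:
    """Approximate the token count, optionally including [CLS] and [SEP]."""
    n = len(_TOKEN_RE.findall(text))
    return n + 2 if add_special_tokens else n


def truncate_to_limit(text: str, max_seq_length: int = 128) -> str:
    """Keep only the leading text that fits, mimicking right-side truncation."""
    budget = max_seq_length - 2  # reserve room for [CLS] and [SEP]
    end = 0
    for i, match in enumerate(_TOKEN_RE.finditer(text)):
        if i >= budget:
            break
        end = match.end()
    return text[:end]


if __name__ == "__main__":
    print(estimate_tokens("今天天气很好"))  # 6 characters + 2 special tokens
    print(len(truncate_to_limit("好" * 200)))  # characters kept under a 128-token limit
```

Under these assumptions, a 200-character Chinese passage would keep only its first 126 characters, so anything after that point would not influence the embedding.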

HaoRenkk123 commented 12 months ago

Can anyone recommend a model with a maximum token length (maxlen) of 2048, or 1024?