yidasanqian closed this issue 2 months ago
Based on a Google search, I think this issue is related to proxy settings: https://stackoverflow.com/questions/45573833/error-in-downloading-nltk-data-errno-11004-getaddrinfo-failed
@wxywb Where is the downloaded file located? Can I manually specify a directory?
This is due to network conditions; you can search for a fix that matches your environment. https://blog.csdn.net/qq_63385279/article/details/136220118
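On the directory question, here is a minimal sketch, assuming the failing download is NLTK data (as the linked Stack Overflow thread suggests). NLTK reads the `NLTK_DATA` environment variable when it is imported, and Python's `urllib` honors `HTTP_PROXY`/`HTTPS_PROXY`, so both can be set before the import. The helper names `configure_nltk_env` and `download_resource`, the directory, the proxy address, and the resource name `punkt` are all placeholders of mine, not something confirmed in this thread.

```python
import os
from pathlib import Path


def configure_nltk_env(data_dir, proxy=None):
    """Point NLTK at a custom data directory and optionally set a proxy.

    Hypothetical helper: call it BEFORE importing nltk so that the
    NLTK_DATA value is picked up when nltk builds its search path.
    """
    path = Path(data_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["NLTK_DATA"] = str(path)
    if proxy:
        # urllib (used for the download) reads these variables.
        os.environ["HTTP_PROXY"] = proxy
        os.environ["HTTPS_PROXY"] = proxy
    return path


def download_resource(data_dir, resource="punkt"):
    """Download one NLTK resource into data_dir.

    The resource name is an assumption; use whichever resource the
    error message actually asks for.
    """
    import nltk  # imported late so the environment set above applies

    return nltk.download(resource, download_dir=str(data_dir))
```

If the machine has no network access at all, the same directory can be filled by copying the resource files in by hand; as long as `NLTK_DATA` points there, NLTK should find them without downloading.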
If my corpus is a mix of Chinese and English, and I specify that the analyzer is zh, will it fit properly?
In this scenario, using the Jieba tokenizer to split your sentences into English and Chinese tokens would perform worse on the English portion than an English tokenizer would. This is because an English tokenizer applies a stemming algorithm to match different variants of a word. For better performance, consider using BGE-M3 or a customized tokenizer that applies a stemming algorithm to English words.
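To illustrate the "customized tokenizer" idea, here is a rough sketch that routes Chinese runs to a segmenter and English words to a stemmer. Everything in it is my own assumption: `tokenize_mixed`, `toy_stem`, and `split_chars` are hypothetical names, the suffix stripper is only a stand-in for a real stemming algorithm, and the per-character split is only a stand-in for Jieba. How to plug such a function into the analyzer is not covered here.

```python
import re

# Runs of CJK ideographs, or runs of ASCII letters/digits.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def toy_stem(word):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_chars(text):
    """Placeholder Chinese segmenter: one token per character."""
    return list(text)


def tokenize_mixed(text, zh_segmenter=split_chars, en_stemmer=toy_stem):
    """Tokenize mixed Chinese/English text.

    Chinese runs go to zh_segmenter, English words go to en_stemmer;
    punctuation and whitespace are dropped.
    """
    tokens = []
    for match in _TOKEN_RE.finditer(text):
        chunk = match.group()
        if "\u4e00" <= chunk[0] <= "\u9fff":
            tokens.extend(zh_segmenter(chunk))
        else:
            tokens.append(en_stemmer(chunk))
    return tokens
```

In practice you would likely pass `jieba.lcut` as `zh_segmenter` and the `stem` method of an `nltk.stem.PorterStemmer` instance as `en_stemmer`, so that variants such as "searching" and "searched" collapse to the same token while Chinese text is still segmented into words.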
code:
error:
How to solve it?