'utf-8' codec can't encode characters in position 1-2: surrogates not allowed

AnitaSherry commented 1 year ago

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int8/2cc27c946766f6ebd1504001b5d776a27c4d0d0b/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int8/2cc27c946766f6ebd1504001b5d776a27c4d0d0b/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int8/2cc27c946766f6ebd1504001b5d776a27c4d0d0b/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int8/2cc27c946766f6ebd1504001b5d776a27c4d0d0b/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 6
Using quantization cache
Applying quantization to glm layers
2023-04-20 17:41:52,801 [WARNING] [SentenceTransformer.py:805] No sentence-transformers model found with name /root/.cache/torch/sentence_transformers/GanymedeNil_text2vec-large-chinese. Creating a new one with MEAN pooling.
Traceback (most recent call last):
  File "/home/heyiheng/work/Chinese-LangChain/main.py", line 27, in <module>
    application.source_service.init_source_vector()
  File "/home/heyiheng/work/Chinese-LangChain/clc/source_service.py", line 37, in init_source_vector
    print(doc)
UnicodeEncodeError: `'utf-8' codec can't encode characters in position 1-2: surrogates not allowed`

This is an error I can't resolve. Please take a look when you have time.

AnitaSherry commented 1 year ago

Traceback (most recent call last):
  File "/home/heyiheng/work/Chinese-LangChain/main.py", line 27, in <module>
    application.source_service.init_source_vector()
  File "/home/heyiheng/work/Chinese-LangChain/clc/source_service.py", line 37, in init_source_vector
    print(doc.encode('utf-8'))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1-2: surrogates not allowed

Changing `print(doc)` to `print(doc.encode('utf-8'))` still doesn't work; the same error is raised.
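
For anyone hitting the same thing: the explicit `encode('utf-8')` fails for the same reason `print` does. The string contains lone surrogate code points (U+D800 to U+DFFF), and the strict UTF-8 codec refuses to encode them. Below is a minimal sketch of one possible workaround, assuming `doc` is a plain `str`. The helper name `safe_text` and the sample string are made up for illustration and are not part of Chinese-LangChain.

```python
def safe_text(s: str) -> str:
    """Return a copy of s that can always be encoded as UTF-8 (hypothetical helper)."""
    # Lone surrogates cannot be encoded with the strict UTF-8 codec.
    # 'backslashreplace' turns each one into a visible escape such as \udce4
    # instead of raising UnicodeEncodeError.
    return s.encode("utf-8", errors="backslashreplace").decode("utf-8")


# Made-up stand-in for `doc`: lone surrogates at positions 1-2, as in the error message.
doc = "a\udce4\udcb8b"

try:
    doc.encode("utf-8")
except UnicodeEncodeError as exc:
    # Same failure as in the traceback above.
    print("strict encode fails:", exc.reason)

# Printing the sanitized copy no longer raises.
print(safe_text(doc))
```

This only makes the text printable. If the surrogates come from source files being read with the wrong encoding, fixing the read step is probably the better long-term fix.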

AnitaSherry commented 1 year ago

Deleting it fixed the problem.

yanqiangmiffy / Chinese-LangChain

'utf-8' codec can't encode characters in position 1-2: surrogates not allowed #7