thunlp / OpenPrompt

An Open-Source Framework for Prompt-Learning.
https://thunlp.github.io/OpenPrompt/
Apache License 2.0
4.38k stars 455 forks

How to pass text with special characters in the InputExample class? #258

Closed brunoedcf closed 1 year ago

brunoedcf commented 1 year ago

I am passing text with special (non-ASCII, UTF-8) characters to the InputExample class, and it does not appear to be stored correctly.

print(my_dataset[0][1])
"EXTRATO DO CONTRATO DE PRESTAÇÃO DE SERVIÇOS Nº 25/2021 Processo: 00094-00005321/2019-8"

for i, element in enumerate(my_dataset):
    dataset.append(InputExample(guid=i, text_a=element[1]))

Then, when I print dataset[0], it outputs:

"text_a": "EXTRATO DO CONTRATO DE PRESTA\u00c7\u00c3O DE SERVI\u00c7OS N\u00ba 25/2021 Processo: 00094-00005321/2019-8"

yulinchen99 commented 1 year ago

Hi, the issue is that ensure_ascii=False is not set in the json.dumps call used when you print the whole example, so non-ASCII characters are displayed as \uXXXX escapes. If you run print(dataset[0].text_a), you will see that the text is actually stored properly.

We will add ensure_ascii=False to fix the display problem, though.
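
The difference between the escaped printout and the stored value can be reproduced with the standard library alone. The following is a minimal sketch, not OpenPrompt code: the `example` dict is a hypothetical stand-in for the fields of an InputExample, and it only illustrates how `json.dumps` behaves with and without `ensure_ascii=False`.

```python
import json

# Hypothetical stand-in for the fields of an InputExample (assumption,
# not the library's actual class).
example = {"guid": 0, "text_a": "PRESTAÇÃO DE SERVIÇOS Nº 25/2021"}

# Default behaviour: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(example)

# With ensure_ascii=False the characters are written out as-is.
readable = json.dumps(example, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode back to the same data, so the escapes are only a
# display detail and no text is lost.
assert json.loads(escaped) == json.loads(readable) == example
```

Because both forms decode to the same value, the escapes affect only how the example is printed, not what is stored in text_a.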