Closed: brunoedcf closed this issue 1 year ago
Hi, the issue is that `ensure_ascii=False` is not set in `json.dumps` when you print the whole example. If you do `print(dataset[0].text_a)`, you will see the text is actually stored properly. We will add `ensure_ascii=False` to fix the problem, though.
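The difference can be reproduced with the standard `json` module alone. This is a minimal sketch, not the library's actual printing code: it does not use `InputExample`, and the variable names are only illustrative. It shows that the escaped and unescaped outputs decode to the same string, so only the display differs.

```python
import json

# Illustrative sample text with non-ASCII characters (not tied to any library class).
text = "PRESTAÇÃO DE SERVIÇOS Nº 25/2021"

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps({"text_a": text})

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps({"text_a": text}, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode back to the same original string.
assert json.loads(escaped)["text_a"] == text
assert json.loads(readable)["text_a"] == text
```

The escapes are therefore a property of the serialized output, not of the stored value.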
I am passing text with special characters (utf-8) in the InputExample class and it is not storing it correctly.
print(my_dataset[0][1])
"EXTRATO DO CONTRATO DE PRESTAÇÃO DE SERVIÇOS Nº 25/2021 Processo: 00094-00005321/2019-8"
for i, element in enumerate(my_dataset):
    dataset.append(InputExample(guid=i, text_a=element[1]))
Then, when I print dataset[0], it outputs:
"text_a": "EXTRATO DO CONTRATO DE PRESTA\u00c7\u00c3O DE SERVI\u00c7OS N\u00ba 25/2021 Processo: 00094-00005321/2019-8"