mymusise / ChatGLM-Tuning

A fine-tuning solution based on ChatGLM-6B + LoRA
MIT License
Garbled text when opening the jsonl file #192

Open nuoma opened 1 year ago

nuoma commented 1 year ago

Hi, after converting the JSON file to JSONL, the Chinese text is garbled (it looks like \u5047\u8bbe\u4f60\u662f). Could you explain why this happens? Thanks.

lueeying commented 1 year ago

Add ensure_ascii=False to the json.dumps call: f.write(json.dumps(format_example(example), ensure_ascii=False) + '\n')
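
As for why this happens: by default, Python's json.dumps uses ensure_ascii=True, which writes every non-ASCII character as a \uXXXX escape. The output is still valid JSON and decodes back to the original Chinese text, so the data is not corrupted; it is only hard to read in an editor. A minimal sketch using the string from the question:

```python
import json

text = "假设你是"  # the string from the question

escaped = json.dumps(text)                       # default: ensure_ascii=True
readable = json.dumps(text, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)   # "\u5047\u8bbe\u4f60\u662f"
print(readable)  # "假设你是"

# Both forms are valid JSON and decode to the same Python string.
assert json.loads(escaped) == json.loads(readable) == text
```

Either form should load correctly with json.loads, so ensure_ascii=False mainly matters when a person needs to read the file.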

ccclucky commented 1 year ago

import json
from tqdm import tqdm

# args and format_example are assumed to come from the conversion script.
with open(args.data_path, encoding='utf-8') as f:
    examples = json.load(f)

with open(args.save_path, 'w', encoding='utf-8') as f:
    for example in tqdm(examples, desc="formatting.."):
        f.write(json.dumps(format_example(example), ensure_ascii=False) + '\n')
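
To check the result without the repository's script, the same pattern can be tried in a self-contained round trip. This is only a sketch: the sample record, the field names, and the helpers write_jsonl and read_jsonl are hypothetical and are not part of ChatGLM-Tuning.

```python
import json
import os
import tempfile

# Hypothetical sample record; the field names are illustrative only.
records = [{"context": "Instruction: 假设你是一名老师\nAnswer: ", "target": "好的"}]


def write_jsonl(path, rows):
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of Python objects."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    write_jsonl(path, records)
    with open(path, encoding="utf-8") as f:
        raw = f.read()  # raw file text, as an editor would show it
    loaded = read_jsonl(path)

print(raw)
assert loaded == records
```

Passing encoding='utf-8' explicitly on both open calls matters because the platform default encoding may not be UTF-8 (for example on some Windows setups), in which case writing or reading Chinese text can fail or produce real mojibake.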