openai / deeptype

Code for the paper "DeepType: Multilingual Entity Linking by Neural Type System Evolution"
https://arxiv.org/abs/1802.01021

UnicodeEncodeError #39

Closed mshahriarinia closed 6 years ago

mshahriarinia commented 6 years ago

The build fails when running ./extraction/full_preprocess.sh ${DATA_DIR} en:

Traceback (most recent call last):
  File "extraction/get_wikiname_to_wikidata.py", line 347, in <module>
    main()
  File "extraction/get_wikiname_to_wikidata.py", line 287, in main
    missing_wikidata_important_properties_fnames
  File "extraction/get_wikiname_to_wikidata.py", line 119, in get_wikidata_mapping
    fout_name2id.write(key + "/" + value["title"] + "\t" + str(index) + "\n")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-14: ordinal not in range(128)

mshahriarinia commented 6 years ago

The solution was to change

fout_name2id.write(key + "/" + value["title"] + "\t" + str(index) + "\n")

to

fout_name2id.write(key + "/" + value["title"] + "\t" + index.encode('utf-8') + "\n")
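
For readers who hit the same error, one alternative that leaves the write call untouched is to open the output file with an explicit UTF-8 encoding, so the write no longer depends on the system default. The snippet below is only a sketch under assumptions: it assumes Python 3 and a file opened in text mode, and the helper write_name2id, the sample rows, and the temporary path are hypothetical names for illustration, not part of the deeptype code.

```python
import os
import tempfile


def write_name2id(path, rows):
    """Hypothetical helper (not from deeptype): write "key/title<TAB>index" lines."""
    # Passing encoding="utf-8" makes the text-mode write independent of the
    # system locale, so non-ASCII titles no longer go through the 'ascii' codec.
    with open(path, "wt", encoding="utf-8") as fout:
        for index, (key, title) in enumerate(rows):
            fout.write(key + "/" + title + "\t" + str(index) + "\n")


# Made-up sample data containing non-ASCII titles.
rows = [("enwiki", "Zürich"), ("fawiki", "تهران")]
path = os.path.join(tempfile.mkdtemp(), "name2id.tsv")
write_name2id(path, rows)

# Read the file back with the same encoding to check the round trip.
with open(path, "rt", encoding="utf-8") as fin:
    lines = fin.read().splitlines()
```

With the encoding stated explicitly, the same line format is written whatever locale the shell is using.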

heisenbugfix commented 6 years ago

If you are using Linux, change your locale to a UTF-8 one. That fixed it for me without changing any code.
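
To see why the locale matters, it can help to check which encoding Python picks by default for text-mode files. The sketch below assumes Python 3; the function names default_text_encoding and can_encode are hypothetical helpers for illustration. If the reported encoding is ascii, writing a non-ASCII title fails as in the traceback above, and switching the shell to a UTF-8 locale (en_US.UTF-8 is an assumed example) should change the reported value.

```python
import codecs
import locale


def default_text_encoding():
    """Return the normalized name of the locale's preferred encoding."""
    # open() in text mode falls back to this encoding when none is passed
    # (unless Python's UTF-8 mode is enabled).
    return codecs.lookup(locale.getpreferredencoding(False)).name


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


encoding = default_text_encoding()
print("default text encoding:", encoding)
print("can encode a non-ASCII title:", can_encode("Zürich", encoding))
```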