yao8839836 / kg-bert

KG-BERT: BERT for Knowledge Graph Completion
Apache License 2.0

non-ascii characters throw errors in FB13 #2

Open talkhaldi opened 4 years ago

talkhaldi commented 4 years ago

Hi,

Thank you for your work and for publishing the code. I'm trying to run the triple classification example for FB13 as described in the readme file, but I'm getting the following error:

Traceback (most recent call last):
  File "run_bert_triple_classifier.py", line 847, in <module>
    main()
  File "run_bert_triple_classifier.py", line 556, in main
    train_examples = processor.get_train_examples(args.data_dir)
  File "run_bert_triple_classifier.py", line 120, in get_train_examples
    self._read_tsv(os.path.join(data_dir, "train.tsv")), "train", data_dir)
  File "run_bert_triple_classifier.py", line 173, in _create_examples
    ent_lines = f.readlines()
  File "/mnt/orange/ubrew/data/opt/python/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 27: ordinal not in range(128)

Is this behavior expected? I can add an encoding="utf-8" argument to the open call, but then it becomes extremely slow. How did you deal with this issue?

yao8839836 commented 4 years ago

@talkhaldi

Hi, I didn't see this problem in my Python 3.5 or 3.6 environment.

You may want to try this:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Zhizhizhi997 commented 3 years ago

@yao8839836 I also ran into this problem, but I think the method you provided only works in Python 2, not Python 3. I changed

with open(os.path.join(data_dir, "entity2text.txt"), 'r')

to

with open(os.path.join(data_dir, "entity2text.txt"), 'r', encoding="utf")

It seems to work.
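
For anyone landing here later, below is a minimal, self-contained sketch of the explicit-encoding approach discussed above. It is not the repository's code. The helper name read_lines_utf8, the temporary file, and the sample entity line are all hypothetical and only there to make the example runnable. In Python 3, open() without an encoding argument falls back to the locale's preferred encoding, which can be ASCII on some systems. That would be consistent with the traceback above, where the first byte of a UTF-8 encoded accented character (0xc3) cannot be decoded.

```python
import os
import tempfile


def read_lines_utf8(path):
    """Read all lines of a text file, decoding as UTF-8 regardless of locale.

    Hypothetical helper for illustration; not part of the repository.
    """
    with open(path, "r", encoding="utf-8") as f:
        return f.readlines()


# Build a tiny stand-in for entity2text.txt containing a non-ASCII name.
# The entity id and text are made up for this example.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "entity2text.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("/m/hypothetical_id\tJosé Martí\n")

# Reading with an explicit UTF-8 encoding succeeds.
lines = read_lines_utf8(sample_path)

# Forcing ASCII on the same file raises UnicodeDecodeError, because the
# accented characters are stored as multi-byte UTF-8 sequences.
try:
    with open(sample_path, "r", encoding="ascii") as f:
        f.readlines()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

If this is indeed the cause, passing the encoding explicitly at each open call keeps the script independent of the machine's locale settings. The slowdown reported earlier in the thread is not explained by this sketch.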