zhangmeishan / BiaffineDParser

BiAffine Dependency Parsing

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 44: character maps to <undefined> #1

Open HassanNaeemjutt opened 4 years ago

HassanNaeemjutt commented 4 years ago

I want to use my own pretrained embedding file, and for the train, dev, and test data I use Universal Dependencies data, but I run into this problem:
(finalproject) C:\Users\Hassan\Downloads\BiaffineDParser-master(2)\BiaffineDParser-master>python driver/TrainTest.py --config_file config.ctb51.cfg
GPU available: False
CuDNN: True
Loaded config file sucessfully.
pretrained_embeddings_file experiments/ctb51/zhwiki_20180420_100d.pkl
data_dir experiments/ctb51
train_file experiments/ctb51/zh-ud-train.conllu
dev_file experiments/ctb51/zh-ud-dev.conllu
test_file experiments/ctb51/zh-ud-test.conllu
min_occur_count 2
save_dir experiments/ctb51_model
config_file experiments/ctb51_model/config.cfg
save_model_path experiments/ctb51_model/model
save_vocab_path experiments/ctb51_model/vocab
load_dir ../ckpt/default
load_model_path ../ckpt/default/model
load_vocab_path ../ckpt/default/vocab
lstm_layers 3
word_dims 100
tag_dims 100
dropout_emb 0.33
lstm_hiddens 400
dropout_lstm_input 0.33
dropout_lstm_hidden 0.33
mlp_arc_size 500
mlp_rel_size 100
dropout_mlp 0.33
learning_rate 2e-3
decay .75
decay_steps 5000
beta_1 .9
beta_2 .9
epsilon 1e-12
clip 5.0
num_buckets_train 40
num_buckets_valid 10
num_buckets_test 10
train_iters 50000
train_batch_size 50
test_batch_size 100
validate_every 165
save_after 5000
update_every 4
Traceback (most recent call last):
  File "driver/TrainTest.py", line 148, in <module>
    vocab = creatVocab(config.train_file, config.min_occur_count)
  File ".\data\Vocab.py", line 172, in creatVocab
    for sentence in readDepTree(infile):
  File ".\data\Dependency.py", line 77, in readDepTree
    for line in file:
  File "C:\Users\Hassan\Anaconda3\envs\finalproject\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 44: character maps to <undefined>

zhangmeishan commented 4 years ago

The input files must be UTF-8 without a BOM. Please debug to check the input format; everything that reads the input is in the data package.
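The traceback above passes through cp1252.py, which suggests the file was opened with the Windows default encoding rather than UTF-8. A minimal sketch of how one might check a data file and read it with an explicit encoding follows. The helper names `check_utf8_no_bom` and `read_lines_utf8` are hypothetical and are not part of BiaffineDParser; the repository's own reading code is not shown in this thread, so treat this as an illustration under that assumption, not as the project's fix.

```python
import codecs
import os
import tempfile


def check_utf8_no_bom(path):
    """Return (ok, message) saying whether the file is UTF-8 without a BOM.

    Hypothetical helper for illustration; not part of BiaffineDParser.
    """
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        return False, "file starts with a UTF-8 BOM"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as e:
        return False, "not valid UTF-8: %s" % e
    return True, "ok"


def read_lines_utf8(path):
    """Yield lines decoded as UTF-8, independent of the platform default.

    Without encoding="utf-8", open() falls back to the locale encoding,
    which is typically cp1252 on Western-locale Windows installs.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # One CoNLL-U-like line; the UTF-8 bytes of this character include 0x8f,
    # the byte reported in the error message.
    sample = "1\t\u53eb\t_\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.conllu")
        with open(path, "w", encoding="utf-8", newline="\n") as f:
            f.write(sample)
        print(check_utf8_no_bom(path))
        print(list(read_lines_utf8(path)))
```

If the check passes but the error persists, the likely remaining cause is an `open()` call without an `encoding` argument somewhere along the path shown in the traceback (data/Dependency.py or its caller in data/Vocab.py). Passing `encoding="utf-8"` there is one option. Another, assuming Python 3.7 or later, is to enable UTF-8 mode by setting the environment variable `PYTHONUTF8=1` before running the script.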