2021-04-23 23:06:33,825 (word_embedding.py:111): [INFO] - Reading embeddings from file
2021-04-23 23:06:33,825 (word_embedding.py:119): [INFO] - Found line with wrong number of dimensions (expected 300, was 2): b'2000000 300\n'
Traceback (most recent call last):
  File "/home/yyoo/src_remote/claf/train.py", line 10, in <module>
    experiment()
  File "/home/yyoo/src_remote/claf/claf/learn/experiment.py", line 134, in __call__
    train_loader, valid_loader, optimizer = self.set_train_mode()
  File "/home/yyoo/src_remote/claf/claf/learn/experiment.py", line 220, in set_train_mode
    model = self._create_model(token_makers, helpers=helpers)
  File "/home/yyoo/src_remote/claf/claf/learn/experiment.py", line 299, in _create_model
    ModelFactory, self.config.model, param=model_params
  File "/home/yyoo/src_remote/claf/claf/learn/experiment.py", line 245, in _create_by_factory
    return factory_obj.create(item_config, **param)
  File "/home/yyoo/src_remote/claf/claf/factory/model.py", line 36, in create
    token_embedder = self.create_token_embedder(model, token_makers)
  File "/home/yyoo/src_remote/claf/claf/factory/model.py", line 52, in create_token_embedder
    return token_embedder.RCTokenEmbedder(token_makers)
  File "/home/yyoo/src_remote/claf/claf/tokens/token_embedder/reading_comprehension_embedder.py", line 27, in __init__
    super(RCTokenEmbedder, self).__init__(token_makers)
  File "/home/yyoo/src_remote/claf/claf/tokens/token_embedder/base.py", line 24, in __init__
    self.add_embedding_modules(token_makers)
  File "/home/yyoo/src_remote/claf/claf/tokens/token_embedder/base.py", line 33, in add_embedding_modules
    embedding = token_maker.embedding_fn(vocab)
  File "/home/yyoo/src_remote/claf/claf/tokens/__init__.py", line 12, in wrapper
    return module(**embedding_config)
  File "/home/yyoo/src_remote/claf/claf/tokens/embedding/word_embedding.py", line 63, in __init__
    weight = self._read_pretrained_file(pretrained_path)
  File "/home/yyoo/src_remote/claf/claf/tokens/embedding/word_embedding.py", line 115, in _read_pretrained_file
    fields = line.decode("utf-8").rstrip().split(" ")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
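The log shows the reader reporting the `2000000 300` header as a line with the wrong number of dimensions, then failing on a line whose first byte (0xed) is not valid UTF-8 at that position. One possible cause, which is an assumption the log does not confirm, is that the file is not UTF-8 text at all. It could be a binary word2vec file, whose raw float bytes produce arbitrary "lines" when split on newlines, or it could be text in another encoding. The following is a minimal sketch, assuming a word2vec/GloVe-style text file with one `token v1 ... vN` entry per line. It decodes each line separately and counts undecodable lines instead of aborting. The function name `read_text_embeddings` and its parameters are hypothetical and are not CLaF's API.

```python
from typing import Dict, List, Tuple


def read_text_embeddings(
    path: str, expected_dim: int = 300, encoding: str = "utf-8"
) -> Tuple[Dict[str, List[float]], int]:
    """Hypothetical tolerant reader for a word2vec/GloVe-style *text* file.

    Each entry line is assumed to be "token v1 v2 ... v{expected_dim}".
    Lines with the wrong number of fields (e.g. a "2000000 300" header) are
    skipped. Lines that cannot be decoded are skipped and counted, so the
    caller can decide whether the file is usable at all.
    """
    vectors: Dict[str, List[float]] = {}
    undecodable = 0
    with open(path, "rb") as f:  # binary mode: decode each line ourselves
        for raw in f:
            try:
                line = raw.decode(encoding)
            except UnicodeDecodeError:
                undecodable += 1  # e.g. 0xed followed by a non-continuation byte
                continue
            fields = line.rstrip().split(" ")
            if len(fields) != expected_dim + 1:
                continue  # header line or malformed entry
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                continue  # a vector component was not a number
            vectors[fields[0]] = values
    return vectors, undecodable
```

Usage would look like `vectors, bad = read_text_embeddings("embeddings.vec", expected_dim=300)`. If `bad` turns out to be a large fraction of the file, the file is probably not a UTF-8 text embedding file. In that case, skipping lines hides the real problem, and converting the file or passing the correct `encoding` is the better fix.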