Open Bachstelze opened 2 years ago
There are no lemmas in the training data, so there can't be a lemmatizer?! Can't I use the other parts of the pipeline? When I run
from trankit import Pipeline
p = Pipeline(lang='customized', cache_dir='./save_dir')
the following error occurs:
BadZipFile: File is not a zip file
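To narrow down which file triggers `BadZipFile`, it may help to check which files under the save directory are valid zip archives. This is only a sketch using the standard library: `non_zip_files` is a hypothetical helper, and which file trankit actually opens as a zip is an assumption not confirmed here. Some files in the directory may legitimately not be zip archives.

```python
import os
import zipfile


def non_zip_files(directory):
    """Return a sorted list of files under `directory` that are not zip archives."""
    bad = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # is_zipfile inspects the file's contents, not its extension
            if not zipfile.is_zipfile(path):
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # './save_dir' is the cache_dir used in the snippet above
    for path in non_zip_files("./save_dir"):
        print("not a zip archive:", path)
```

A file that appears in this list and that the pipeline expects to be an archive (for example, a model that was never trained or only partially written) would be a candidate cause.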
I get the same error when trying to train the lemmatizer:
Setting up training config...
Initialized lemmatizer trainer
Training dictionary-based lemmatizer
Traceback (most recent call last):
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/custom_train00.py", line 15, in <module>
    trainer.train()
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/trankit/tpipeline.py", line 683, in train
    self._train_lemma()
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/trankit/tpipeline.py", line 584, in _train_lemma
    self._lemma_model.train()
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/trankit/models/lemma_model.py", line 381, in train
    [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/trankit/models/lemma_model.py", line 381, in <listcomp>
    [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
KeyError: 'lemma'
Following the code from https://trankit.readthedocs.io/en/latest/training.html#training-a-lemmatizer, I get a KeyError: 'lemma'.
The most recent version of https://github.com/UniversalDependencies/UD_Thai-PUD is used as training and development data.
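One way to confirm that the lemma column really is empty before training is to count tokens whose LEMMA field (third column of a CoNLL-U line) is an underscore. The following is a minimal sketch, independent of trankit; `count_missing_lemmas` is a hypothetical helper and the sample sentence is made up for illustration, not taken from the treebank.

```python
def count_missing_lemmas(conllu_text):
    """Return (total_tokens, tokens_without_lemma) for CoNLL-U formatted text."""
    total = 0
    missing = 0
    for line in conllu_text.splitlines():
        # skip blank sentence separators and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        # skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        if "-" in cols[0] or "." in cols[0]:
            continue
        total += 1
        # "_" in LEMMA means unspecified, unless the word form itself is "_"
        if cols[2] == "_" and cols[1] != "_":
            missing += 1
    return total, missing


# Made-up three-token sentence: two tokens lack a lemma, one has it.
rows = [
    ["1", "Dogs", "_", "NOUN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "bark", "bark", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", ".", "_", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = "# text = Dogs bark.\n" + "\n".join("\t".join(r) for r in rows) + "\n"

if __name__ == "__main__":
    total, missing = count_missing_lemmas(SAMPLE)
    print(f"{missing} of {total} tokens have no lemma")
```

If every token in the training file is reported as missing a lemma, there is nothing for a lemmatizer to learn from, which would be consistent with the KeyError above.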