huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

KeyError: 'pre-processing' during conversion from Tatoeba to Marian model #11647

Open velocityCavalry opened 3 years ago

velocityCavalry commented 3 years ago

Environment info

Who can help

Marian: @patrickvonplaten , @patil-suraj

Information

Model I am using (Bert, XLNet ...): Marian

The problem arises when using:

The task I am working on is:

To reproduce

Follow the steps from scripts/tatoeba/README.md:

  1. cd transformers
    pip install -e .
    pip install pandas GitPython wget
  2. curl https://cdn-datasets.huggingface.co/language_codes/language-codes-3b2.csv > language-codes-3b2.csv
    curl https://cdn-datasets.huggingface.co/language_codes/iso-639-3.csv > iso-639-3.csv
  3. git clone git@github.com:Helsinki-NLP/Tatoeba-Challenge.git
  4. python src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py --models kor-eng eng-kor --save_dir converted/

Error message:

Traceback (most recent call last):
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 1267, in <module>
    resolver = TatoebaConverter(save_dir=args.save_dir)
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 58, in __init__
    reg = self.make_tatoeba_registry()
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 258, in make_tatoeba_registry
    return [(k, v["pre-processing"], v["download"], v["download"][:-4] + ".test.txt") for k, v in results.items()]
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 258, in <listcomp>
    return [(k, v["pre-processing"], v["download"], v["download"][:-4] + ".test.txt") for k, v in results.items()]
KeyError: 'pre-processing'

Expected behavior

The Tatoeba models for the chosen language pairs (kor-eng and eng-kor) are converted to Marian format without errors.
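
For anyone hitting the same traceback: line 258 indexes `v["pre-processing"]` for every entry in `results`, so a single entry without that key aborts the whole run, even if the requested pairs are fine. Below is a minimal sketch of a more tolerant registry builder. It is not the script's actual code and not a confirmed fix. The function name `build_registry`, the `REQUIRED_KEYS` tuple, and the sample data (including the example.org URLs) are hypothetical; only the tuple layout is taken from the traceback.

```python
from typing import Dict, List, Tuple

# Keys the list comprehension in the traceback reads from each entry.
REQUIRED_KEYS = ("pre-processing", "download")


def build_registry(results: Dict[str, dict]) -> Tuple[List[tuple], List[str]]:
    """Build registry tuples, skipping entries that lack a required key.

    Returns (registry, skipped), where skipped lists the names of the
    entries that were left out so they can be inspected afterwards.
    """
    registry = []
    skipped = []
    for name, meta in results.items():
        missing = [key for key in REQUIRED_KEYS if key not in meta]
        if missing:
            skipped.append(name)
            continue
        download = meta["download"]
        # Same tuple layout as in the traceback: the last element swaps the
        # 4-character archive extension for ".test.txt".
        registry.append(
            (name, meta["pre-processing"], download, download[:-4] + ".test.txt")
        )
    return registry, skipped


# Made-up sample data for illustration only.
sample = {
    "kor-eng": {
        "pre-processing": "normalization + SentencePiece",
        "download": "https://example.org/kor-eng/model.zip",
    },
    "eng-kor": {"download": "https://example.org/eng-kor/model.zip"},
}

registry, skipped = build_registry(sample)
print("usable entries:", [entry[0] for entry in registry])
print("skipped entries:", skipped)
```

Printing the skipped names would at least show which registry entries are missing the `pre-processing` field, which may help narrow down whether the upstream metadata format has changed.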

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

patil-suraj commented 3 years ago

unstale

patrickvonplaten commented 3 years ago

@patil-suraj - It would be really nice if we could tackle the tatoeba models at some point...

This seems to be related: https://github.com/huggingface/transformers/pull/12192 https://github.com/huggingface/transformers/issues/10943