thammegowda / mtdata

A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0

Better error messages for better UX #162

Open SamuelLarkin opened 3 months ago

SamuelLarkin commented 3 months ago

Hi, I ran the following command:

mtdata get --langs fra-eng --train Statmt-news_commentary-18.1-por --no-merge --compress --out delme

and got this output:

2024-07-29 08:50:03 main.get_data:43 INFO:: Args are ignored: {'verbose': False, 'log_level': 'INFO', 'reindex': False, 'progressbar': True, 'task': 'get', 'test_dids': None, 'dev_dids': None}
Traceback (most recent call last):
  File "/gpfs/projects/DT/mtp/corpora/MSLC24/venv/bin/mtdata", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/gpfs/projects/DT/mtp/corpora/MSLC24/venv/lib/python3.11/site-packages/mtdata/__main__.py", line 9, in main
    main.main()
  File "/gpfs/projects/DT/mtp/corpora/MSLC24/venv/lib/python3.11/site-packages/mtdata/main.py", line 324, in main
    get_data(**vars(args))
  File "/gpfs/projects/DT/mtp/corpora/MSLC24/venv/lib/python3.11/site-packages/mtdata/main.py", line 46, in get_data
    dataset = Dataset.prepare(
              ^^^^^^^^^^^^^^^^
  File "/gpfs/projects/DT/mtp/corpora/MSLC24/venv/lib/python3.11/site-packages/mtdata/data.py", line 119, in prepare
    raise Exception(f'Given languages: {langs} and dataset: {ent.did} are not compatible')
Exception: Given languages: (BCP47(fra), BCP47(eng)) and dataset: Statmt-news_commentary-18.1-por are not compatible

I would like to suggest catching that exception and printing only a clean error message, without the stack trace, since the trace isn't useful to a user. I think it would improve the user experience.
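A minimal sketch of the idea, not mtdata's actual code: the names `UserFacingError`, `get_data_stub`, and `run_cli` are hypothetical and only illustrate the pattern. It assumes the incompatibility is raised as a dedicated exception type (the traceback above shows a plain `Exception` today), so that the entry point can catch just that type, print one line to stderr, and return a non-zero exit code, while an opt-in debug flag still lets the full traceback through.

```python
import sys


class UserFacingError(Exception):
    """Hypothetical exception type for mistakes the user can fix (bad args, etc.)."""


def get_data_stub(langs, dataset_id):
    # Stand-in for the real work; it fails with the same message as in the report.
    raise UserFacingError(
        f"Given languages: {langs} and dataset: {dataset_id} are not compatible"
    )


def run_cli(task, *args, stream=None, debug=False):
    """Run `task`; turn UserFacingError into a one-line message and exit code 1."""
    stream = sys.stderr if stream is None else stream
    try:
        task(*args)
    except UserFacingError as err:
        if debug:
            raise  # keep the full traceback available for developers
        print(f"ERROR: {err}", file=stream)
        return 1
    return 0
```

With something like this, a call such as `sys.exit(run_cli(get_data_stub, ("fra", "eng"), "Statmt-news_commentary-18.1-por"))` would print a single `ERROR: ...` line instead of the stack trace. Catching only a dedicated exception type, rather than every `Exception`, keeps genuine bugs visible with their tracebacks.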