[ ] (Optional) Assign yourself under "Assignees" on the right.
[ ] Try running the notebooks in Google Colab.
[ ] See where they break.
[ ] Edit the notebooks to swap in another dataset, for example by loading a HuggingFace dataset and writing it back out in a format JoeyNMT can use, such as a pair of train.en and train.xh files.
Edit: see #200. Maybe we should leave the old JW300 notebooks up and create new ones instead.
The problem
JW300 has been taken down for copyright reasons. At least the following notebooks all rely on it:
https://github.com/masakhane-io/masakhane-mt/blob/master/starter_notebook_from_English_training.ipynb
https://github.com/masakhane-io/masakhane-mt/blob/master/starter_notebook_gdrive_from_English.ipynb
https://github.com/masakhane-io/masakhane-mt/blob/master/starter_notebook_into_English_training.ipynb
A solution (but see #200)
They need to be fixed so they no longer use this dataset. Perhaps we could use Tatoeba or FLORES-101, or one of the other machine translation datasets on https://huggingface.co/datasets?task_ids=task_ids:machine-translation&sort=downloads
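A rough sketch of the "load a HuggingFace dataset, write it back out for JoeyNMT" step. It assumes each record has the `{"translation": {"en": ..., "xh": ...}}` layout that many HuggingFace translation datasets use, and that JoeyNMT reads plain-text parallel files with one sentence per line and a shared prefix (train.en / train.xh). The helper name `write_parallel_files` and the language codes are illustrative only; check the field names of whichever dataset is picked before relying on this.

```python
"""Write a parallel corpus to JoeyNMT-style plain-text files (a sketch).

Assumption: each record looks like {"translation": {"en": ..., "xh": ...}},
the layout many HuggingFace translation datasets use. The helper name and
language codes are hypothetical, not taken from the Masakhane notebooks.
"""
from pathlib import Path
from typing import Iterable, Mapping


def write_parallel_files(
    records: Iterable[Mapping[str, Mapping[str, str]]],
    out_prefix: str,
    src_lang: str = "en",
    trg_lang: str = "xh",
) -> int:
    """Write <out_prefix>.<src_lang> and <out_prefix>.<trg_lang>, one sentence per line.

    Returns the number of sentence pairs written.
    """
    src_path = Path(f"{out_prefix}.{src_lang}")
    trg_path = Path(f"{out_prefix}.{trg_lang}")
    src_path.parent.mkdir(parents=True, exist_ok=True)

    count = 0
    with src_path.open("w", encoding="utf-8") as src_f, \
            trg_path.open("w", encoding="utf-8") as trg_f:
        for record in records:
            pair = record["translation"]
            # Collapse runs of whitespace (including newlines) into single spaces,
            # so line i of the source file always matches line i of the target file.
            src = " ".join(pair[src_lang].split())
            trg = " ".join(pair[trg_lang].split())
            if not src or not trg:
                continue  # skip pairs where either side is empty
            src_f.write(src + "\n")
            trg_f.write(trg + "\n")
            count += 1
    return count


# Hypothetical usage with the `datasets` library (dataset name left as a placeholder;
# iterating a datasets.Dataset yields one dict per row):
#   from datasets import load_dataset
#   ds = load_dataset("<dataset-name>", "<config>", split="train")
#   write_parallel_files(ds, "data/train", src_lang="en", trg_lang="xh")
```

Normalising whitespace before writing matters because JoeyNMT pairs sentences by line number: a single embedded newline on one side would shift every following pair out of alignment.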