Open cdleong opened 3 years ago
One suggestion from Slack is to break the new notebook code into two parts:
* One notebook that takes in a HuggingFace dataset at the top and proceeds from there to train a JoeyNMT model. This might make things a lot easier for people: if they can get data into the HuggingFace Dataset format, we can show them how to train.
* One notebook that shows people how to get there: it loads data from various file types or sources (.csv, paired text files, directly from the HuggingFace hub) into the HuggingFace format: https://huggingface.co/docs/datasets/loading_datasets.html
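As a rough sketch of what the data-loading notebook could cover (not an agreed design), the helpers below read paired text files or a .csv into translation-style records. The function names, the column arguments, and the `{"translation": {lang: text}}` record layout are assumptions for illustration; that layout mirrors what many translation datasets on the hub use. `to_hf_dataset` wraps the records with `datasets.Dataset.from_dict` and needs `pip install datasets`; for data already on the hub, `datasets.load_dataset` would be used directly instead.

```python
import csv
from pathlib import Path


def read_paired_text(src_path, trg_path, src_lang, trg_lang):
    """Read two line-aligned text files into translation-style records."""
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    trg_lines = Path(trg_path).read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(trg_lines):
        raise ValueError(
            f"Line count mismatch: {len(src_lines)} source vs {len(trg_lines)} target"
        )
    return [
        {"translation": {src_lang: s, trg_lang: t}}
        for s, t in zip(src_lines, trg_lines)
    ]


def read_csv_pairs(csv_path, src_col, trg_col, src_lang, trg_lang):
    """Read a .csv with one source column and one target column."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [
            {"translation": {src_lang: row[src_col], trg_lang: row[trg_col]}}
            for row in csv.DictReader(f)
        ]


def to_hf_dataset(records):
    """Wrap the records in a HuggingFace Dataset (hypothetical helper)."""
    # Imported lazily so the readers above work without `datasets` installed.
    from datasets import Dataset

    return Dataset.from_dict({"translation": [r["translation"] for r in records]})
```

Keeping the readers free of any HuggingFace dependency means the file handling can be checked on its own before the data is handed to the training notebook.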
See this slack discussion: https://masakhane-nlp.slack.com/archives/C01GF5XJ0TF/p1634863777007500?thread_ts=1634844471.007300&cid=C01GF5XJ0TF
https://colab.research.google.com/drive/1RWOle7RHy_wq0uGWxmAq1ZfmEQIFsCHj#scrollTo=h1Ddy4_AOKdm could make for a starting point. This notebook shows how to download a HuggingFace dataset and write it out to files in the format JoeyNMT expects, I think.
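For reference, a minimal sketch of that write-out step (not the code from the linked notebook). It assumes each row looks like `{"translation": {lang: text}}`, as rows from many hub translation datasets do, and that JoeyNMT reads line-aligned plain-text files named `<split>.<lang>`; both points should be checked against the JoeyNMT version in use. The function name and arguments are hypothetical.

```python
from pathlib import Path


def write_parallel_files(records, out_dir, split, src_lang, trg_lang):
    """Write translation records to <split>.<src_lang> and <split>.<trg_lang>.

    `records` can be any iterable of dicts, including a HuggingFace Dataset.
    Returns the number of sentence pairs written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    src_path = out_dir / f"{split}.{src_lang}"
    trg_path = out_dir / f"{split}.{trg_lang}"

    written = 0
    with open(src_path, "w", encoding="utf-8") as src_f, \
            open(trg_path, "w", encoding="utf-8") as trg_f:
        for record in records:
            pair = record["translation"]
            # Collapse internal whitespace (including newlines) so each
            # sentence stays on exactly one line and the files stay aligned.
            src = " ".join(pair[src_lang].split())
            trg = " ".join(pair[trg_lang].split())
            if not src or not trg:
                continue  # skip pairs with an empty side
            src_f.write(src + "\n")
            trg_f.write(trg + "\n")
            written += 1
    return written
```

Calling it once per split (for example `train`, `dev`, `test`) would give one file pair per split, which a JoeyNMT config could then point at.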
@cdleong if this is still relevant, I would like to work on it.
I think it is still relevant, yes. And I just got done with my semester so I might have more free time as well, after the holidays
Alright, so I have started with the notebook and will be done by the end of next week. I have to prepare for an exam next Wednesday, but I will be wrapping up the notebook.
/self-assign
Any update?
Slack discussion: https://masakhane-nlp.slack.com/archives/C01JAP67HRV/p1634844082006400
https://github.com/joeynmt/joeynmt/blob/master/joey_demo.ipynb is the Tatoeba example.