jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

Release of the Cloze Test set? #125

Closed: KhanhTungTran closed this issue 2 months ago

KhanhTungTran commented 4 months ago

Hello,

Thank you for the great work!

I am a PhD student in AI at University College Cork, and I am interested in training large-scale language models for Irish.

I am wondering if you could release the Cloze Test set used in your paper? It would be a great resource for evaluating Irish language models.

Thank you.

jbrry commented 4 months ago

Hi there, thanks for checking out the paper and the nice words!

@laurenCassidy do you still have the corpus/scripts used in the evaluation?

It may also be worth looking at this script to inspect various LMs on masked token prediction.

laurenCassidy commented 4 months ago

Hi @KhanhTungTran, here is the corpus used for the cloze test: cloze.csv

laurenCassidy commented 4 months ago

@jbrry will I add the cloze test script to the repository or share it another way? GitHub doesn't allow me to attach a .py file to a comment.

KhanhTungTran commented 4 months ago

Thank you for the corpus file!

jbrry commented 4 months ago

will I add the cloze test script to the repository or share it another way? GitHub doesn't allow me to attach a .py file to a comment.

Thanks @laurenCassidy, if you could add it to the scripts directory that would be great!

laurenCassidy commented 4 months ago

I have added the cloze.py script to the scripts directory.