jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

tag, branch and/or release code for reproducibility #108

Open jowagner opened 2 years ago

jowagner commented 2 years ago

While people who want to replicate our paper can check out code based on a commit number found by inspecting the commit history, they are at risk to pick

We should tag, branch and/or release code to make it easy for visitors to pick the right code for reproducibility.

A branch would make it possible to keep updating the README (and to make late additions of code used in the experiments) even after the main branch diverges, e.g. when the main branch changes the steps and/or tools to carry out the experiment. This branch could be named "bert-base-irish-cased-v1", matching the model name in the huggingface model repository.

We also need to document the commit number or version of wiki-bert-pipeline and opusfilter that we used. (Relying on a fork in your own GitHub account only works as long as you remember never to hit the "fetch upstream" button or to make any other changes to your fork.)
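One way to document the dependency commits is to write them to a small file that lives next to the README. The following is only a sketch under assumptions: the helper names, the tab-separated layout, and the checkout directories in the usage note are hypothetical and not part of the repository, and the hash check assumes SHA-1 repositories with 40-character commit hashes.

```python
import re
import subprocess

# Assumption: SHA-1 repositories, so a full commit hash is 40 lowercase hex digits.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def head_commit(repo_dir):
    """Return the full hash of the commit checked out in repo_dir (needs git on PATH)."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def format_pins(pins):
    """Render {name: commit} as sorted 'name<TAB>commit' lines.

    Abbreviated hashes are rejected because they can become ambiguous later.
    """
    lines = []
    for name in sorted(pins):
        commit = pins[name]
        if not FULL_SHA.fullmatch(commit):
            raise ValueError(
                f"{name}: expected a full 40-character commit hash, got {commit!r}"
            )
        lines.append(f"{name}\t{commit}")
    return "\n".join(lines) + "\n"
```

With local checkouts in, say, `external/wiki-bert-pipeline` and `external/opusfilter` (hypothetical paths), calling `head_commit` on each directory and passing the resulting mapping to `format_pins` gives text that can be committed alongside the README.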

jbrry commented 2 years ago

Thanks - it might be safest to do a release before any new code is added, and we can also make a branch in case we want to update anything that works with the old functionality. Is there any specific commit we should release/branch from? Looking at the recent commits, most commits since 2022 seem to be fairly cosmetic and shouldn't change things too much. In that case, should we make our release/branch from the most recent commit?

Good advice about keeping track of the relevant commits used for the external libraries, opusfilter and wiki-bert-pipeline. I will do that too.

jowagner commented 2 years ago

I see no functional changes this year in https://github.com/jbrry/Irish-BERT/compare/9b45a4d8189376752e264ecd57a0baf66abed696...master

Branching from head should be ok and is easiest. If you prefer to branch from an earlier commit, you will probably want to cherry-pick all commits that update the README.
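The two options can be sketched as a short git command sequence. The helper below only builds the commands and does not run them; its name and parameters are hypothetical, the branch name follows the suggestion earlier in this thread, and tagging the branch tip after any cherry-picks is one possible choice, not something decided here.

```python
def release_commands(branch, tag, start="master", readme_commits=()):
    """Build the git commands for a reproducibility branch plus an annotated tag.

    start is the commit or ref to branch from; readme_commits are README-only
    commits to carry over when start is older than head.
    """
    cmds = [["git", "branch", branch, start]]
    if readme_commits:
        # Replay the README updates on top of the older starting point.
        cmds.append(["git", "checkout", branch])
        cmds.append(["git", "cherry-pick", *readme_commits])
    # Tag the branch tip so the tag includes any cherry-picked README commits.
    cmds.append(["git", "tag", "-a", tag, "-m", f"Release {tag}", branch])
    cmds.append(["git", "push", "origin", branch, tag])
    return cmds
```

Calling `release_commands("bert-base-irish-cased-v1", "v0.1.0")` gives the simple branch-from-head case; passing `start` together with `readme_commits` covers branching from an earlier commit.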

jbrry commented 2 years ago

Thanks, I agree.

I made releases from the relevant branches of our forks of:

I updated the README (commit 5be64573a63601fe775c88ed8c9e0454e8f6dbe9) with instructions to download these specific releases, so users will have a snapshot of these external libraries that won't be affected by upstream merges.

These releases/dependencies form the basis of the v0.1.0 release of Irish-BERT: https://github.com/jbrry/Irish-BERT/releases/tag/v0.1.0.