We will use swa, yor, hau, fon, wolof, and x for our experiments.
All sentences should be randomly sampled from the training set, and shuffling should be done before fine-tuning.
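A minimal sketch of the sampling and shuffling step, assuming MAFAND-style JSONL files where each line holds a `translation` object mapping language codes to sentences; the file path, language pair, and sample size below are placeholders, not part of this repo:

```python
import json
import random

random.seed(42)  # fixed seed so sampling and shuffling are reproducible

def load_jsonl(path):
    """Read a line-delimited JSON (JSONL) file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical path; point this at the actual MAFAND training file.
train = load_jsonl("data/mafand/en-yor/train.json")

# Randomly sample sentence pairs from the training set ...
sampled = random.sample(train, k=min(1000, len(train)))

# ... and shuffle the full pool before fine-tuning.
random.shuffle(train)
```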
[ ] Run your augmentation to generate 2x the number of random samples from the training data (see the augmentation sketch after this list)
[ ] Create a folder under data/ named after your augmentation strategy and put the files there. The files should be in JSON Lines format, as in the MAFAND dataset.
[ ] Fine-tune NLLB for 5 epochs on the combined data (training samples + augmentation samples) and report results (see the fine-tuning sketch below)
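A sketch of the augmentation and output-writing steps, assuming "2x" means twice the size of the training set and that samples are drawn with replacement; `augment`, the strategy name, and the paths are placeholders to replace with your own strategy and files:

```python
import json
import os
import random

random.seed(42)

def augment(example):
    """Placeholder for your augmentation strategy.

    Replace with the real transformation (back-translation, word swaps, ...);
    here it just copies the sentence pair unchanged.
    """
    return dict(example)

def write_jsonl(records, path):
    """Write records as line-delimited JSON, matching the MAFAND file format."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Hypothetical strategy name and input path; adjust to your setup.
strategy = "my_augmentation"
with open("data/mafand/en-yor/train.json", encoding="utf-8") as f:
    train = [json.loads(line) for line in f if line.strip()]

# Draw 2x the training-set size at random (with replacement) and augment each pair.
augmented = [augment(random.choice(train)) for _ in range(2 * len(train))]

out_dir = os.path.join("data", strategy)
os.makedirs(out_dir, exist_ok=True)
write_jsonl(augmented, os.path.join(out_dir, "train.json"))
```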
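A sketch of the fine-tuning step using the Hugging Face `transformers` Seq2Seq trainer, assuming an NLLB checkpoint such as `facebook/nllb-200-distilled-600M` and en-yor as the pair (swap in the pair you are working on); everything except the 5 epochs is an illustrative default, not a prescribed setting:

```python
import json

from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

def load_pairs(path, src="en", tgt="yor"):
    """Flatten MAFAND-style JSONL ({'translation': {code: sentence}}) into src/tgt dicts."""
    with open(path, encoding="utf-8") as f:
        return [
            {"src": ex["translation"][src], "tgt": ex["translation"][tgt]}
            for ex in (json.loads(line) for line in f if line.strip())
        ]

# Combined pool: original training samples + augmentation samples (paths are placeholders).
pairs = (load_pairs("data/mafand/en-yor/train.json")
         + load_pairs("data/my_augmentation/train.json"))

model_name = "facebook/nllb-200-distilled-600M"  # any NLLB checkpoint
tokenizer = AutoTokenizer.from_pretrained(
    model_name, src_lang="eng_Latn", tgt_lang="yor_Latn"  # NLLB/FLORES-200 codes
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def preprocess(batch):
    # Tokenize source and target together; labels come from text_target.
    return tokenizer(batch["src"], text_target=batch["tgt"],
                     truncation=True, max_length=128)

dataset = Dataset.from_list(pairs).map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="nllb-finetuned",
    num_train_epochs=5,              # 5 epochs, as specified above
    per_device_train_batch_size=8,   # illustrative; tune to your GPU memory
    learning_rate=1e-4,
    save_strategy="epoch",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```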