owos / afri_augs

Data Augmentation for Generative models

Create a script to perform Sentence Concatenation #8

Closed Iambusayor closed 8 months ago

Iambusayor commented 8 months ago

Sentence concatenation, as explored in the paper "Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT", randomly concatenates multiple sentences, with a separator token placed between the concatenated sentences. We need a class or script that does this for a given language pair. More information can be found in the paper "Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation".

E.g., let x represent the English (en) sentences and y their Yoruba (yor) translations:

x_1: how are you my son. x_2: the fuel in the lamp is about to finish.

y_1: bawo ni omo mi. y_2: epo ti o wa ninu fitila naa ti fẹrẹ pari.

concatenated sentences x: how are you my son. the fuel in the lamp is about to finish. y: bawo ni omo mi. epo ti o wa ninu fitila naa ti fẹrẹ pari.
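A minimal Python sketch of what such a script could look like, not a reference implementation. The function name `concat_augment`, its parameters, and the `<sep>` placeholder are all assumptions for illustration; the actual separator token should be whatever the tokenizer or model in this repo expects. It draws two distinct sentence indices at random and joins the source sides and the target sides in the same order, so each augmented pair stays aligned.

```python
import random
from typing import List, Optional, Sequence, Tuple


def concat_augment(
    src: Sequence[str],
    tgt: Sequence[str],
    num_samples: int,
    sep_token: str = "<sep>",  # hypothetical placeholder separator
    seed: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Build `num_samples` augmented pairs by concatenating two random sentence pairs."""
    if len(src) != len(tgt):
        raise ValueError("src and tgt must contain the same number of sentences")
    if len(src) < 2:
        raise ValueError("need at least two sentence pairs to concatenate")

    rng = random.Random(seed)  # local RNG so results are reproducible with a seed
    joiner = f" {sep_token} "
    augmented = []
    for _ in range(num_samples):
        # Pick two different indices; use the same indices on both sides
        # so the concatenated source and target remain translations of each other.
        i, j = rng.sample(range(len(src)), 2)
        augmented.append((src[i] + joiner + src[j], tgt[i] + joiner + tgt[j]))
    return augmented


if __name__ == "__main__":
    en = ["how are you my son.", "the fuel in the lamp is about to finish."]
    yor = ["bawo ni omo mi.", "epo ti o wa ninu fitila naa ti fẹrẹ pari."]
    for x, y in concat_augment(en, yor, num_samples=2, seed=0):
        print("x:", x)
        print("y:", y)
```

The augmented pairs would typically be appended to the original parallel corpus rather than replace it.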

heisguyy commented 8 months ago

I want this task.

r-chinonyelum commented 8 months ago

Can I work on this task with you?

heisguyy commented 8 months ago

I am almost done with it. Only the back translation is left to implement.