ybisk / charNMT-noise

Scripts and noise data for Belinkov & Bisk 2018
Understanding the code #3

Open keloemma opened 2 years ago

keloemma commented 2 years ago

Hello,

I am trying to use your code to generate noisy sentences for a corpus, and I would like to understand how it works.

How many sentences are generated per method? If we choose one method, does it produce just one sentence or several? How many words are changed per method: just one, or more than one?

What do "probability" and "distribution" mean? I read your article and code but could not work it out. Does "distribution" refer to the percentage of words to which the method is applied?

Thanks.

boknilev commented 2 years ago

Hi there. For each original sentence, there will be one noisy sentence. The percentage perturbed is something we vary, and I think it's at the level of sentences, iirc. @ybisk?
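
To make that reading concrete, here is a minimal sketch and not the repository's actual scripts. It assumes the percentage is a per-sentence probability and that, once a sentence is selected, a swap of two adjacent inner letters is applied to every word of four or more characters. The names `swap_word`, `noise_corpus`, `prob`, and `seed` are hypothetical.

```python
import random


def swap_word(word, rng):
    """Swap two adjacent inner characters; words shorter than 4 are unchanged."""
    if len(word) < 4:
        return word
    # Pick i so that both i and i + 1 are inner positions (not first or last).
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def noise_corpus(sentences, prob, seed=0):
    """Return exactly one output sentence per input sentence.

    Assumption: `prob` is the chance that a whole sentence is perturbed;
    sentences that are not selected are copied through unchanged.
    """
    rng = random.Random(seed)
    noisy = []
    for sentence in sentences:
        if rng.random() < prob:
            sentence = " ".join(swap_word(w, rng) for w in sentence.split())
        noisy.append(sentence)
    return noisy


if __name__ == "__main__":
    corpus = ["the quick brown fox", "jumps over the lazy dog"]
    for original, noisy in zip(corpus, noise_corpus(corpus, prob=0.5)):
        print(original, "->", noisy)
```

Under these assumptions the output has the same number of lines as the input, and `prob` only controls how many of those lines differ from the original.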

ybisk commented 2 years ago

Hi, does https://github.com/ybisk/charNMT-noise/issues/2 help with your final question?