jasonwei20 / eda_nlp

Data augmentation for NLP, presented at EMNLP 2019
https://arxiv.org/abs/1901.11196
1.59k stars, 315 forks

random_insertion should take stop words into account #24

Closed. ghost closed this issue 4 years ago

ghost commented 4 years ago

https://github.com/jasonwei20/eda_nlp/blob/d75e8bd4631f4d93260cb291aa47852d8eacd51d/code/eda.py#L151

Your documentation says that random insertion skips stop words, but the code does not.

As a result, random insertion rarely succeeds for me: the randomly chosen word is often a stop word, and no synonyms are found for it. You should take only the non-stop words into account here: https://github.com/jasonwei20/eda_nlp/blob/d75e8bd4631f4d93260cb291aa47852d8eacd51d/code/eda.py#L160
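
The suggested change could be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from `eda.py`: the `STOP_WORDS` set and the `SYNONYMS` table are toy stand-ins (assumptions) for the repository's stop-word list and its synonym lookup, and the `rng` and `max_tries` parameters are hypothetical additions to keep the example reproducible.

```python
import random

# Toy stand-ins (assumptions): the real project has its own stop-word list
# and looks synonyms up elsewhere; these tables only make the sketch runnable.
STOP_WORDS = {"the", "a", "is", "on", "of", "and"}
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}


def get_synonyms(word):
    """Return the synonyms of `word` from the toy table (empty list if none)."""
    return SYNONYMS.get(word, [])


def add_word(new_words, rng, max_tries=10):
    """Insert one synonym of a random non-stop word; return True on success."""
    # Only non-stop words are candidates, so no attempt is wasted on stop words.
    candidates = [w for w in new_words if w not in STOP_WORDS]
    if not candidates:
        return False
    for _ in range(max_tries):
        synonyms = get_synonyms(rng.choice(candidates))
        if synonyms:
            # Any position is allowed, including the end of the sentence.
            position = rng.randint(0, len(new_words))
            new_words.insert(position, rng.choice(synonyms))
            return True
    return False


def random_insertion(words, n, rng=None):
    """Return a copy of `words` with up to `n` synonyms inserted."""
    rng = rng or random.Random()
    new_words = list(words)
    for _ in range(n):
        add_word(new_words, rng)
    return new_words
```

With this filter, a sentence made up only of stop words is returned unchanged, and every attempt on a mixed sentence starts from a word that can plausibly have synonyms.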

ghost commented 4 years ago

Your EDA is cool. I have translated it into C# and use it for NLP data augmentation for our bot.

jasonwei20 commented 4 years ago

Ah yes, you're right here.