tattle-made / Uli

Software and Resources for Mitigating Online Gender Based Violence in India
https://uli.tattle.co.in
GNU General Public License v3.0

Can we derive root words from the slur list? #505

dennyabrain closed 7 months ago

dennyabrain commented 11 months ago

We are using the phrase "root word" as a catch-all term. Many words in the list are misspellings of a common slur. For instance, "fuck" could be spelled as "fck", "fk", "fcuk", etc.

This was highlighted as a problem during our annotation sprints too. Annotators weren't sure whether they should annotate every misspelling, or whether it would be enough to annotate only the root word and have our system treat that annotation as valid for all derived words.
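The behaviour the annotators were asking about can be sketched as a lookup that falls back to the root word. This is only an illustration: the names `DERIVED_TO_ROOT`, `ANNOTATIONS` and `annotation_for`, and the placeholder annotation fields, are assumptions and do not come from the Uli codebase.

```python
# Hypothetical data: names and structure are illustrative assumptions,
# not Uli's actual schema.
DERIVED_TO_ROOT = {"fck": "fuck", "fk": "fuck", "fcuk": "fuck"}

# Annotations are stored once, on the root word only.
# The field below is a placeholder, not a real annotation category.
ANNOTATIONS = {"fuck": {"category": "placeholder"}}


def annotation_for(word):
    """Return the annotation for `word`, falling back to its root word."""
    key = word.lower()
    root = DERIVED_TO_ROOT.get(key, key)  # a root word maps to itself
    return ANNOTATIONS.get(root)  # None if nothing is annotated


# A derived spelling resolves to the annotation made on the root word.
derived_annotation = annotation_for("fck")
```

With a mapping like this, annotators would only need to annotate root words, and the quality of the mapping decides which derived words inherit the annotation.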

In scope for this issue:

  1. evaluating methods - automatic or manual
  2. splitting our slur list into root and derived words
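One semi-automatic way to make the split, sketched under assumptions: a person first picks the root words by hand, then string similarity from Python's standard `difflib` module assigns the remaining entries to the closest root. The function name `split_slur_list`, the `cutoff` value of 0.6 and the sample input are illustrative and not taken from the Uli codebase.

```python
from difflib import get_close_matches


def split_slur_list(words, roots, cutoff=0.6):
    """Group each word under its most similar root word.

    Returns (groups, unmatched). `groups` maps each root to its list of
    derived words. `unmatched` holds words with no root at or above
    `cutoff`; these would need manual review.
    """
    groups = {root: [] for root in roots}
    unmatched = []
    for word in words:
        if word in groups:
            continue  # the word is itself a root
        match = get_close_matches(word, roots, n=1, cutoff=cutoff)
        if match:
            groups[match[0]].append(word)
        else:
            unmatched.append(word)
    return groups, unmatched


# Sample input: the misspellings mentioned above plus one unrelated word.
groups, unmatched = split_slur_list(
    ["fuck", "fck", "fk", "fcuk", "hello"], roots=["fuck"]
)
```

A lower cutoff groups more aggressively and risks merging unrelated words, so the proposed groups would still need a human check before being used for annotation.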

aatmanvaidya commented 11 months ago

Previous Work on the Umbrella Issue

Derive Root Words of the words in the Slur List

From what I broadly understand, what we want to do is lemmatization and stemming. The root word is called the lemma.
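As a rough illustration of what stemming does and does not cover, here is a deliberately naive suffix-stripping stemmer. It is a toy sketch, not the Porter algorithm or any library's implementation, and the suffix list and the name `naive_stem` are assumptions. It reduces inflected forms to a shared stem but leaves misspellings untouched, which suggests misspellings would need a separate step such as fuzzy matching.

```python
SUFFIXES = ("ing", "ers", "er", "ed", "s")  # checked in this order


def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


inflected = ["fucking", "fucked", "fucker", "fucks"]
misspelled = ["fck", "fk", "fcuk"]

# Inflected forms collapse to one shared stem.
stems = {naive_stem(w) for w in inflected}
# Misspellings carry no known suffix, so they pass through unchanged.
unchanged = [naive_stem(w) for w in misspelled]
```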

Automated / Semi-Automated Approaches

Lemmatization for English

Lemmatization for Hindi

Approaches with Human Intervention

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 7 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.