trinker / textclean

Tools for cleaning and normalizing text data
244 stars, 26 forks

Add a slangerizer #44

Closed: trinker closed this issue 6 years ago

trinker commented 6 years ago

https://stackoverflow.com/questions/24515/bad-words-filter

trinker commented 6 years ago

Pseudocode:

  1. Find all characters that could be replaced with slang equivalents.
  2. Map each one to all of its possible replacements.
  3. Create all combinations of the word that can be made using the slang and standard letters.
  4. Use sprintf with do.call to generate the slanged words.
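A minimal Python sketch of the four steps, assuming a small hypothetical mapping (LEET_MAP) chosen only for illustration; it is not taken from any published slang alphabet. Here itertools.product plays the role that sprintf with do.call would play in R, and slangerize is a hypothetical name, not an existing textclean function.

```python
from itertools import product

# Hypothetical, deliberately tiny slang mapping (illustration only);
# real alphabets vary widely and none is official.
LEET_MAP = {
    "a": ["4", "@"],
    "e": ["3"],
    "i": ["1", "!"],
    "o": ["0"],
    "s": ["5", "$"],
    "t": ["7"],
}


def slangerize(word):
    """Return every variant of `word` mixing standard letters and slang substitutes."""
    # Steps 1-2: for each character, list the original plus any mapped replacements
    # (characters with no mapping keep only themselves).
    options = [[ch] + LEET_MAP.get(ch.lower(), []) for ch in word]
    # Steps 3-4: build every combination and join each back into a string.
    return ["".join(combo) for combo in product(*options)]


variants = slangerize("test")
print(len(variants))   # 24
print(variants[:4])    # ['test', 'tes7', 'te5t', 'te57']
```

The number of variants is the product of the per-character option counts (2 × 2 × 3 × 2 = 24 for "test"), so it grows multiplicatively with word length.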

In the original list, should standard characters [a-zA-Z] be ignored as just different case versions? Or add case as an option, but this gets tricky and makes the problem much more difficult computationally.
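To make the computational concern concrete, this sketch counts variants instead of generating them, again under a hypothetical LEET_MAP used only for illustration. Treating case as an option adds one more choice (the other-case letter) for every cased character; count_variants is a hypothetical helper.

```python
from math import prod

# Hypothetical, deliberately tiny slang mapping (illustration only).
LEET_MAP = {
    "a": ["4", "@"],
    "e": ["3"],
    "i": ["1", "!"],
    "o": ["0"],
    "s": ["5", "$"],
    "t": ["7"],
}


def count_variants(word, include_case=False):
    """Count the slanged variants of `word` without materializing them."""
    counts = []
    for ch in word:
        # The character itself plus its slang replacements.
        n = 1 + len(LEET_MAP.get(ch.lower(), []))
        # Optionally allow the other-case form of a cased letter as well.
        if include_case and ch.lower() != ch.upper():
            n += 1
        counts.append(n)
    return prod(counts)


print(count_variants("password"))                     # 54
print(count_variants("password", include_case=True))  # 3072
```

With this mapping, allowing case raises the count for the eight-letter word "password" from 54 to 3072.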

Determine the name for this type of letter replacement.

trinker commented 6 years ago

https://www.telegraph.co.uk/news/newstopics/howaboutthat/2667634/The-Clbuttic-Mistake-When-obscenity-filters-go-wrong.html

Funny...

So far the closest term is "intentional misspelling"?

trinker commented 6 years ago

https://en.wikipedia.org/wiki/Satiric_misspelling Is there a need to detect and replace these with a replace_satiric_misspelling function? It's close, but this kind of misspelling is usually done to hide a word from detection, not for satire.

Also not quite: https://en.wikipedia.org/wiki/Cacography

trinker commented 6 years ago

Maybe ask here: https://english.stackexchange.com/questions/137922/is-there-a-special-word-for-purposely-misspelling-a-word

trinker commented 6 years ago

Obfuscation? https://en.wikipedia.org/wiki/Typographical_error
Yeah, this may be the winner (winner winner chicken dinner): https://www.independent.co.uk/news/science/rhodri-marsden-cyberclinic-6230686.html

So is leet or leet speak the answer? https://www.urbandictionary.com/define.php?term=leet%20speak

https://simple.wikipedia.org/wiki/Leet

Interesting: http://www.robertecker.com/hp/research/leet-converter.php (a leet converter)

trinker commented 6 years ago

http://www.gamehouse.com/blog/leet-speak-cheat-sheet/ http://1337.me/

trinker commented 6 years ago

http://md5decrypt.net/en/Leet-translator/

trinker commented 6 years ago

Mapping: http://www.gamehouse.com/blog/leet-speak-cheat-sheet/ https://qntm.org/l33t https://www.paulbui.net/wl/Leetspeak_alphabet

About: https://www.wikihow.com/Read-and-Write-in-1337

trinker commented 6 years ago

Converting from English to leet seems easy, but converting from leet back to English seems difficult at best. A replace_leet function would likely be computationally intensive.

As http://md5decrypt.net/en/Leet-translator/ puts it: "The translation from leet to normal text can be complicated because as I said there's a lot of different alphabets, and no official one. That's why you won't find any tool to decode leet speak on this page, only a translator."
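A small sketch of why decoding is harder than encoding, assuming two hypothetical leet alphabets that disagree on what "1" stands for. Inverting the mappings makes each symbol one-to-many, and since a digit may also just be a literal digit, every character keeps itself as a candidate too. invert and decode_candidates are hypothetical names used for illustration.

```python
from itertools import product

# Two hypothetical leet alphabets that disagree on "1" (illustration only).
ALPHABET_A = {"e": ["3"], "i": ["1"], "o": ["0"], "t": ["7"]}
ALPHABET_B = {"e": ["3"], "l": ["1"], "o": ["0"], "t": ["7"]}


def invert(*alphabets):
    """Map each slang symbol to the set of letters it could stand for."""
    reverse = {}
    for alphabet in alphabets:
        for letter, symbols in alphabet.items():
            for symbol in symbols:
                reverse.setdefault(symbol, set()).add(letter)
    return reverse


def decode_candidates(text, reverse):
    """List every plain-text reading of `text`; each character may also be literal."""
    options = [sorted(reverse.get(ch, set()) | {ch}) for ch in text.lower()]
    return ["".join(combo) for combo in product(*options)]


reverse = invert(ALPHABET_A, ALPHABET_B)
candidates = decode_candidates("1337", reverse)
print(len(candidates))                             # 24
print("leet" in candidates, "ieet" in candidates)  # True True
```

Even with only two small alphabets, "1337" has 24 readings; choosing the intended one would still need something like a dictionary lookup, and every additional alphabet widens the candidate sets further.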

trinker commented 6 years ago

Closing because of the variability in leet alphabets...
