melisMirza / Covitter

Repository of the SWE 573 project

Decide on Data Cleaning Strategy #17

Closed: melisMirza closed this issue 3 years ago

melisMirza commented 3 years ago

Evaluate the data cleaning methods, decide which ones to include, and decide on the order of implementation.

melisMirza commented 3 years ago

The necessary cleaning steps, in order, are:

  1. Convert all text to lower case
  2. Remove URLs
  3. Remove punctuation (check that this is compatible with emoji/emoticon meanings)
  4. Convert emoji/emoticons to text
  5. Remove stop words
  6. Convert chat words
  7. Spell checking (may be dropped)
  8. Lemmatization
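The step order above can be sketched as a single cleaning function. This is a minimal, standard-library-only sketch, not the project's actual implementation: the `EMOJI_MAP`, `STOP_WORDS`, and `CHAT_WORDS` tables are tiny hypothetical stand-ins for real resources, `naive_lemma` is a crude suffix-stripping placeholder rather than true lemmatization (a real pipeline would use something like NLTK's WordNet lemmatizer), and step 7 is skipped because the list marks it as possibly dropped.

```python
from __future__ import annotations

import re
import string

# Hypothetical lookup tables for illustration only; a real pipeline would use
# fuller resources (an emoji library, an NLP stop-word list, a chat-word list).
EMOJI_MAP = {
    "\U0001F637": "face with medical mask",  # mask emoji
    "\U0001F602": "face with tears of joy",  # laughing-crying emoji
}
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "i", "my"}
CHAT_WORDS = {"lol": "laughing out loud", "imo": "in my opinion", "brb": "be right back"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_lemma(token: str) -> str:
    """Hypothetical placeholder for a real lemmatizer: strips a plural 's' only."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def clean_tweet(text: str) -> list[str]:
    # 1. Lower-case everything.
    text = text.lower()
    # 2. Remove URLs.
    text = URL_PATTERN.sub(" ", text)
    # 3. Remove ASCII punctuation (this also deletes emoticons such as ":)").
    text = text.translate(PUNCT_TABLE)
    # 4. Replace Unicode emoji with a textual description.
    for emoji, meaning in EMOJI_MAP.items():
        text = text.replace(emoji, f" {meaning} ")
    tokens = text.split()
    # 5. Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 6. Expand chat words into their full phrases.
    expanded: list[str] = []
    for t in tokens:
        expanded.extend(CHAT_WORDS.get(t, t).split())
    # 7. Spell checking is intentionally omitted in this sketch.
    # 8. Lemmatize (naively).
    return [naive_lemma(t) for t in expanded]


print(clean_tweet("Stay home \U0001F637 https://t.co/abc LOL, masks are the best!!"))
# ['stay', 'home', 'face', 'with', 'medical', 'mask', 'laughing', 'out', 'loud', 'mask', 'best']
```

Running the listed order literally shows the concern flagged in step 3: `clean_tweet("Great :)")` returns only `['great']`, because the emoticon is stripped as punctuation before step 4 can convert it. Unicode emoji survive since they are not ASCII punctuation. Similarly, because stop words are removed before chat words are expanded, an expansion such as "imo" reintroduces stop words ("in", "my").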