subsystem3 / project

Brand Sentiment Analysis
https://github.com/subsystem3/project

Added preprocessing functions #65

Closed: julietlawton closed 1 year ago

julietlawton commented 1 year ago

Added a method for performing the following preprocessing steps on the data:

  1. Strip URLs and @ mentions from tweets
  2. Tokenize the words in each tweet using a Twitter-specific tokenizer
  3. Remove stopwords and punctuation
  4. Stem the remaining tokens
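
For reference, the four steps above can be sketched roughly as follows. This is not the code from the PR, which is not shown here: the function names (`preprocess`, `naive_stem`), the regular expressions, and the tiny stopword list are all hypothetical, and the sketch uses only the Python standard library. A regex stands in for the Twitter-specific tokenizer and a crude suffix stripper stands in for a real stemmer.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "to", "of",
             "in", "on", "for", "this", "it", "i", "my"}
PUNCTUATION = set(string.punctuation)

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Stand-in for a Twitter-aware tokenizer: keeps hashtags and contractions
# as single tokens and emits each punctuation character separately.
TOKEN_RE = re.compile(r"#?\w+(?:'\w+)?|[^\w\s]")


def naive_stem(token):
    """Crude suffix stripping; an assumption standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Apply the four steps in order and return a list of stemmed tokens."""
    # 1. Strip URLs and @ mentions.
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    # 2. Tokenize (lowercased so stopword matching is case-insensitive).
    tokens = TOKEN_RE.findall(text.lower())
    # 3. Remove stopwords and punctuation.
    tokens = [t for t in tokens if t not in STOPWORDS and t not in PUNCTUATION]
    # 4. Stem what is left.
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("@Nike I love the new running shoes! https://t.co/abc123 #justdoit"))
```

URLs and mentions are stripped before tokenizing so that their fragments never reach the stopword filter or the stemmer. In practice a library tokenizer and stemmer would likely replace the two stand-ins.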