kdassharma
/
comp551_applied_machine_learning
News Data Set Preprocessing
#2
Closed
kdassharma
closed
4 years ago
kdassharma
commented
4 years ago
Make preliminary data frame and compute some basic statistics
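A minimal sketch of the kind of basic statistics meant here, assuming each record is a `(text, label)` pair. The function name `basic_stats` and the row layout are hypothetical, and a plain list stands in for the data frame so the snippet runs with the standard library alone:

```python
from collections import Counter
from statistics import mean, median


def basic_stats(rows):
    """Summarise a list of (text, label) pairs (assumed layout)."""
    # Whitespace token count per document, as a rough length measure.
    lengths = [len(text.split()) for text, _ in rows]
    return {
        "n_documents": len(rows),
        "per_label": dict(Counter(label for _, label in rows)),
        "mean_tokens": mean(lengths),
        "median_tokens": median(lengths),
        "min_tokens": min(lengths),
        "max_tokens": max(lengths),
    }


if __name__ == "__main__":
    # Made-up rows purely for illustration, not taken from the data set.
    sample = [("stocks fall again", "business"), ("team wins", "sport")]
    print(basic_stats(sample))
```

With the data in an actual pandas data frame, `describe()` and `value_counts()` should give comparable summaries.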
kdassharma
commented
4 years ago
Transform data into something python can work with
Need to add "stemming": strip word endings that carry no information ("ings", "s", etc.).
Get rid of non-English words, characters, symbols, colloquialisms, and general nonsense in the data set.
Need to remove all "stop words", which are irrelevant to the count vectoriser.
Look up some more NLP text classification tricks for better results (consult Ali, Colin)
Make preliminary data frame and compute some basic statistics
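The cleaning steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the stop-word set is a tiny made-up sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer (for example NLTK's `PorterStemmer`), and `count_terms` only imitates what a count vectoriser tallies. Filtering out non-English words would additionally need a dictionary lookup, which is not shown.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real run needs a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Endings removed by this crude stand-in for a stemmer, longest first.
SUFFIXES = ("ings", "ing", "es", "s")


def naive_stem(token):
    """Strip one common ending, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, keep only runs of ASCII letters, drop stop words, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def count_terms(documents):
    """Tally stemmed terms across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts


if __name__ == "__main__":
    print(preprocess("The meetings are ending in the gardens!!! :)"))
```

If scikit-learn is used for the vectoriser, `CountVectorizer(stop_words="english")` handles stop-word removal itself, so only the stemming step would need to be plugged in separately.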