gw-sd-2016 / NewsTextAnalysis

Ellen Louie's senior design project

Week 11: Started creating training set #1

Open ealouie opened 8 years ago

ealouie commented 8 years ago

@cctoombs @twood02

I've spent the last week reading up on Weka and how to format its .arff training set files. I've started compiling my training set with 50 articles relating to women and 50 articles not relating to women, gathered from different news sources such as The Guardian, The New York Times, USA Today, and The Washington Post (https://github.com/gw-sd-2016/NewsTextAnalysis/commit/f80a71c6b643b63e94363998844a7280d3a43bea). To extract the plain text from the HTML, I've been using a Python script that I wrote using the open-source article-scraping library newspaper.
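
For anyone following along, here is a rough sketch of how extracted article text could be turned into an .arff file. It is not the script from the commit above. The relation name, the attribute names, and the `women`/`other` class labels are assumptions made for illustration, as are the helper names `arff_escape`, `build_arff`, and `extract_text`. The `extract_text` helper uses newspaper's `Article` class and needs that package installed plus network access; the ARFF helpers are plain Python.

```python
def arff_escape(text):
    """Quote a string for an ARFF data row, escaping backslashes and single quotes."""
    # Collapse newlines and tabs into single spaces so each article stays on one row.
    cleaned = " ".join(text.split())
    cleaned = cleaned.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + cleaned + "'"


def build_arff(relation, rows):
    """Build ARFF content from (article_text, label) pairs.

    The labels 'women' and 'other' are hypothetical class names.
    """
    lines = [
        "@relation " + relation,
        "",
        "@attribute text string",
        "@attribute class {women,other}",
        "",
        "@data",
    ]
    for text, label in rows:
        lines.append(arff_escape(text) + "," + label)
    return "\n".join(lines) + "\n"


def extract_text(url):
    """Fetch the plain text of an article with the newspaper library."""
    # Imported lazily so the ARFF helpers above work without the package.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article.text
```

A call such as `build_arff("news", [(extract_text(url), "women")])` would then give a string that can be written to a .arff file. Since the text attribute is a raw string, Weka would most likely still need something like its StringToWordVector filter applied before most classifiers can use it.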