gw-sd-2016 / NewsTextAnalysis

Ellen Louie's senior design project

Week 24: Machine learning and prepping for donation feature #9

Open ealouie opened 8 years ago

ealouie commented 8 years ago

@poorvi-vora

This week I tried a new method for improving my machine learning results that Professor Simha had mentioned. It is a variation on attribute selection: I took around 10 articles related to women that I had selected, removed the stop words, and added what was left as a single instance in my training file. While this yielded more results from the machine learning, they weren't necessarily more accurate, so I removed that instance from my training set.
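A minimal sketch of the instance-building step described above, assuming a string attribute followed by a nominal class in an ARFF training file. The stop-word list, the sample articles, and the function names are hypothetical stand-ins, not the project's actual code.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "that", "with"}


def strip_stop_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def to_arff_instance(articles, label):
    """Merge several articles into one ARFF data row: 'remaining words',label."""
    words = []
    for article in articles:
        words.extend(strip_stop_words(article))
    # Assumption: single quotes inside the string are backslash-escaped.
    text = " ".join(words).replace("'", "\\'")
    return "'{}',{}".format(text, label)


# Made-up sample articles, standing in for the ~10 hand-picked ones.
articles = [
    "The senator spoke on equal pay for women.",
    "A clinic in the city is expanding access to contraception.",
]
row = to_arff_instance(articles, "women")
print(row)
```

Because all of the selected articles collapse into one long instance, that single row can carry a lot of weight for its class, which may be one reason the extra results were not more accurate.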

I also started preparing for my donation feature. First, I added a UI for it (37d9f3100b63049a588403617afefb17c2314cc5). Then I decided on my first round of class attributes for the machine learning that will decide the theme of an article:

- reproductive health: abortion, contraception
- women's health
- women's rights: equal pay, paid family leave
- education
- violence against women
- international
- culture/arts

Each of these categories will have a general class attribute along with some more specific ones that I listed. This part was difficult because I really want to produce the most accurate list of nonprofits for the user, but that would involve getting very, very specific about the themes (which are the class attributes). So for now I left it pretty general with the option of getting more specific later.
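One way to keep the "general now, more specific later" option open is to store the themes as a small hierarchy and generate the class attribute from it. This is only a sketch: the dictionary mirrors the list in this comment, while the function names and the ARFF attribute name `theme` are assumptions.

```python
# General themes mapped to their more specific sub-themes, as listed in the comment.
THEMES = {
    "reproductive health": ["abortion", "contraception"],
    "women's health": [],
    "women's rights": ["equal pay", "paid family leave"],
    "education": [],
    "violence against women": [],
    "international": [],
    "culture/arts": [],
}


def class_labels(themes, specific=False):
    """Return the general labels, optionally followed by each theme's sub-themes."""
    labels = []
    for general, subs in themes.items():
        labels.append(general)
        if specific:
            labels.extend(subs)
    return labels


def arff_class_attribute(name, labels):
    """Build an ARFF nominal attribute line, quoting labels that contain spaces."""
    quoted = ",".join("'{}'".format(label.replace("'", "\\'")) for label in labels)
    return "@attribute {} {{{}}}".format(name, quoted)


general_line = arff_class_attribute("theme", class_labels(THEMES))
specific_line = arff_class_attribute("theme", class_labels(THEMES, specific=True))
print(general_line)
print(specific_line)
```

Switching from the general to the specific label set then only changes one argument, although the training set would still need to be relabeled to match.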

This coming week I'll continue trying to improve my machine learning by trying out ensemble methods. Then I'll continue on the donation feature by creating a new training set, testing out algorithms in the Weka Explorer, and then starting (and hopefully finishing) integrating the best one into my project using the Weka API.
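For reference, the simplest ensemble idea is majority voting: several classifiers each label the article and the most common label wins. The sketch below uses toy keyword rules as the base classifiers, which are hypothetical and much cruder than anything trained in Weka. Weka ships its own voting meta-classifier (`weka.classifiers.meta.Vote`), so this is only meant to show the idea.

```python
from collections import Counter


def mentions(*keywords):
    """Build a toy classifier that answers 'women' if any keyword appears in the text."""
    def classify(text):
        lowered = text.lower()
        return "women" if any(k in lowered for k in keywords) else "other"
    return classify


def majority_vote(classifiers, text):
    """Ask every classifier for a label and return the most common one."""
    votes = [clf(text) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical base classifiers; an odd count avoids ties between two labels.
classifiers = [
    mentions("women", "woman"),
    mentions("equal pay", "maternity"),
    mentions("contraception", "abortion"),
]

print(majority_vote(classifiers, "Women rally for equal pay"))
print(majority_vote(classifiers, "Stock markets fell on Monday"))
```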

poorvi-vora commented 8 years ago

Can you have your app look for common words between the articles and common foundations/activist sites/petitions, rather than first manually creating a list of attributes? Or find some other way of automating the funding search that might also work for other types of interests? (Something to think about; you needn't do it right away.)

(Sent by email in reply to ealouie's comment of Feb 14, 2016, 3:59 PM.)

ealouie commented 8 years ago

@poorvi-vora The Newspaper API that I was originally working with has keyword extraction, so I could try running that on the nonprofit websites that will be in my database and store the generated keywords instead of tagging the nonprofits manually. Then I could run the same keyword extraction on the articles that are related to women and try to match the generated keywords against those in the database.
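A rough sketch of that matching step. The keyword extractor here is a naive word-frequency stand-in, not the Newspaper library's algorithm, and the nonprofit names, site text, and function names are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical small stop-word list for the stand-in extractor.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on",
              "that", "with", "we", "at", "would"}


def extract_keywords(text, top_n=10):
    """Naive stand-in for keyword extraction: the most frequent non-stop words."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS and len(t) > 2]
    return {word for word, _ in Counter(tokens).most_common(top_n)}


def rank_nonprofits(article_text, nonprofit_keywords):
    """Rank nonprofits by how many keywords they share with the article."""
    article_kw = extract_keywords(article_text)
    scores = []
    for name, keywords in nonprofit_keywords.items():
        overlap = len(article_kw & keywords)
        if overlap:
            scores.append((overlap, name))
    # Highest overlap first; ties broken alphabetically by name.
    return [name for _, name in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Made-up nonprofit site text, keyworded once and stored in the "database".
site_text = {
    "Example Health Fund": "We fund clinics that provide contraception and reproductive health care.",
    "Example Pay Equity Group": "We campaign for equal pay and paid family leave.",
}
database = {name: extract_keywords(text) for name, text in site_text.items()}

article = "New bill would expand contraception coverage at reproductive health clinics."
print(rank_nonprofits(article, database))
```

Since the nonprofit keywords are generated rather than hand-tagged, the same matching would work for interests other than women's issues, as long as the database holds sites for them.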