uclaradio / uclaradio.com

UCLA Radio Website
http://www.uclaradio.com
GNU Affero General Public License v3.0

Filtering with Mia's ML Script #317

Closed tanzeelak closed 5 years ago

tanzeelak commented 5 years ago

Types of changes

Purpose

We currently store both Tumblr posts from the old blog and Keystone posts from the new Admin UI. Previously, we filtered the Tumblr posts by parsing the tags object from Tumblr, which required Ishaan and me to manually add a corresponding tag to each Tumblr post. We want the old Tumblr posts to work with the new blog's filtering feature, but we didn't want to manually tag all of the existing posts (734 posts). Instead, Mia used machine learning to sort the existing Tumblr posts into categories.

Approach

Mia's script sorted all the existing Tumblr posts into the relevant categories, such as "Show Reviews" and "Music Reviews", using each post's title and content. She built the training set from the titles. She then compared the training set's content against the content of the unlabeled articles and labeled each article by similarity. The label is attached to the original JSON post object as topic, and the script outputs a categorized blogposts object.
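For reviewers who want a mental model of the two-step approach, here is a rough sketch. It is not Mia's script: the keyword table, the bag-of-words cosine similarity, the nearest-example rule, and the title and content field names are all assumptions made for illustration.

```javascript
// Hypothetical sketch only. None of these names come from Mia's script.

// Assumed title keywords used to seed the training set.
const TITLE_KEYWORDS = {
  "Show Reviews": ["show review", "concert review"],
  "Music Reviews": ["album review", "single review"],
};

// Split text into lowercase word tokens.
function tokenize(text) {
  return (text || "").toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Build a word-count vector (Map of word -> count).
function wordCounts(text) {
  const counts = new Map();
  for (const word of tokenize(text)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two word-count vectors (0 if either is empty).
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, n] of a) {
    dot += n * (b.get(word) || 0);
    normA += n * n;
  }
  for (const n of b.values()) normB += n * n;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Step 1: label a post from its title, or return null if no keyword matches.
function labelFromTitle(title) {
  const lower = (title || "").toLowerCase();
  for (const [topic, keywords] of Object.entries(TITLE_KEYWORDS)) {
    if (keywords.some((k) => lower.includes(k))) return topic;
  }
  return null;
}

// Step 2: give each unlabeled post the topic of the training post whose
// content is most similar. Posts with no overlap keep topic === null.
function categorize(posts) {
  const training = [];
  for (const post of posts) {
    const topic = labelFromTitle(post.title);
    if (topic) training.push({ topic, vector: wordCounts(post.content) });
  }
  return posts.map((post) => {
    let topic = labelFromTitle(post.title);
    if (!topic) {
      const vector = wordCounts(post.content);
      let best = 0;
      for (const example of training) {
        const score = cosine(vector, example.vector);
        if (score > best) {
          best = score;
          topic = example.topic;
        }
      }
    }
    return { ...post, topic };
  });
}
```

In this sketch, a post that shares no words with any training post stays uncategorized, which would be consistent with only most (not all) posts showing a category on the blog page.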

Existing Tumblr posts used to be fetched and stored in the mongo database by a script run with yarn fill-blog-db. This is no longer necessary, since we now import Mia's categorized blogposts object into mongo.

I refactored tagging on the frontend for both Keystone and Tumblr. We used to filter based on the last value of the tags array in the post object. Now we read topic from the post object instead. Keystone's post object previously also used a tags array, so I updated the deployed version of the Keystone post object to include a category object. A Tumblr post's topic and a Keystone post's category are treated the same: both are converted into filter names in BlogPage.js and BlogPostPage.js, where they are eventually displayed. I also had to update the existing filters to include all the new filters.
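The idea of treating topic and category the same can be sketched roughly as below. The helper names getFilterName and filterPosts are hypothetical, and the shape of the Keystone category object (a string or an object with a name field) is an assumption, not something taken from the deployed schema.

```javascript
// Hypothetical helper: returns the filter name for a post from either source,
// or null when the post has no category.
function getFilterName(post) {
  // Tumblr posts imported from the categorized JSON carry `topic`.
  if (typeof post.topic === "string" && post.topic.trim()) {
    return post.topic.trim();
  }
  // Keystone posts carry `category`; its exact shape is assumed here.
  const category = post.category;
  if (typeof category === "string" && category.trim()) {
    return category.trim();
  }
  if (category && typeof category.name === "string" && category.name.trim()) {
    return category.name.trim();
  }
  return null;
}

// Keep only posts whose filter name is among the active filters.
// With no active filters, every post is shown.
function filterPosts(posts, activeFilters) {
  if (activeFilters.length === 0) return posts;
  return posts.filter((post) => activeFilters.includes(getFilterName(post)));
}
```

With a single lookup like this, the page components would not need to know whether a post came from Tumblr or Keystone.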

I also included Arjun's fix from #316. His fix works for the old filter names, but the filter names have changed significantly enough that I included it here. From his description: "Previously, when selecting filters, dismissing the filter box would cause the checked boxes to reset to their default, unchecked state (the filters themselves were still active, of course). This was remedied with the addition of a checked attribute to the Input tags for the checkboxes, calling containsFilter to determine whether a filter is active or not."
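A minimal sketch of why the checked attribute fixes the reset, written as plain functions so it runs without React. The containsFilter name comes from the description above, but its signature here, along with toggleFilter and checkboxProps, is assumed for illustration and may not match the component code.

```javascript
// Assumed signature: true when `name` is among the active filters.
function containsFilter(activeFilters, name) {
  return activeFilters.includes(name);
}

// Hypothetical helper: returns a new array with `name` added or removed.
function toggleFilter(activeFilters, name) {
  return containsFilter(activeFilters, name)
    ? activeFilters.filter((f) => f !== name)
    : [...activeFilters, name];
}

// Hypothetical helper: props for one checkbox. Because `checked` is derived
// from the active filters on every render, closing and reopening the filter
// box shows the same state instead of resetting to unchecked.
function checkboxProps(activeFilters, name, onToggle) {
  return {
    type: "checkbox",
    checked: containsFilter(activeFilters, name),
    onChange: () => onToggle(name),
  };
}
```

In the component this would be spread onto the checkbox, for example `<Input {...checkboxProps(filters, "Sports", handleToggle)} />`, where filters and handleToggle stand in for whatever state and handler the page already has.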

Testing

  1. Delete existing mongo database:
     a. In one terminal, start mongod: mongod
     b. In a second terminal, enter the mongo shell: mongo
     c. Choose the uclaradio database: use uclaradio
     d. Drop the blogposts collection: db.blogposts.drop()
     e. Ctrl+C out of the mongo shell.
  2. Import the newly generated posts that Mia has provided:
     a. Download categorized-blogposts.json from Google Drive.
     b. Move this file to the uclaradio repo directory.
     c. Import the JSON file into mongo's blogposts collection: mongoimport --jsonArray --db uclaradio --collection blogposts --file categorized-blogposts.json
  3. Run the website: yarn dev. Navigate to localhost:3000/blog. You should see that most of the posts have categories.
  4. Press the "Categories" bar and select "Show Reviews", "Interviews", and "Sports".
  5. Click on any of the posts to check that the category is consistent.
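As an optional sanity check alongside the steps above, something like the following could count posts per topic in the downloaded file before importing it. This is a sketch: summarizeTopics is a hypothetical helper, and it assumes the file is a JSON array of post objects with a string topic field (the array part matches the --jsonArray flag used above).

```javascript
// Hypothetical helper: counts posts per topic and how many have no topic.
function summarizeTopics(posts) {
  const byTopic = {};
  let uncategorized = 0;
  for (const post of posts) {
    if (typeof post.topic === "string" && post.topic.trim()) {
      const topic = post.topic.trim();
      byTopic[topic] = (byTopic[topic] || 0) + 1;
    } else {
      uncategorized += 1;
    }
  }
  return { total: posts.length, uncategorized, byTopic };
}

// Example usage from the repo directory (run with node):
// const fs = require("fs");
// const posts = JSON.parse(fs.readFileSync("categorized-blogposts.json", "utf8"));
// console.log(summarizeTopics(posts));
```

If the uncategorized count is a large share of the total, the blog page will show correspondingly few categories in step 3.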

Screenshot(s)

Blog Page

screen shot 2019-03-04 at 10 53 43 pm

Blog Post Page

screen shot 2019-03-04 at 11 11 41 pm

Checklist

Link to Issue

#289 #315