sebastianbarfort / sds

Social Data Science, course at University of Copenhagen
http://sebastianbarfort.github.io/sds/

Project description - Group 9 #73

RasmusGars closed 8 years ago

RasmusGars commented 9 years ago

Project Description

Idea

We read an article on Vocativ about the relationship between book ratings and the ratings of their movie adaptations: http://www.vocativ.com/news/245040/the-book-is-better-than-the-movie The article concluded that books were generally rated higher than, and were therefore superior to, the average movie adaptation. Following this idea, we want to do our own examination of the relationship between the ratings of books and the ratings of the movies adapted from them. However, we want to include more variables than just the ratings, since we believe this relationship is influenced by many other factors. We also disagree with the direct comparison of book ratings to movie ratings: the book ratings in the article did not use the entire scale (e.g. on the 1-5 scale almost all books scored at least 3), whereas the movie ratings were spread over more of their 1-10 scale. So our main question is:

Data

We want to scrape data using R from the two sources used in the article on Vocativ:

From Goodreads, we will draw books from the list “I Saw the Movie & Read the Book” and want to scrape, for each book:

Analysis

First, we want to look at the movie data and the book data separately, e.g. how average ratings are distributed across variables such as number of ratings and year of release. Then we want to replicate Vocativ's simple plot of each movie's average rating against the average rating of the book it is based on. Next, we will discuss different ways to make a meaningful comparison of ratings that is not just a 1:1 comparison of the scales used on each website. Finally, we want to run some simple linear regressions to find possible trends in book-to-movie ratings when controlling for other variables, and see how this affects the conclusion that books are supposedly better.
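To make the scale-comparison step concrete, here is a minimal sketch of two possible ways to put 1-5 and 1-10 ratings on a common footing (min-max rescaling and z-scores), followed by a single-predictor least-squares fit. This is only an illustration, not the group's method: the project plans to use R, Python is used here purely for the sketch, the function names `rescale`, `z_scores` and `ols_fit` are hypothetical, the ratings are made-up placeholders rather than scraped data, and no control variables are included.

```python
from statistics import mean, pstdev


def rescale(rating, low, high):
    """Map a rating from its native scale [low, high] onto [0, 1]."""
    return (rating - low) / (high - low)


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up placeholder ratings, NOT scraped data.
book_ratings = [3.6, 3.9, 4.1, 4.3]   # hypothetical, on a 1-5 scale
movie_ratings = [5.8, 6.9, 7.4, 8.1]  # hypothetical, on a 1-10 scale

# Standardize each source separately so that "high" means
# "high relative to other ratings from the same source".
book_z = z_scores(book_ratings)
movie_z = z_scores(movie_ratings)

intercept, slope = ols_fit(book_z, movie_z)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Min-max rescaling only aligns the endpoints of the two scales, so it would not address the concern that book ratings cluster in the upper part of their range; z-scores compare each title to the other titles from the same source instead. When both variables are standardized this way, the fitted slope equals their correlation coefficient and the intercept is zero.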