Idea
We read an article on Vocativ about the relationship between the ratings of books and the ratings of their movie adaptations:
http://www.vocativ.com/news/245040/the-book-is-better-than-the-movie
The article concluded that books are, in general, rated higher than their movie adaptations and are therefore superior. Following this idea, we want to carry out our own examination of the relationship between the ratings of books and the ratings of the movies adapted from them. However, we want to include more variables than just the ratings, since we believe this relationship is influenced by many other factors. We also disagree with the article's direct comparison of book ratings to movie ratings: the book ratings in the article did not use the entire scale (e.g. on the 1-5 scale almost all books scored at least 3), whereas the movie ratings were spread more widely across their 1-10 scale. So our main question is:
In general, do highly rated books turn into highly rated movies?
In trying to answer this, we also want to look at the following:
How much do other variables influence this relationship?
What is the relationship between average rating and number of ratings: do only good books and movies get rated?
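The second sub-question could be approached with a plain correlation between the number of ratings and the average rating. The following is a minimal sketch only, written in Python for illustration (the project itself plans to use R). The helper `pearson`, the variable names, and all values are hypothetical placeholders, not scraped data. Rating counts are typically very skewed, so the sketch assumes a log transform of the counts before correlating.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical placeholder values, NOT real IMDb or Goodreads data.
n_ratings = [120, 4500, 98000, 310, 1500000, 27000]
avg_rating = [5.9, 6.4, 7.8, 6.1, 8.6, 7.1]

# Assumption: log10 of the counts tames the skew in the number of ratings.
log_counts = [math.log10(n) for n in n_ratings]
r = pearson(log_counts, avg_rating)
print(f"correlation between log10(count) and average rating: {r:.2f}")
```

A clearly positive value on the real data would be consistent with the idea that well-liked titles attract more ratings, although a correlation alone cannot say which way the effect runs.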
Data
We want to scrape data using R from the two sources used in the article on Vocativ:
IMDb, www.IMDb.com
Goodreads, www.goodreads.com
IMDb is a website where visitors can rate, among other things, movies and TV shows. It also contains a lot of background information. Goodreads works the same way, except that it covers books.
From IMDb we use the filter “based on novels” and want to scrape the following data for each movie:
Title
Year
Movie length
Avg. Rating
Number of ratings
Budget
Gross earnings
From Goodreads we choose from the list “I Saw the Movie & Read the Book” and want to scrape for each book:
Title
Author
Avg. Rating
Number of ratings
By combining these two datasets, we hope to get a large overlap in titles, which will allow us to compare the books with their movie counterparts. One challenge may be that titles are structured differently on the two sites and/or shown in different languages. In addition, not every type of data may be available for every movie or book.
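One possible way to handle differently structured titles is to reduce each title to a normalized key before joining the two datasets. The sketch below is an illustration in Python (the project itself plans to use R). The function `normalize_title`, the field names, and the sample rows are assumptions made for this example. It does not address titles shown in different languages, which would need a separate lookup.

```python
import re
import unicodedata

def normalize_title(title):
    """Reduce a title to a lowercase, accent-free, punctuation-free key."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Drop parenthesised additions such as a year or series information.
    text = re.sub(r"\([^)]*\)", " ", text)
    # Remove apostrophes, turn any other punctuation into spaces.
    text = re.sub(r"['’]", "", text)
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    text = " ".join(text.split())
    # Drop a leading English article.
    return re.sub(r"^(the|a|an) ", "", text)

# Illustrative rows only; the field names are hypothetical.
movies = [
    {"title": "The Da Vinci Code"},
    {"title": "Les Misérables"},
    {"title": "Some Unmatched Film"},
]
books = [
    {"title": "The Da Vinci Code (Robert Langdon, #2)", "author": "Dan Brown"},
    {"title": "Les Miserables", "author": "Victor Hugo"},
]

books_by_key = {normalize_title(b["title"]): b for b in books}

matched, unmatched = [], []
for movie in movies:
    book = books_by_key.get(normalize_title(movie["title"]))
    if book is None:
        unmatched.append(movie["title"])
    else:
        matched.append((movie["title"], book["title"]))

print("matched:", matched)
print("unmatched:", unmatched)
```

Matching on title alone is a heuristic: different books can share a title, and one book can have several adaptations, so matched pairs would still need a manual check (for example against author or release year).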
Analysis
First, we want to look at the movie data and the book data separately, e.g. how average ratings are distributed across different variables such as number of ratings and year of release. Then we want to replicate the simple plot from Vocativ of each movie's average rating against the average rating of the book it is based on. Following this, we will discuss different ways to make a meaningful comparison of ratings that is not just a 1:1 mapping between the scales of the two websites.
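As one candidate for such a comparison, each rating could be expressed relative to the distribution of ratings on its own site (a z-score) instead of being mapped linearly from one scale to the other. The sketch below is an illustration in Python (the project itself plans to use R). The function names and all rating values are hypothetical placeholders, and z-scores are only one option among several (percentile ranks would be another).

```python
from statistics import mean, pstdev

def rescale_1_5_to_1_10(rating):
    """Naive linear map from the 1-5 scale onto the 1-10 scale."""
    return 1 + (rating - 1) * 9 / 4

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1.

    Assumes the values are not all identical (otherwise the deviation is 0).
    """
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical placeholder ratings, NOT scraped data.
book_ratings = [3.6, 3.9, 4.1, 4.3, 3.8]    # 1-5 scale
movie_ratings = [5.2, 7.9, 6.4, 8.1, 6.0]   # 1-10 scale

# Naive comparison: books bunched near the top of their scale stay bunched.
naive_books = [rescale_1_5_to_1_10(b) for b in book_ratings]

# Relative comparison: each title is measured against its own site's spread.
book_z = z_scores(book_ratings)
movie_z = z_scores(movie_ratings)

for b, m in zip(book_z, movie_z):
    print(f"book z = {b:+.2f}, movie z = {m:+.2f}")
```

With z-scores, a book and its movie are compared by how far each sits above or below the typical title on its own site, which sidesteps the problem that book ratings use only part of their scale.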
After this, we want to run some simple linear regressions to find possible trends in book-to-movie ratings when controlling for other variables, and see how this affects the conclusion that books are supposedly better.
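To make "controlling for other variables" concrete, the sketch below fits an ordinary least squares regression of movie rating on book rating plus one control variable. It is written in Python purely as an illustration (the project itself plans to use R). The function `ols`, the choice of control variable, and the numbers are all assumptions: the data are synthetic and generated from a known linear rule, so the only thing the example shows is that the fit recovers that rule.

```python
def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y.

    Each row of X must start with 1.0 so that the first coefficient
    is the intercept. Solved by Gauss-Jordan elimination with pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic inputs, NOT scraped data.
book_rating = [3.5, 3.8, 4.0, 4.2, 3.9, 4.4]
years_since_2000 = [1, 5, 3, 10, 7, 12]      # hypothetical control variable

# Synthetic outcome built from a known rule, so the fit can be checked.
movie_rating = [2.0 + 1.2 * b + 0.05 * t
                for b, t in zip(book_rating, years_since_2000)]

X = [[1.0, b, t] for b, t in zip(book_rating, years_since_2000)]
intercept, beta_book, beta_year = ols(X, movie_rating)
print(intercept, beta_book, beta_year)
```

In the real analysis, the coefficient on book rating would be read as the association between book and movie ratings for titles that are alike on the control variables. Further controls from the scraped data, such as movie length or budget, would be added as extra columns of `X`.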