kaustubhhiware / rose



Analyse all kinds of data for a TV series. Available as a webapp at rose-tv.herokuapp.com.


Rose (of Two and a Half Men) is a highly intelligent, deceiving and manipulative woman. At the beginning of the series she was nothing more than one of Charlie's one-night stands; however, she quickly turned into his stalker. She has an obsessive nature and both loves and resents Charlie.

Rose (this repository) aims to be something similar. For a given TV series, it scrapes the following:


Two and a Half Men is one of the few shows available on Indian English channels, and I had watched a few episodes of it during my JEE days. I recently had the urge to finish the series. One observation everyone would make is that as the series progressed, the later seasons really took a hit. The series finale was the worst, with the lowest rating the series had ever seen (IMDB 4.3).

I wanted to see if there was any pattern here. Due to the lack of proper existing tools and GraphTV going down, I had to take matters into my own hands.


The results are rendered via Google Sheets charts, because they're interactive. Click on an image to open the corresponding chart, because I couldn't embed Google charts in iframes.

The first chart plots views for each episode across seasons. The second chart plots views per episode and average season views.

TV views
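
The per-season average shown in the second chart can be reproduced from per-episode figures with a few lines of Python. This is a minimal sketch, not the repository's code: the function name `season_averages`, the `(season, episode)` key layout, and the sample numbers are all assumptions made for illustration, and the numbers are placeholders rather than real viewership data.

```python
from collections import defaultdict
from statistics import mean


def season_averages(views):
    """Map each season number to the mean of its episode view counts.

    `views` is assumed to map (season, episode) tuples to US viewers
    in millions. This layout is hypothetical, chosen for the example.
    """
    by_season = defaultdict(list)
    for (season, _episode), millions in views.items():
        by_season[season].append(millions)
    # Sort by season so the result reads in broadcast order.
    return {season: mean(vals) for season, vals in sorted(by_season.items())}


# Placeholder numbers, NOT real viewership figures.
sample = {(1, 1): 10.0, (1, 2): 12.0, (2, 1): 8.0, (2, 2): 6.0}
print(season_averages(sample))
```

Plotting each episode's value next to its season's average makes it easier to tell a one-off spike from a season-wide decline.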


The dataset is available here for viewing.


Charlie Sheen was one of the male leads for the first 8 seasons, after which he was replaced by Ashton Kutcher. The scriptwriting turned horrible, and some correlation in the data was expected.

The data concurs. Looking at the number of views, S11 and S12 took a big hit. S09E01 saw a change in the lead, hence the spike in views. A spike in views was expected at every season finale, but that was not necessarily true here. Looking at the IMDB ratings, which mostly reflect the scriptwriting, the show became really bad from Season 9 onwards, so Ashton wasn't really to blame.


# install dependencies
> pip3 install -r requirements.txt

> python3 scrape_views.py -h

optional arguments:
  -h, --help            show this help message and exit
  -i, --imdb            Display only imdb ratings
  -w, --wiki            Display only wikipedia US TV viewers
  -s SHOW, --show SHOW  Provide show name
  -b, --bar             Display bar chart or not
  -a, --avg             Display averaged chart or not
  -e EPI, --epi EPI     Provide Episode name
  -c, --cast            Displays Cast of the show

# Plot averaged IMDB ratings for a show
> python3 scrape_views.py -i -a -s 'Two and a half men'

# Fetch information for a single episode
> python3 scrape_views.py -s 'Two and a half men' -e 'S03E06'

# Fetch information for the star cast of the show
> python3 scrape_views.py -c -s 'Two and a half men'
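
For contributors, the flags in the help output above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the help text only, not the actual contents of scrape_views.py; the helper `parse_episode_code` and the assumption that episode names follow the `S03E06` pattern are illustrative.

```python
import argparse
import re


def build_parser():
    """Build a parser mirroring the flags listed in the help output."""
    parser = argparse.ArgumentParser(prog="scrape_views.py")
    parser.add_argument("-i", "--imdb", action="store_true",
                        help="Display only imdb ratings")
    parser.add_argument("-w", "--wiki", action="store_true",
                        help="Display only wikipedia US TV viewers")
    parser.add_argument("-s", "--show", help="Provide show name")
    parser.add_argument("-b", "--bar", action="store_true",
                        help="Display bar chart or not")
    parser.add_argument("-a", "--avg", action="store_true",
                        help="Display averaged chart or not")
    parser.add_argument("-e", "--epi", help="Provide Episode name")
    parser.add_argument("-c", "--cast", action="store_true",
                        help="Displays Cast of the show")
    return parser


def parse_episode_code(code):
    """Split a code such as 'S03E06' into (season, episode) integers.

    Hypothetical helper: the real tool may interpret -e differently.
    """
    match = re.fullmatch(r"[sS](\d+)[eE](\d+)", code.strip())
    if match is None:
        raise ValueError(f"expected a code like S03E06, got {code!r}")
    return int(match.group(1)), int(match.group(2))


args = build_parser().parse_args(["-s", "Two and a half men", "-e", "S03E06"])
print(args.show, parse_episode_code(args.epi))
```

Boolean flags use `store_true`, so they default to `False` when omitted, which matches the "display only" wording of `-i` and `-w`.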

More imdb plots are available in GALLERY.md.

Running the webapp

The code for the webapp can be found in the webapp/ folder. Further details are available there.

# Option 1: Use procfile
> heroku local web

# Option 2: Run via flask directly
> cd webapp
> python3 flaskwebapp.py


Contributors can have a look at the text_logs folder to get a sample of the download response. Before contributing, please check out CONTRIBUTING.MD.


The MIT License (MIT) 2018 - Kaustubh Hiware. Have a look at the LICENSE for more details.