
MSc Thesis project examines the influence of COVID-19 on book reading behaviour

Impact of COVID-19 Restrictions on Book Consumption

This repository presents additional material related to my master's thesis *The Impact of COVID-19 Restrictions on Book Consumption*. Specifically, it contains the workflow for the data preparation and analysis used in the thesis.

In my thesis, I investigate how COVID-19 restrictions have affected how much people read, consumers' reading speed, their evaluation of books, and the types of books read, as well as how these effects vary across age groups, genders, types of readers, and nationalities. The expected relationships that were investigated are shown below:

*(Figure: conceptual framework of the expected relationships)*

Data Description

To investigate the impact of COVID-19 restrictions on book consumption, we use data scraped from the reading community website Goodreads. We collected 18,252,877 book reading records from 112,087 unique Goodreads users that were found via the 31 largest country-specific subgroups on Goodreads. Our dataset covers the consumption of books over a 15-year timeframe, including almost two years after the outbreak of COVID-19.
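As an illustration of how such reading records can be aggregated for analysis, the sketch below counts finished books per user per calendar month with pandas. The column names `user_id` and `read_at` are assumptions for illustration, not the repository's actual schema:

```python
import pandas as pd

def monthly_reading_volume(records: pd.DataFrame) -> pd.DataFrame:
    """Count finished books per user per calendar month.

    Assumes one row per finished book, with columns 'user_id' and
    'read_at' (a parseable date). These names are illustrative and
    may differ from the actual dataset.
    """
    records = records.copy()
    # Parse the finish date and bucket it into calendar months.
    records["read_at"] = pd.to_datetime(records["read_at"])
    records["month"] = records["read_at"].dt.to_period("M")
    # One row per (user, month) with the number of books finished.
    return (records.groupby(["user_id", "month"])
                   .size()
                   .reset_index(name="books_read"))
```

A monthly panel like this makes it straightforward to compare reading volume before and after the outbreak of COVID-19.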

Repository overview

```
├── README.md
├── makefile
├── Verweij (2022).pdf
├── .gitignore
├── data
├── gen
|   ├── temp
|   └── output
└── src
    ├── analysis
    ├── data-collection
    └── data-preparation
```

Dependencies

Please follow the installation guide on http://tilburgsciencehub.com/.


Running the code

Follow the instructions below to run the code:

1. Fork this repository.
2. Open your command line/terminal and clone your fork:

   ```
   git clone https://github.com/[your username]/covid-19-book-consumption.git
   ```

3. Make sure your current working directory is `covid-19-book-consumption`.
4. If not, type `cd yourpath/covid-19-book-consumption` to change your directory.
5. Execute the following command to run the workflow:

   ```
   make
   ```

Running the data collection

Note: the above workflow does not include the data collection steps or the combination of the scraped data files. The reason for this is twofold. First, the data collection takes about 3.5 months to run in full, so it would not be practical to include it in the reproduction workflow. Second, because the source code of Goodreads is dynamic rather than static, the data scraper had to be adjusted slightly several times during the process. The scraping process was therefore split into smaller chunks, so that possible problems could be detected as early as possible and the program adjusted in time. As a result, the scraping software collected multiple separate files that were later combined into larger files.
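The combination step described above could be sketched as follows. This is an illustration only: the assumption that the chunks are CSV files in a single directory, and the file names involved, are hypothetical and may differ from the actual scraper output:

```python
import glob
import os

import pandas as pd

def combine_scraped_chunks(chunk_dir: str, out_path: str) -> int:
    """Concatenate separately scraped CSV chunks into one file.

    The directory layout and CSV format are assumptions for
    illustration; the actual scraper output may differ.
    Returns the number of rows in the combined file.
    """
    # Sort paths so the combination order is deterministic.
    paths = sorted(glob.glob(os.path.join(chunk_dir, "*.csv")))
    frames = [pd.read_csv(p) for p in paths]
    # Drop exact duplicates, since overlapping scraping runs may
    # have collected the same record twice.
    combined = pd.concat(frames, ignore_index=True).drop_duplicates()
    combined.to_csv(out_path, index=False)
    return len(combined)
```

Deduplicating after concatenation guards against records that were scraped more than once when a chunk had to be rerun.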

Below is an overview of the order in which these programs were run:

Author

Mike Verweij