lanl / WHO-FLUMART-scraper

Python code for scraping the WHO's FLUMART data.

WHO FLUMART scraper

AUTHOR

Geoffrey Fairchild

LICENSE

This software is licensed under the BSD 3-Clause License. Please refer to the separate LICENSE file for the exact text of the license. You are obligated to give attribution if you use this code.

ABOUT

This simple code uses nothing but HTTP requests to collect the global historical influenza surveillance data that the World Health Organization (WHO) makes public on its FLUMART website.

As it turns out, scraping this website is non-trivial because it depends on several important hidden HTML values that are dynamically populated as the user interacts with the website. A separate open-source scraping effort relies on Selenium, which can be finicky to set up and use, so I set out to scrape the data "properly" by interacting purely through HTTP requests.
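The general idea of carrying hidden form values from one response into the next request can be sketched as follows. This is only an illustration, not the code in this repository: the field names (`__VIEWSTATE`, `__EVENTVALIDATION`, `country`) and the sample HTML are assumptions, and the sketch uses only the Python standard library so that it runs without dependencies, whereas the project itself uses requests and Beautiful Soup 4.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Keep only hidden inputs that actually have a name.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of every hidden input found in the given HTML."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical page fragment; the real FLUMART field names may differ.
SAMPLE_HTML = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="country" value="" />
</form>
"""

# Start from the hidden values the server sent, then add the visible
# field a user would have filled in (hypothetical name and value).
payload = extract_hidden_fields(SAMPLE_HTML)
payload["country"] = "Brazil"

# URL-encoded body that would accompany the follow-up POST request.
body = urlencode(payload)
```

In a real session the hidden values would presumably need to be re-extracted from each response before the next request is sent, since the server can change them after every interaction.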

Some of the key challenges in scraping this dataset were solved with the help of Stack Overflow.

REQUIREMENTS

This code requires Python 3.6 or higher, requests, and Beautiful Soup 4.