tattle-made / factchecking-sites-scraper

A repo to store helper functions for scraping + experiments/visualisations
GNU General Public License v3.0

Why Does This Exist?

One of Tattle's goals is to make fact-checking stories about content circulated on chat apps and social media more accessible to mobile-first users. To make the content accessible, Tattle wants it to be discoverable through image search and video search. To implement search, Tattle needs the multimedia content inside the stories from fact-checking sites, linked to the sites it comes from.

Introduction

This repository contains a collection of scripts to scrape the fact-checking sections of the following sites:

At present, Tattle only scrapes IFCN-certified fact-checking sites. See factchecting_sites_status.md for the updated status of each website.

Running Locally:

Prerequisites:

The code can be amended so that content is written to a local folder instead of an S3 bucket. For conciseness, those steps have been excluded from this documentation. If you need help doing that, please reach out to us (see the 'Get help with developing' section).
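As a rough illustration of what writing to a local folder could look like, here is a minimal sketch. It is not taken from this repository: the function name `save_locally`, its parameters, and the default folder `scraped_media` are all hypothetical, and the actual scrapers may structure their storage step differently.

```python
from pathlib import Path


def save_locally(content: bytes, filename: str, out_dir: str = "scraped_media") -> Path:
    """Write scraped bytes to a local folder and return the resulting path.

    Hypothetical helper: a stand-in for wherever a scraper would otherwise
    upload the content to an S3 bucket.
    """
    target_dir = Path(out_dir)
    # Create the output folder (and any parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so the file always lands in out_dir.
    target = target_dir / Path(filename).name
    target.write_bytes(content)
    return target
```

A scraper could then call `save_locally(image_bytes, "claim-photo.jpg")` at the point where it currently uploads the file, with `image_bytes` being whatever bytes it has already downloaded.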

Steps to Run:

This scraper has gone through multiple iterations and has different implementations for different fact-checking sites.

For each of these scrapers:

Request Access

If you want access to the fact-checking sites data, please fill out this form. If you have a specific requirement not covered by this form, please ping us on Slack.

Generating the Fact-Checking Sites Dashboard

The data collected from the scrapers is used to generate the weekly fact-checking sites dashboard: https://services.tattle.co.in/khoj/dashboard/

The instructions to generate the dashboard can be found in the data-experiments repository.

Contribute

Please see instructions here.

Get help with developing

Join our Slack channel to get a quick response to immediate doubts and queries.

Want to get help deploying it into your organisation?

Contact us at admin@tattle.co.in or ping us on our Slack channel.

Licence

When you submit code changes, your submissions are understood to be under the same licence that covers the project - GPL-3. Feel free to contact the maintainers if that's a concern.