simonw / show-your-workings

Collaborative repository gathering examples of journalists publishing the data and methodology behind their stories

Design the data model #1

Open simonw opened 4 years ago

simonw commented 4 years ago

This repo will store examples of journalists publishing their workings - data, code and methodology - behind the stories that they report.

First step: figure out the data model.

simonw commented 4 years ago

It's not quite as simple as a "story", because one piece of reporting work could spin off multiple stories, and one story might have multiple GitHub repos / methodology explanations behind it.

Maybe "project" is the right term here. Or "investigation"?

simonw commented 4 years ago

I'll use https://github.com/wpinvestigative/helicopters_dc as a simple example.

An investigation could have a required name and an optional description:

It can have multiple stories (articles?). Here that's https://www.washingtonpost.com/graphics/2020/investigations/helicopter-protests-washington-dc-national-guard/ - "A low-flying ‘show of force’", which has a headline, a byline and a publication date.

It can have one or more repositories/notebooks/explanations/assets? Really not sure what to call these or how to group them. In this case those include:

It has contributors. In this case the byline on the story has Alex Horton, Andrew Ba Tran, Aaron Steckelberg and John Muyskens. The GitHub repo has two contributors: andrewbtran and jmuyskens - both of whom are also in the story byline, but I could imagine situations in which some contributors don't get included there.

This is pretty complicated already! And this is for a simple case where there's one story and one GitHub repo backing it.
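To make the contributor question above a bit more concrete, here is a minimal sketch of splitting that byline into individual names and checking the repo contributors against it. The `split_byline` helper and the handle-to-name mapping are assumptions made for illustration (the mapping is filled in by hand, not fetched from GitHub), and real bylines will have formats this regex does not handle.

```python
import re


def split_byline(byline: str) -> list[str]:
    """Split a free-text byline into individual names (hypothetical helper)."""
    # Separators tried in order: ", and ", a plain comma, then " and ".
    parts = re.split(r"\s*,\s*and\s+|\s*,\s*|\s+and\s+", byline.strip())
    return [part for part in parts if part]


byline_names = split_byline(
    "Alex Horton, Andrew Ba Tran, Aaron Steckelberg and John Muyskens"
)

# Hand-made mapping from GitHub handle to byline name (an assumption).
repo_contributors = {
    "andrewbtran": "Andrew Ba Tran",
    "jmuyskens": "John Muyskens",
}

# Repo contributors whose names do not appear in the story byline.
not_in_byline = [
    handle for handle, name in repo_contributors.items() if name not in byline_names
]

print(byline_names)
print(not_in_byline)
```

For this example every repo contributor is also in the byline, so `not_in_byline` comes back empty. The interesting cases are the ones where it doesn't.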

simonw commented 4 years ago

Some other, more complex examples:

simonw commented 4 years ago

I like the term "investigation" for the main entity.

An investigation can have multiple contributors, multiple stories and multiple assets (repos, notebooks, explainer articles etc).
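One way to sanity-check those one-to-many relationships is a throwaway relational sketch using Python's built-in `sqlite3` module. Every table and column name here is an assumption made for illustration, and `assets` is just the provisional name from this comment. Contributors go through a join table on the guess that one person could work on several investigations.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE investigations (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,          -- required
    description TEXT             -- optional
);
CREATE TABLE stories (
    id INTEGER PRIMARY KEY,
    investigation_id INTEGER NOT NULL REFERENCES investigations(id),
    headline TEXT NOT NULL,
    byline TEXT,
    publish_date TEXT
);
CREATE TABLE assets (
    id INTEGER PRIMARY KEY,
    investigation_id INTEGER NOT NULL REFERENCES investigations(id),
    title TEXT NOT NULL,
    url TEXT NOT NULL
);
CREATE TABLE contributors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE investigation_contributors (
    investigation_id INTEGER NOT NULL REFERENCES investigations(id),
    contributor_id INTEGER NOT NULL REFERENCES contributors(id),
    PRIMARY KEY (investigation_id, contributor_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Load the helicopters example: one investigation, one story, one asset.
inv_id = conn.execute(
    "INSERT INTO investigations (name) VALUES (?)",
    ("A low-flying ‘show of force’",),
).lastrowid
conn.execute(
    "INSERT INTO stories (investigation_id, headline, byline) VALUES (?, ?, ?)",
    (
        inv_id,
        "A low-flying ‘show of force’",
        "Alex Horton, Andrew Ba Tran, Aaron Steckelberg and John Muyskens",
    ),
)
conn.execute(
    "INSERT INTO assets (investigation_id, title, url) VALUES (?, ?, ?)",
    (
        inv_id,
        "wpinvestigative/helicopters_dc",
        "https://github.com/wpinvestigative/helicopters_dc",
    ),
)
for name in ["Alex Horton", "Andrew Ba Tran", "Aaron Steckelberg", "John Muyskens"]:
    contributor_id = conn.execute(
        "INSERT INTO contributors (name) VALUES (?)", (name,)
    ).lastrowid
    conn.execute(
        "INSERT INTO investigation_contributors VALUES (?, ?)",
        (inv_id, contributor_id),
    )

# Count the rows hanging off the investigation in each child table.
counts = {
    table: conn.execute(
        f"SELECT count(*) FROM {table} WHERE investigation_id = ?", (inv_id,)
    ).fetchone()[0]
    for table in ("stories", "assets", "investigation_contributors")
}
print(counts)
```

This is only a way to poke at the shape of the data. It doesn't say anything about how the repo should actually store things.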

I'm not sold on "assets" here. What's a good name that could encompass all of the following?

simonw commented 4 years ago

https://nieonline.com/coloradonie/downloads/journalism/GlossaryOfNewspaperTerms.pdf may be useful.

simonw commented 4 years ago

reporting or reporting_artifact perhaps.

simonw commented 4 years ago

I'm going with artifact. If a better name shows up later I'll switch to that instead.

simonw commented 4 years ago

Data model as YAML:

```yaml
title: A low-flying ‘show of force’
description: The Washington Post story A low-flying ‘show of force’ focused on two
  military helicopters that roared over demonstrators in Washington D.C on June 1
stories:
- headline: A low-flying ‘show of force’
  byline: Alex Horton, Andrew Ba Tran, Aaron Steckelberg and John Muyskens
  publish_date: 2020-06-23
artifacts:
- title: wpinvestigative/helicopters_dc
  url: https://github.com/wpinvestigative/helicopters_dc
```

That gets transformed into JSON that looks like this:

```json
{
   "title": "A low-flying ‘show of force’",
   "description": "The Washington Post story A low-flying ‘show of force’ focused on two\nmilitary helicopters that roared over demonstrators in Washington D.C on June 1",
   "stories": [
      {
         "headline": "A low-flying ‘show of force’",
         "byline": "Alex Horton, Andrew Ba Tran, Aaron Steckelberg and John Muyskens",
         "publish_date": "2020-06-23"
      }
   ],
   "artifacts": [
      {
         "title": "wpinvestigative/helicopters_dc",
         "url": "https://github.com/wpinvestigative/helicopters_dc"
      }
   ]
}
```
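A rough sketch of how a record in that shape could be loaded and checked in Python, using only the standard library. The class names (`Investigation`, `Story`, `Artifact`) and the `from_dict` helper are assumptions for illustration, not anything that exists in this repo. The field names mirror the keys in the JSON above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Story:
    headline: str
    byline: str
    publish_date: str  # ISO 8601 date string, e.g. "2020-06-23"

    def __post_init__(self) -> None:
        # Raises ValueError if publish_date is not a valid ISO date.
        date.fromisoformat(self.publish_date)


@dataclass
class Artifact:
    title: str
    url: str


@dataclass
class Investigation:
    title: str
    description: Optional[str] = None
    stories: list[Story] = field(default_factory=list)
    artifacts: list[Artifact] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "Investigation":
        """Build an Investigation from a dict shaped like the JSON above."""
        return cls(
            title=data["title"],
            description=data.get("description"),
            stories=[Story(**s) for s in data.get("stories", [])],
            artifacts=[Artifact(**a) for a in data.get("artifacts", [])],
        )


sample = {
    "title": "A low-flying ‘show of force’",
    "description": (
        "The Washington Post story A low-flying ‘show of force’ focused on two\n"
        "military helicopters that roared over demonstrators in Washington D.C on June 1"
    ),
    "stories": [
        {
            "headline": "A low-flying ‘show of force’",
            "byline": "Alex Horton, Andrew Ba Tran, Aaron Steckelberg and John Muyskens",
            "publish_date": "2020-06-23",
        }
    ],
    "artifacts": [
        {
            "title": "wpinvestigative/helicopters_dc",
            "url": "https://github.com/wpinvestigative/helicopters_dc",
        }
    ],
}

investigation = Investigation.from_dict(sample)

# Serialize back out, keeping the curly quotes readable instead of \u escapes.
round_tripped = json.dumps(asdict(investigation), indent=3, ensure_ascii=False)
print(round_tripped)
```

Keeping `publish_date` as a string matches the JSON. The `__post_init__` check is just there to catch malformed dates early.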