The backend for NPR's 2018 midterm general-election coverage. It includes an Associated Press data ETL pipeline, a database, and an admin panel, and produces JSON output for use on the front end. It is an iteration on the 2016 general-election and 2017 Alabama special-election work.
So, you're starting work on 2020 development? Lucky you! Here are a few tips:
You will need the following tools and environment variables:

- `virtualenv` and `virtualenvwrapper`
- `awscli`
- `tidylib`
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (`AWS_PROFILE` is not supported)
- `AP_API_KEY`
Additional optional environment variables are described later.
The project contains the following folders and important files:

- `confs` -- Server configuration files for nginx and uwsgi. Edit the templates, then run `fab <ENV> servers.render_confs`; don't edit anything in `confs/rendered` directly.
- `data` -- Data files, such as those used to generate HTML.
- `fabfile` -- Fabric commands for automating setup, deployment, data processing, etc.
- `etc` -- Miscellaneous scripts and metadata for project bootstrapping.
- `less` -- LESS files, compiled to CSS and concatenated for deployment.
- `templates` -- HTML (Jinja2) templates, to be compiled locally.
- `tests` -- Python unit tests.
- `www` -- Static and compiled assets to be deployed. (a.k.a. "the output")
- `www/assets` -- A symlink to an S3 bucket containing binary assets (images, audio).
- `www/live-data` -- "Live" data deployed to S3 via cron jobs or other mechanisms. (Not deployed with the rest of the project.)
- `www/test` -- Javascript tests and supporting files.
- `app.py` -- A Flask app for rendering the project locally.
- `app_config.py` -- Global project configuration for scripts, deployment, etc.
- `render_utils.py` -- Code supporting template rendering.
- `requirements.txt` -- Python requirements.

The core functionality of this app is to fetch results from the AP elections API, bake them to JSON, and publish that JSON to S3 for consumption by the front-end graphics code. This is how the various pieces of software in this app work together to fetch and publish the results.
When the project is deployed, a service named `fetch_and_publish_results` is created by copying `confs/fetch_and_publish_results.conf` to `/etc/init/fetch_and_publish_results`. Once deployed, this service can be started and stopped using Fabric tasks.
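For orientation, an Upstart conf of this shape typically looks something like the following sketch. This is not the contents of `confs/fetch_and_publish_results.conf`; the app path and stanzas here are assumptions for illustration:

```
description "Fetch AP results and publish baked JSON to S3"

start on runlevel [2345]
stop on runlevel [!2345]
respawn

# run_on_server.sh sets up the Python/shell environment, then
# runs the results-fetching Fabric task
exec /home/ubuntu/apps/elections18/run_on_server.sh
```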
The `fetch_and_publish_results` service calls `run_on_server.sh` to initialize the Python and shell environment, and then runs the `daemons.fetch_and_publish_results` Fabric task. This task just runs the `daemons.main` Fabric task.
`elex` is used to fetch results into a Postgres database. The `daemons.main` Fabric task executes the `data.load_results` Fabric task. This task uses the elex CLI to download the results as CSV. It then uses `psql` to load the CSV into a PostgreSQL database using a `COPY` query.
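The load step can be pictured as building a `psql` invocation around a `\copy` query. This is a sketch, not the project's actual code; the table and database names here are assumptions:

```python
import shlex

def build_load_command(csv_path, table='results', db='elections18'):
    """Build a psql invocation that bulk-loads elex CSV output
    into Postgres via a COPY query."""
    copy_sql = "\\copy {} FROM '{}' WITH CSV HEADER".format(table, csv_path)
    return ['psql', db, '-c', copy_sql]

cmd = build_load_command('/tmp/results.csv')
print(' '.join(shlex.quote(part) for part in cmd))
```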
_Note: you can pass zeroes to the load_results task (`data.load_results:zeroes`) to override results with zeros; this omits the winner indicator and sets the vote, delegate, and reporting-precinct counts to zero._
After fetching the results and loading them into the database, the daemons.main
Fabric task executes the publish_results
Fabric task. This task calls the render.render
Fabric task which calls other Python code that uses the Peewee ORM to retrieve results from the database through the models.models.Result
model. The _serialize_results
function takes the Peewee model instances, converts them to plain Python dictionaries and adds a few calculated fields. It also shapes the collection of results into the format that will eventually be dumped to a JSON string by _write_json_file
.
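A rough illustration of that shaping step follows. This is not the project's actual `_serialize_results` or `_write_json_file` code; the field names follow the sample output documented below, and everything else is an assumption:

```python
import json

def serialize_results(rows):
    """Group flat result rows into a dict keyed by AP race ID,
    collecting per-candidate entries under 'candidates'."""
    races = {}
    for row in rows:
        race = races.setdefault(row['raceid'], {
            'officename': row['officename'],
            'candidates': [],
        })
        race['candidates'].append({
            'last': row['last'],
            'votecount': row['votecount'],
            'winner': row['winner'],
        })
    return {'results': races}

def write_json_file(payload, path):
    """Dump the shaped payload compactly, as the published JSON is."""
    with open(path, 'w') as f:
        json.dump(payload, f, separators=(',', ':'))
```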
After calling the render.render
Fabric task, the publish_results
task calls the move_s3
task which simply uses the aws
CLI to upload the results JSON file to S3.
Project secrets should never be stored in app_config.py
or anywhere else in the repository. They will be leaked to the client if you do. Instead, always store passwords, keys, etc. in environment variables and document that they are needed here in the README.
Any environment variable that starts with $PROJECT_SLUG_
will be automatically loaded when app_config.get_secrets()
is called.
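A minimal sketch of how such a prefix-based loader can work (the real `app_config.get_secrets()` derives the prefix from the project slug; the literal prefix below is an assumption):

```python
import os

def get_secrets(prefix='ELECTIONS18_'):
    """Collect environment variables that carry the project prefix,
    stripping the prefix from the returned keys."""
    return {
        key[len(prefix):]: value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }

os.environ['ELECTIONS18_AP_API_KEY'] = 'example-key'
print(get_secrets()['AP_API_KEY'])  # example-key
```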
The result loading and baking daemon outputs a JSON file to S3.
```json
{
    "results": {
        "1683": {
            "candidates": [
                {
                    "first": "Doug",
                    "last": "Jones",
                    "party": "Dem",
                    "votecount": 0,
                    "votepct": 0.0,
                    "winner": false
                },
                {
                    "first": "Roy",
                    "last": "Moore",
                    "party": "GOP",
                    "votecount": 0,
                    "votepct": 0.0,
                    "winner": false
                },
                {
                    "first": null,
                    "last": "Total Write-Ins",
                    "party": "NPD",
                    "votecount": 0,
                    "votepct": 0.0,
                    "winner": false
                }
            ],
            "lastupdated": "Nov. 29, 2017, 2:30 p.m.",
            "level": "state",
            "nprformat_precinctsreportingpct": "0%",
            "officename": "U.S. Senate",
            "precinctsreporting": 0,
            "precinctstotal": 2220,
            "statename": "Alabama",
            "statepostal": "AL"
        }
    }
}
```
We've tried to make the output JSON compact to minimize the amount of data that a user needs to download to retrieve the results.
When possible, we also try to pre-format data, such as dates or percentages, in the JSON to limit the size and complexity of front-end code.
The properties nested under the `results` property (`1683` in the example above) are the AP race IDs. Using a unique identifier as a key for each race allows the front-end code to quickly retrieve results for a given race without iteration. Be aware that it is possible for the AP race IDs to change prior to election night. The final IDs will be sent over with the zeroed results on election day. If they differ from the IDs used during testing, front-end configuration or code that references the ID will need to be updated.
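For illustration, that keyed structure allows a direct lookup by race ID; shown here in Python, though front-end JavaScript would do the equivalent with `data.results['1683']`:

```python
import json

payload = json.loads('{"results": {"1683": {"officename": "U.S. Senate", "statepostal": "AL"}}}')

# Direct keyed lookup by AP race ID -- no need to scan a list of races
race = payload['results']['1683']
print(race['officename'])  # U.S. Senate
```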
candidates
An array of candidate results.
lastupdated
Formatted timestamp reflecting when the results were last updated.
Example: `Nov. 29, 2017, 2:30 p.m.`
level
The reporting level of the results.
TODO: Document other possible levels. I think this is documented in the elex docs.
Example: state
nprformat\_precinctsreportingpct
String containing the formatted percentage of precincts reporting. Percentages between 0% and 1% are rendered as `<1%`, and percentages between 99% and 100% as `>99%`; otherwise the value is formatted to one decimal place, e.g. `15.1%`.
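A sketch of that clamped formatting (the function name is an assumption; the behavior follows the description of this field):

```python
def format_precincts_reporting_pct(reporting, total):
    """Format precincts-reporting as a percentage string, clamping
    values near the edges to '<1%' and '>99%'."""
    if total <= 0 or reporting <= 0:
        return '0%'
    if reporting >= total:
        return '100%'
    pct = 100.0 * reporting / total
    if pct < 1:
        return '<1%'
    if pct > 99:
        return '>99%'
    return '{:.1f}%'.format(pct)

print(format_precincts_reporting_pct(0, 2220))     # 0%
print(format_precincts_reporting_pct(1, 2220))     # <1%
print(format_precincts_reporting_pct(1110, 2220))  # 50.0%
print(format_precincts_reporting_pct(2219, 2220))  # >99%
```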
officename
Name of the office for this election.
Example: U.S. Senate
precinctsreporting
Integer representing number of precincts reporting results.
Example: 1
precinctstotal
Integer representing total number of precincts that can report results for this election.
Example: 2220
statename
Name of state for these election results.
Example: Alabama
statepostal
Postal abbreviation of the state for these election results.
Example: AL
These fields represent results for individual candidates and are collected under the candidates
property of a race record.
first
String containing first name of candidate or null
for pseudo-candidates.
Example: "Doug"
last
String containing last name of candidate or identifier of pseudo-candidates.
Examples:
Jones
Total Write-Ins
votecount
Integer representing total number of votes received by the candidate.
Example: 0
votepct
Float representing percentage of total votes won by the candidate.
Example: 0.0
winner
Boolean representing whether this candidate has been called as the winner. Defaults to the AP call but will be overridden by the NPR call if specified in the admin.
Example: false
This app shares many of the configuration variables common to apps based on NPR's App Template. This section documents application-specific configuration variables.
In most cases, configuration is through variables defined in the app\_config
module in app\_config.py
. However, some configuration may be defined through environment variables.
API key used by elex
to authenticate to the Associated Press' results API.
Type: Environment variable
Command line flags for the elex
command. See the elex cli documentation for available flags.
This supports multiple different elex
calls; for example, one may want to make a reportingunit
-level call for presidential results, but a state
-level call for the result of all other race types.
Type: app\_config
variable
Example: '--national-only'
Command line flags for the elex\_ftp
command, which is a vendorized version of elex-ftp-loader. This is available as a fallback if there are issues retrieving results through AP's API. However, the API is the preferred method of retrieving results.
Type: app\_config
variable
Example: '--states AL'
Command line flags for the elex
command used to force zeroed-out results with fab data.load\_results:mode=zeroes
. See the elex cli documentation for more information.
Type: app\_config
variable
Example: '--national-only --set-zero-counts'
Time, in seconds, between requests to the AP API. The AP API is throttled, so don't set this too low.
Type: app\_config
variable
Example: 10
Our system typically only includes the Democrat and Republican (or just the top/main two candidates) in the JSON files that get rendered.
Sometimes, we'll want to explicitly include a third-party candidate, or include three or more candidates. Use this option to do so. If no votes are in yet, we'll maintain the candidate order provided. If any votes are in, allow this set of candidates to be reordered by the system.
We're using the AP `candidateid` to identify candidates, since it is unique at the race level. The structure is `{ 'AP RACE ID 1': [ 'AP CANDIDATEID 1', 'AP CANDIDATEID 2', ... ], ... }`.
Type: app\_config
variable
Example:
```python
{
    # New York's 22nd House seat: Tenney, Myers, and Babinec
    '36602': ['79331', '79334', '79335'],
    # Alaska Senate seat: Murkowski, Miller, and Stock
    '2933': ['6021', '6650', '6647']
}
```
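One way such an override could be applied is sketched below. This is not the project's actual code; the `candidateid` and `votecount` field names follow AP/elex naming, and the function name is an assumption:

```python
CANDIDATE_SET_OVERRIDES = {
    '36602': ['79331', '79334', '79335'],
}

def select_candidates(race_id, candidates, any_votes_in):
    """Keep only the configured candidates for an overridden race.
    Preserve the configured order until votes come in; after that,
    let the candidates reorder by vote count."""
    override = CANDIDATE_SET_OVERRIDES.get(race_id)
    if override is None:
        return candidates
    keep = [c for c in candidates if c['candidateid'] in override]
    if any_votes_in:
        keep.sort(key=lambda c: c['votecount'], reverse=True)
    else:
        keep.sort(key=lambda c: override.index(c['candidateid']))
    return keep
```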
In certain cases, we want to override the party of a particular candidate. This needs to be done upstream of most of the data processing: before the balance-of-power calculation, and before baking the state- and chamber-level JSON files.
The structure is a map of party
to use for the candidates, and the polID
of the candidates to switch to that party.
Type: app\_config
variable
Example:
```python
{
    'Dem': [
        # Alyse Galvin registered as "undeclared," but won the Democratic primary
        '67552'
    ]
}
```
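Applying such a map might look like this sketch (the function name and candidate fields other than `polID` are assumptions):

```python
PARTY_OVERRIDES = {
    'Dem': ['67552'],
}

def apply_party_override(candidate):
    """Return the candidate with its party swapped if its polID
    appears in the override map; otherwise return it unchanged."""
    for party, pol_ids in PARTY_OVERRIDES.items():
        if candidate['polID'] in pol_ids:
            return dict(candidate, party=party)
    return candidate

print(apply_party_override({'polID': '67552', 'party': 'Una'})['party'])  # Dem
```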
Path to folder where results JSON is rendered before being uploaded to S3.
Type: app\_config
variable
Example: '.rendered'
Large media assets (images, videos, audio) are synced with an Amazon S3 bucket specified in app_config.ASSETS_S3_BUCKET
in a folder with the name of the project. (This bucket should not be the same as any of your app_config.PRODUCTION_S3_BUCKETS
or app_config.STAGING_S3_BUCKETS
.) This allows everyone who works on the project to access these assets without storing them in the repo, giving us faster clone times and the ability to open source our work.
Syncing these assets requires running a couple different commands at the right times. When you create new assets or make changes to current assets that need to get uploaded to the server, run fab assets.sync
. This will do a few things:
Unfortunately, there is no automatic way to know when a file has been intentionally deleted from the server or your local directory. When you want to simultaneously remove a file from the server and your local environment (i.e., it is no longer needed in the project), run fab assets.rm:"www/assets/file_name_here.jpg"
A site can have any number of rendered pages, each with a corresponding template and view. To create a new one:

1. Add a template to the `templates` directory. Ensure it extends `_base.html`.
2. Add a corresponding view function to `app.py`. Decorate it with a route to the page name, i.e. `@app.route('/filename.html')`.
3. Any templates that end with `.html` and do not start with `_` will automatically be rendered when you call `fab render`.

We need to create instances for both our staging and our production environments. For each environment, we need to set up an EC2 instance to run the Python daemon and admin web app, and an RDS instance for the database that stores the results coming from the AP API through elex.
This project did not have strong requirements in terms of performance or data volume, so we chose medium-sized virtual machines. For other elections, a new assessment of data throughput and storage capacity will be needed.
We use Ubuntu 16.04 LTS images for Python 3 projects.
- `m4.large` instance
- `NPR-offices` security group (or similar, to allow `ssh` from NPR-local computers)
- `virtualenv`
- `npm`
- `uwsgi`
- `libtidy-dev`

Note: NPR users can use our AMI that already contains this configuration, `python3 webserver`; you may have to manually install Node.js, `tidylib`, and `npm`.
- `db.t2.medium` instance
- `rds-access-npr` security group (or similar, to allow `psql` and Peewee access over Postgres ports from NPR-local computers)

Make sure to store the password, hostname, and other credentials in environment variables in `workinprivate`.
At the end of this provisioning process, you should be able to psql
into the RDS instance and ssh
into the EC2 instance while within the NPR DC office WiFi. When on any other network, these connections should fail. Furthermore, you should be able to psql
into the RDS instance from within the EC2 instance.
This app can be deployed to EC2 using Fabric, in a manner similar to other NPR apps that run on servers.

1. In `app_config.py`, set `DEPLOY_TO_SERVERS` to `True`.
2. Run `fab staging master servers.setup` to configure the server.
3. Add the required environment variables to the server's `/etc/environment` file; this includes the AWS credentials, Google Apps credentials, AP API key, database connection credentials, and deployment target.
4. Run `fab staging master servers.fabcast:data.bootstrap_db` to bootstrap the database.
Once we have set up our servers, we will need to start the web services supporting the admin that allows us to override winner calls from AP: `fab servers.deploy_confs` (see also Install web services). More details on the Admin can be found here.
To deploy updated code:

1. Ensure `DEPLOY_TO_SERVERS` is set to `True` in `app_config.py`.
2. Run `fab staging master servers.checkout_latest` to update the codebase on the server.
3. Run `fab staging master servers.fabcast:data.bootstrap_db`.
Since there are multiple components and processes, it's easiest to coordinate and containerize all of them using Docker Compose. Make sure that you have Docker installed on your computer.
Select environment variables (dictated in docker-compose.yml
) will be shared with the Docker containers. Updates to the code on your local machine will be reflected in the containers (since their file systems share the repo directory from your local machine). Similar to how you'd need to stop and start most processes on your local machine, you may need to stop (docker-compose stop ${SERVICE_NAME}
) and restart (docker-compose up ${SERVICE_NAME}
) to get the updated code to run. If you change the Dockerfile
s, requirements.txt
, or package.json
files containing the operating systems, libraries, and binaries installed on the Docker containers, you will need to run docker-compose build ${SERVICE_NAME}
in order to update that Docker container.
- `docker-compose up database` -- Runs PostgreSQL on port `5433` instead of the Postgres-default `5432`, so that the Docker container's open port doesn't conflict with any local Postgres instances on your machine. Connect with `psql postgres://elections18:elections18@localhost:5433/elections18`.
- `docker-compose up daemon` -- Runs the results daemon, which writes JSON to `GRAPHICS_DATA_OUTPUT_FOLDER`, set in `app_config.py`.
- `docker-compose up app` -- Runs the admin app on `localhost:8001`; e.g., `http://localhost:8001/elections18/calls/senate/`.
- `fakeapserver` -- A Docker Compose service that can mock constantly-updating AP data using AP Deja-Vu. Per the comments in `docker-compose.yml`, you can run this service (`docker-compose up fakeapserver`), connect to its admin panel at `http://localhost:8002/elections/${YEAR_OF_ELECTION}/ap-deja-vu/`, and then point the `daemon` service at this fake AP API endpoint.

(Again, all of the above could be executed on your local machine, but it's much simpler to handle the varied OS+binary+library environments within containers, and it also makes for quicker local setup.)
To start the daemon that loads results into the database, bakes them to JSON and publishes the JSON to S3, run this Fabric task:
fab production servers.start_service:fetch_and_publish_results
To stop the daemon, run this Fabric task:
fab production servers.stop_service:fetch_and_publish_results
There is a web-based admin interface that can be used to call winners in races. The winners called through the admin will override the winner in the AP results and will be reflected in the published results JSON.
In the admin we can decide whether or not we accept AP calls for winners in a given race.
For example, if you are running the local webserver, you can check the admin for Senate races by visiting http://localhost:8000/elections18-general/calls/senate/
If we decide not to accept AP calls for winners in a given race, we can then make a manual call ourselves for a given candidate in the race, and that will be reflected in the published results JSON.
For example a manual call for Doug Jones
would look like this:
Note: This project was first created using the NPR app-template. Even though we have stripped out the unused boilerplate that came along with it, we have left the COPY functionality in place, because for subsequent elections we foresee using the COPY workflow to add meta information for an election, like the expected winner or any other information given to us by the politics team, that adds value to the raw results provided by AP.
This app uses a Google Spreadsheet for a simple key/value store that provides an editing workflow.
To access the Google doc, you'll need to create a Google API project via the Google developer console.
Enable the Drive API for your project and create a "web application" client ID.
For the redirect URIs use:
http://localhost:8000/authenticate/
http://127.0.0.1:8000/authenticate
http://localhost:8888/authenticate/
http://127.0.0.1:8888/authenticate
For the Javascript origins use:
http://localhost:8000
http://127.0.0.1:8000
http://localhost:8888
http://127.0.0.1:8888
You'll also need to set some environment variables:
export GOOGLE_OAUTH_CLIENT_ID="something-something.apps.googleusercontent.com"
export GOOGLE_OAUTH_CONSUMER_SECRET="bIgLonGStringOfCharacT3rs"
export AUTHOMATIC_SALT="jAmOnYourKeyBoaRd"
Note that AUTHOMATIC_SALT
can be set to any random string. It's just cryptographic salt for the authentication library we use.
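For example, one quick way to generate a random salt (any method that produces a random string works equally well):

```python
import secrets

# Produce a 32-character hex string suitable for AUTHOMATIC_SALT
print(secrets.token_hex(16))
```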
Once set up, run fab app
and visit http://localhost:8000
in your browser. If authentication is not configured, you'll be asked to allow the application for read-only access to Google drive, the account profile, and offline access on behalf of one of your Google accounts. This should be a one-time operation across all app-template projects.
It is possible to grant access to other accounts on a per-project basis by changing GOOGLE_OAUTH_CREDENTIALS_PATH
in app_config.py
.
View the sample copy spreadsheet.
This document is specified in app_config
with the variable COPY_GOOGLE_DOC_KEY
. To use your own spreadsheet, change this value to reflect your document's key. (The long string of random looking characters is in your Google Docs URL. For example: 1DiE0j6vcCm55Dyj_sV5OJYoNXRRhn_Pjsndba7dVljo
)
A few things to note:

- If there is a column called `key`, there is expected to be a column called `value`, and rows will be accessed in templates as key/value pairs.

The app template is outfitted with a few `fab` utility functions that make pulling changes and updating your local data easy.
To update the latest document, run:
fab text.update
Note: text.update
runs automatically whenever fab render
is called.
At the template level, Jinja maintains a COPY
object that you can use to access your values in the templates. Using our example sheet, to use the byline
key in templates/index.html
:
```
{{ COPY.attribution.byline }}
```
More generally, you can access anything defined in your Google Doc like so:
```
{{ COPY.sheet_name.key_name }}
```
You may also access rows using iterators. In this case, the column headers of the spreadsheet become keys and the row cells values. For example:
```
{% for row in COPY.sheet_name %}
    {{ row.column_one_header }}
    {{ row.column_two_header }}
{% endfor %}
```
When naming keys in the COPY document, please attempt to group them by common prefixes and order them by appearance on the page. For instance:
title
byline
about_header
about_body
about_url
download_label
download_url
Example: This is how the Get Caught Up chunk was created on the backend and hooked up to the front end.

1. Add a new tab to the COPY spreadsheet referenced by `COPY_GOOGLE_DOC_KEY`.
2. Add a Fabric method with the `@task` decorator that will load the data and save it to a JSON file. Make sure to call that method in the `render_all()` method.
3. Run `fab text.update`.
4. Run `fab render.render_get_caught_up`.
5. Restart the daemon with `docker-compose up daemon`. If that doesn't work, you could try `docker-compose up bootstrap_db`.

That's the backend portion of hooking up a new tab in the COPY spreadsheet.
Want to edit or view the app's linked Google spreadsheet? We've got you covered.

We have created a simple Fabric task, `spreadsheet`. It will try to find and open the app's linked Google spreadsheet in your default browser.
fab spreadsheet
If you are working with other arbitrary Google docs that are not involved with the COPY rig, you can pass a key as a parameter to have that spreadsheet opened in your browser instead:
fab spreadsheet:$GOOGLE_DOC_KEY
For example:
fab spreadsheet:12_F0yhsXEPN1w3GOlQB4_NKGadXiRLOa9l-HQu5jSL8
# Will open the 270 project number-crunching spreadsheet
This project uses a custom font build powered by Fontello.
If the font does not exist, it will be created when running fab update
.
To force generation of the custom font, run:
fab utils.install_font:true
Editing the font is a little tricky -- you have to use the Fontello web GUI. To open the GUI with your font configuration, run:
fab utils.open_font
Now edit the font, download the font pack, copy the new config.json into this
project's fontello
directory, and run fab utils.install_font:true
again.
Sometimes, our projects need to read data from a Google Doc that's not involved with the COPY rig. In this case, we've got a helper function for you to download an arbitrary Google spreadsheet.
This solution will download the uncached version of the document, unlike methods that use the "publish to the Web" functionality baked into Google Docs. Published versions can take up to 15 minutes to update!
Make sure you're authenticated, then call oauth.get_document(key, file_path)
.
Here's an example of what you might do:
```python
from copytext import Copy
from oauth import get_document

def read_my_google_doc():
    file_path = 'data/extra_data.xlsx'
    get_document('1pja8aNw24ZGZTrfO8TSQCfN76gQrj6OhEcs07uz0_C0', file_path)
    data = Copy(file_path)

    for row in data['example_list']:
        print('%s: %s' % (row['term'], row['definition']))

read_my_google_doc()
```
Python unit tests are stored in the tests
directory. Run them with fab tests
.
Compile LESS to CSS, compile JavaScript templates to JavaScript, and minify all assets:
fab render
(This is done automatically whenever you deploy to S3.)
If you want to test the app once you've rendered it out, just use the Python webserver:
cd www
python -m http.server
Web services are configured in the confs/
folder.
Running fab servers.setup
will deploy your confs if you have set DEPLOY_TO_SERVERS
to True
at the top of app_config.py
.
To check that these files are being properly rendered, you can render them locally and see the results in the confs/rendered/
directory.
fab servers.render_confs
Sometimes it makes sense to run a Fabric command on the server, for instance, when you need to render using a production database. You can do this with the `fabcast` Fabric command. For example:
fab staging master servers.fabcast:deploy
If any of the commands you run themselves require executing on the server, the server will SSH into itself to run them.