RAFI Poultry

This project, in partnership with RAFI-USA, shows concentration in the poultry packaging industry.

The dashboard is displayed on RAFI's site here. Visit the site for more detail on the project and its background; this README focuses on the technical details of the project.

Pipeline

The data pipeline for this project does the following:

Joins records from FSIS inspections with historical business data provided by NETS.
Calculates 60-mile road distances from each plant in the FSIS records that meets our filtering criteria.
Creates GeoJSONs for areas with access to one, two, or three or more poultry integrators.
Filters the poultry barns identified by a Microsoft-trained computer vision model to reduce the number of false positives.
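
The access bands in the third step can be illustrated with a minimal sketch. This is not the pipeline's code: the record layout, the "company" field, and both function names are hypothetical, and the sketch assumes that several plants owned by the same integrator count once.

```python
def integrators_in_range(plants, reachable_plant_ids):
    """Collect the distinct integrators among the plants reachable from a location."""
    ids = set(reachable_plant_ids)
    return {p["company"] for p in plants if p["id"] in ids}


def access_level(integrators):
    """Map a collection of reachable integrators to one of the access bands."""
    n = len(set(integrators))
    if n == 0:
        return "none"
    if n == 1:
        return "one"
    if n == 2:
        return "two"
    return "three or more"


# Hypothetical plant records, not taken from the FSIS or NETS data.
plants = [
    {"id": "P1", "company": "Integrator A"},
    {"id": "P2", "company": "Integrator A"},
    {"id": "P3", "company": "Integrator B"},
]

# Two plants in range, but both belong to the same integrator.
print(access_level(integrators_in_range(plants, ["P1", "P2"])))  # one
```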

Docker

The pipeline runs in Docker. If you use VS Code, this is set up to run in a dev container, so build the container the way you normally would. Otherwise, just build the Docker image from the Dockerfile in the root of the directory.

If you are using the dev container, make sure that you change the PLATFORM variable in the devcontainer.json for your chip architecture:

"args": {
    "PLATFORM": "linux/arm64/v8" // Change this to "linux/amd64" on WSL and "linux/arm64/v8" on M1
}
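
If you are unsure which value applies to your machine, a small helper along these lines can suggest one. It is a sketch, not part of the repo: the function name is hypothetical and the mapping covers only the two platform strings mentioned above.

```python
import platform


def suggest_docker_platform(machine=None):
    """Suggest a PLATFORM value from the CPU architecture string."""
    machine = (machine or platform.machine()).lower()
    if machine in ("arm64", "aarch64"):
        return "linux/arm64/v8"
    if machine in ("x86_64", "amd64"):
        return "linux/amd64"
    raise ValueError(f"Unrecognized architecture: {machine!r}")


print(suggest_docker_platform("arm64"))   # linux/arm64/v8
print(suggest_docker_platform("x86_64"))  # linux/amd64
```

Calling suggest_docker_platform() with no argument uses the architecture reported by the current machine.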

Data Files

Download the following files into the appropriate locations. Note that permission is required to access the DSI Google Drive.

Example FSIS data is located in the DSI Google Drive: MPI Directory by Establishment Name | Establishment Demographic Data
- Save both files to data/raw/
- You can also download new data from the FSIS Inspection site. If you do, update the filenames in pipeline/rafi/config_filepaths.yaml.
NETS data is located in the DSI Google Drive. Download this to data/raw/ and save in a directory called nets
Download the raw barns predictions for the entire USA from the DSI Google Drive and save to data/raw/
Barn filtering shapefiles: Download the zip of all of the shapefiles from Google Drive and extract to data/shapefiles. The sources for these shapefiles are listed in pipeline/rafi/config_geo_filters.yaml.
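
Before running anything, it can help to confirm that the folders named above exist. The following is a minimal sketch under that assumption: it checks only the three directories listed in this section, not the individual files, and the function name is hypothetical.

```python
from pathlib import Path

# Directories named in this README; individual filenames are not checked.
EXPECTED_DIRS = ["data/raw", "data/raw/nets", "data/shapefiles"]


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in expected if not (root / d).is_dir()]


# Run from the root of the repository; an empty list means nothing is missing.
print(missing_dirs("."))
```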

Using Different Files

If you are using different files (particularly for the FSIS data), just update the filenames in pipeline/rafi/config_filepaths.yaml. Make sure the files are in the expected folder.

API Keys

The pipeline uses Mapbox to calculate driving distances from the plants and expects a Mapbox API key located in a .env file saved to the root of the directory:

MAPBOX_API=yOuRmApBoXaPiKey
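
To check that the key is readable before starting a long run, something like the sketch below can be used. It is a standard-library stand-in, not the loader the pipeline itself uses, and both function names are hypothetical. It handles only simple KEY=value lines.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_mapbox_key(env):
    """Return the Mapbox key, or raise if MAPBOX_API is missing or empty."""
    key = env.get("MAPBOX_API", "")
    if not key:
        raise KeyError("MAPBOX_API is missing from the .env file")
    return key


# Usage, from the root of the repository:
#   key = require_mapbox_key(read_env_file(".env"))
```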

Running the Pipeline

After all of the files and API keys are in place, run the pipeline:

python pipeline/pipeline_v2.py

Cleaned data files will be output in a run folder in data/clean/. To update the files displayed on the dashboard, follow the instructions in Updating the Dashboard Data.

Note: You can also run each step of the pipeline independently. Just make sure that the input files are available as expected in __main__ for each script.
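
After several runs, data/clean/ will hold several run folders. The sketch below picks the most recently modified one. It makes no assumption about how run folders are named, which this README does not specify, and the function name is hypothetical.

```python
from pathlib import Path


def latest_run_dir(clean_dir):
    """Return the most recently modified subdirectory of clean_dir, or None."""
    root = Path(clean_dir)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    if not runs:
        return None
    return max(runs, key=lambda p: p.stat().st_mtime)


# Run from the root of the repository; prints None if there are no runs yet.
print(latest_run_dir("data/clean"))
```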

Pipeline V1

There is old code in the pipeline/pipeline_v1/ directory. This includes a previous version of the pipeline that used Infogroup business data (rather than NETS data). This is saved for reference in case we want to use Infogroup again in a future version of the pipeline.

Dashboard

This is a Next.js project.

Running the Dashboard

To run the dashboard locally, do not use the dev container!

Install Packages

Install packages:

npm install

Set up Environment Variables for Local Deployment

The dashboard needs Mapbox credentials and service account credentials for Google Cloud.

It expects a .env.local file in dashboard/ with a Mapbox API key and a base64-encoded Google service account JSON (with permissions to access Cloud Storage buckets):

NEXT_PUBLIC_MAPBOX_ACCESS_TOKEN=yOuRmApBoXaPiKey
GOOGLE_APPLICATION_CREDENTIALS_BASE64=<base64-encoded-service-account.json>
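
One way to produce the base64 value is sketched below. The function names and the example filename are hypothetical; the sketch encodes the JSON key file exactly as it is on disk and decodes it again to confirm that the round trip works.

```python
import base64
import json
from pathlib import Path


def encode_service_account(path):
    """Base64-encode a service account JSON file for use in .env.local."""
    raw = Path(path).read_bytes()
    json.loads(raw)  # fail early if the file is not valid JSON
    return base64.b64encode(raw).decode("ascii")


def decode_service_account(encoded):
    """Decode the base64 string back into a dict, to verify the value."""
    return json.loads(base64.b64decode(encoded))


# Usage (the filename is an example):
#   print(encode_service_account("service-account.json"))
```

Paste the printed string after GOOGLE_APPLICATION_CREDENTIALS_BASE64= on a single line.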

Running the Server

Run the development server:

npm run dev

Open http://localhost:3000 with your browser to see the result.

Deploying the Dashboard

The dashboard is deployed via Vercel and is hosted on RAFI's site in an iframe.

Any update to the main branch of this repo will update the production deployment of the dashboard.

Updating the Dashboard Data

If you rerun the pipeline, you need to update the data files in two places: Google Cloud Storage and the files packaged with the Vercel deployment from GitHub.

Google Cloud Storage

The dashboard pulls data from Google Cloud Storage via an API. Upload the following files to the root of the rafi-poultry storage bucket in the rafi-usa project in the DSI account:

barns.geojson.gz
plants.geojson
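
A quick sanity check before uploading can catch a truncated or mis-compressed file. The sketch below assumes that each file is a GeoJSON FeatureCollection, which this README does not state explicitly. The function name is hypothetical, and the file is treated as gzipped only when its name ends in .gz.

```python
import gzip
import json


def count_features(path):
    """Open a .geojson or .geojson.gz file and return its feature count."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    if data.get("type") != "FeatureCollection":
        raise ValueError(f"{path} is not a GeoJSON FeatureCollection")
    return len(data["features"])


# Usage (paths depend on where the pipeline run wrote its output):
#   print(count_features("barns.geojson.gz"))
#   print(count_features("plants.geojson"))
```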

Packaged Files

The dashboard loads the isochrones file showing captured areas from dashboard/public/data/v2/isochrones.geojson.gz. Replace this file with the isochrones output from the new pipeline run.

Dashboard Structure

Data

The dashboard loads data in lib/data.js. This loads the packaged data and the Google Cloud Storage data via API calls.

Data is managed in lib/state.js and lib/useMapData.js.

Both the NETS data and farmer locations are sensitive, so those data files are processed behind API routes located in api/.

Components

The dashboard consists primarily of a map component and a summary stats component.

The map logic lives in components/DeckGLMap.js and components/ControlPanel.js, and the summary stats logic lives in components/SummaryStats.js and its sub-components.
