usds / justice40-tool

A tool to identify disadvantaged communities due to environmental, socioeconomic and health burdens
https://screeningtool.geoplatform.gov/
Creative Commons Zero v1.0 Universal

As an organization interested in Justice40, I'd like an API so that I can automate my consumption of Justice40 data #2199

Open: travis-newby opened this issue 1 year ago

travis-newby commented 1 year ago

Description

Justice40 publishes data and methodology on the Climate and Economic Justice Screening Tool (CEJST). The published data is structured and reasonably easy to consume, but there is no long-term strategy for how it is published, versioned, or discovered.

Additionally, Federal Agencies and NGOs have asked for a more formal API to make it easier to incorporate Justice40 data into their own systems for display and comparison between indices.

Solution

The solution is to publish Justice40 data and documentation through a simple, versioned API at an easy-to-find Justice40 subdomain (e.g., **api**.environmentaljustice.gov). To aid discoverability, the API should be described using the OpenAPI Specification and cataloged on data.gov.
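To make the OpenAPI idea concrete, here is a minimal sketch of the kind of description document one API version might publish. Everything specific in it is an illustrative assumption, not a decided design: the base URL reuses the example subdomain above with a hypothetical `/v1` path (the issue leaves the versioning mechanism open), and the `/datasets/{dataset_id}` endpoint is invented for illustration.

```python
import json

# Hypothetical base URL: the subdomain is only an example in the issue, and
# putting the version in the path is one option among several.
BASE_URL = "https://api.environmentaljustice.gov/v1"


def build_openapi_document(api_version: str, server_url: str) -> dict:
    """Return a minimal OpenAPI 3.0 description for one published API version."""
    return {
        "openapi": "3.0.3",
        "info": {
            "title": "Justice40 data API (illustrative)",
            "version": api_version,
            "description": "CEJST scores and supporting documentation.",
        },
        "servers": [{"url": server_url}],
        "paths": {
            # Hypothetical endpoint: one downloadable file per published dataset.
            "/datasets/{dataset_id}": {
                "get": {
                    "summary": "Download a published dataset",
                    "parameters": [
                        {
                            "name": "dataset_id",
                            "in": "path",
                            "required": True,  # OpenAPI requires path params to be required
                            "schema": {"type": "string"},
                        }
                    ],
                    "responses": {
                        "200": {"description": "The dataset file"},
                        "404": {"description": "Unknown dataset"},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_openapi_document("1.0.0", BASE_URL), indent=2))
```

A machine-readable document like this is what catalogs such as data.gov and client code generators can consume, which is the discoverability benefit the proposal is after.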

Published data for each version should include:

Additional Considerations

Data Formats

All data should be published in open, easily consumable formats. The specific format for files should be determined by a combination of openness and ease of use. For geographic data, this may be GeoPackage or GeoJSON. Spreadsheets should be published as CSV files, and documents should be PDF or Markdown.
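As an illustration of why open formats keep consumers' work small, the sketch below reads a GeoJSON FeatureCollection with nothing but the Python standard library. The property names `tract_id` and `disadvantaged` are hypothetical placeholders, not actual CEJST field names.

```python
import json


def disadvantaged_tract_ids(geojson_text: str) -> list[str]:
    """Return IDs of features flagged as disadvantaged in a GeoJSON FeatureCollection.

    The property names "tract_id" and "disadvantaged" are hypothetical placeholders.
    """
    collection = json.loads(geojson_text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return [
        feature["properties"]["tract_id"]
        for feature in collection["features"]
        # Treat a missing flag as "not disadvantaged".
        if feature["properties"].get("disadvantaged")
    ]
```

A GIS consumer would more likely load the same file into a spatial library, but the point stands: no proprietary tooling is needed to read it.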

All of this data should be generated as part of the Justice40 data pipeline.

Each dataset should be published in a single format (i.e., do not publish the same information in multiple formats). Some sites publish their data in as many formats as possible, but Justice40 cannot anticipate the format every consumer wants. Instead, Justice40 should pick open, common formats and let consumers of Justice40 data perform whatever translations their systems need.
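The kind of consumer-side translation this paragraph has in mind is often only a few lines. A minimal sketch, assuming a consumer whose system wants JSON records instead of the published CSV; the column names in the usage note are hypothetical.

```python
import csv
import io
import json


def csv_to_json_records(csv_text: str) -> str:
    """Translate a published CSV into a JSON array of row objects.

    Every value stays a string, because CSV carries no type information;
    a real consumer would cast columns according to the published documentation.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader))
```

For example, `csv_to_json_records("tract_id,score\n01001020100,0.42\n")` yields one record with `tract_id` and `score` keys. Keeping this step on the consumer side is what lets Justice40 maintain a single canonical file per dataset.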

Published Data

The data published in each API version should be comprehensive: enough to describe the API and the available data, to define how to use both, and, of course, the data itself. Right now, that list includes the information above, but it could change as the API evolves.

API Versioning

Justice40 disadvantaged-community designations and scores may change over time. Some agencies may switch to a new score immediately; others may not. Because of this, and because agencies sometimes need to review older scores, it is critical that any Justice40 API include versioning. Major revisions of the score should receive their own major version number, and minor revisions that affect the score may receive a minor version number.
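One consequence of major/minor versioning is that an agency can pin a major version and still pick up minor revisions. A minimal sketch, assuming hypothetical labels of the form `v1` or `v1.2` (the issue does not fix a label format). Versions are compared numerically, so `v1.10` correctly sorts after `v1.2`.

```python
import re

# Hypothetical label format: "v<major>" or "v<major>.<minor>".
VERSION_PATTERN = re.compile(r"^v(\d+)(?:\.(\d+))?$")


def parse_version(label: str) -> tuple[int, int]:
    """Parse "v1" or "v1.2" into (major, minor); a bare major means minor 0."""
    match = VERSION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a version label: {label!r}")
    major, minor = match.groups()
    return int(major), int(minor or 0)


def latest_for_major(available: list[str], major: int) -> str:
    """Return the newest published version within one pinned major version."""
    candidates = [v for v in available if parse_version(v)[0] == major]
    if not candidates:
        raise LookupError(f"no published version with major {major}")
    # Compare as integer tuples, not strings, so "v1.10" > "v1.2".
    return max(candidates, key=parse_version)
```

Usage: `latest_for_major(["v1", "v1.2", "v1.10", "v2"], 1)` returns `"v1.10"`, while an agency pinned to major 2 gets `"v2"`.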

Sometimes agencies do not care whether score data has changed; for example, some agencies may only want to show Justice40 map tiles on their own maps. For them, Justice40 should maintain a "current" version of its API that always maps to the latest version (e.g., api.environmentaljustice.gov/current/ could map to api.environmentaljustice.gov/v1/).
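The "current" mapping can be as simple as a path rewrite on the server. A minimal sketch, assuming path-based versioning and a hypothetical tile path. Whether the rewrite is applied internally or returned to the client as an HTTP redirect is left open here.

```python
CURRENT_ALIAS = "current"


def resolve_alias(path: str, latest_version: str) -> str:
    """Map /current/... onto /<latest_version>/...; other paths pass through unchanged."""
    segments = path.lstrip("/").split("/", 1)
    if segments[0] != CURRENT_ALIAS:
        # Already pinned to a version (or not a versioned path): leave it alone.
        return path
    rest = segments[1] if len(segments) > 1 else ""
    return f"/{latest_version}/{rest}"
```

For example, `resolve_alias("/current/tiles/3/2/1.pbf", "v1")` gives `"/v1/tiles/3/2/1.pbf"`. Returning a redirect instead of rewriting silently has the advantage that clients can see which version they actually received.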

It is beyond the scope of this request to determine how to implement API versioning (whether the version is carried in a header, the URL, or some other form).

Change Announcements

It is important to let agencies know of changes to the API. In addition to information in the OpenAPI specification, a process should be developed to announce API changes. This may involve a mailing list, a Google group (including the existing group), or some other way for agencies to subscribe to notifications of API updates.

Next Steps

Once this API is in place, client applications such as CEJST should be updated to use the API as their data source. Existing versions of the data should be deprecated, and agencies should be given up to 6 months to update their code to use the new API(s).
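One way to make a deprecation window machine-readable is the HTTP `Sunset` response header defined in RFC 8594, whose value is an HTTP date. A minimal sketch, assuming the full 6-month window and clamping to the end of the month when needed (e.g., a deprecation on August 31 sunsets on February 28). Using this header at all is a suggestion, not part of the proposal.

```python
import calendar
from datetime import datetime, timezone
from email.utils import format_datetime


def add_months(moment: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def sunset_header(deprecated_at: datetime, grace_months: int = 6) -> tuple[str, str]:
    """Build a Sunset header (RFC 8594) for when a deprecated version is retired.

    `deprecated_at` should be timezone-aware; it is converted to UTC because
    HTTP dates are expressed in GMT.
    """
    retire_at = add_months(deprecated_at.astimezone(timezone.utc), grace_months)
    return "Sunset", format_datetime(retire_at, usegmt=True)
```

For example, `sunset_header(datetime(2024, 8, 31, 12, 0, tzinfo=timezone.utc))` returns `("Sunset", "Fri, 28 Feb 2025 12:00:00 GMT")`. A header like this complements, but does not replace, the human-facing change announcements described above.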

sampowers-usds commented 1 year ago

I made some small grammatical/punctuation edits to the above.

Outstanding question for consideration: Do we want to include guidance on how to treat the data sources that we pull in? Is the aim to be a re-publisher of other organizations' data or are we only interested in publishing data that results from USDS transformations?

vim-usds commented 1 year ago

Awesome write-up!

travis-newby commented 1 year ago

> Outstanding question for consideration: Do we want to include guidance on how to treat the data sources that we pull in? Is the aim to be a re-publisher of other organizations' data or are we only interested in publishing data that results from USDS transformations?

Probably ultimately a question for Kameron, but my $0.02 is that we should not be a republisher of data (we should stick to publishing results and information about how we got to those results).

tpcolson commented 1 year ago

My 2 cents: my agency wants a lot of downstream app development built on the CEJST output data, and, speaking of republishing data, the only way I can deliver that is... to republish the CEJST data. Exposing a REST API (there currently is none) would solve that problem. I don't think the question is about the input data (used to create the CEJST output).