Closed philipashlock closed 9 years ago
This has started to be implemented in https://github.com/GSA/project-open-data-dashboard/issues/83, such that the example below (taken from the existing example) would pass schema validation.
Note that the particular exemption reason denoted by "B4" in the example's [[REDACTED-EX B4]] marker might not make sense in some of the places it's used. More generally, the places where the redactions are used in this example might not make sense given the descriptions used in the other fields. The example here is intended only to demonstrate what the redaction text would look like in the JSON syntax.
It's worth considering whether some fields might never need to be redacted, e.g. accessLevel, identifier, isPartOf, bureauCode, and programCode. With a traditional redacted paper document, I imagine the page numbers are never redacted, even if the full page is. Similarly, it seems like it would be necessary to retain the identifier even if everything else was redacted so that you could at least distinguish between different redacted records.
{
  "@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld",
  "@id": "http://www.agency.gov/data.json",
  "@type": "dcat:Catalog",
  "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
  "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
  "dataset": [
    {
      "@type": "dcat:Dataset",
      "accessLevel": "non-public",
      "accrualPeriodicity": "R/P1Y",
      "bureauCode": [
        "018:10"
      ],
      "conformsTo": "http://www.agency.gov/widget-taxonomy/",
      "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Jane Doe",
        "hasEmail": "mailto:jane.doe@agency.gov"
      },
      "describedBy": "http://www.agency.gov/datasets/widgets-dictionary.html",
      "dataQuality": true,
      "description": "This dataset provides national statistics on the production of widgets for [[REDACTED-EX B4]]",
      "distribution": [
        {
          "@type": "dcat:Distribution",
          "description": "[[REDACTED-EX B4]] widgets data as a CSV file",
          "downloadURL": "[[REDACTED-EX B4]]",
          "format": "CSV",
          "mediaType": "text/csv",
          "title": "[[REDACTED-EX B4]]-widgets.csv"
        }
      ],
      "identifier": "https://metadata.agency.gov/10.7927/H4PZ56R2",
      "issued": "2011-11-22",
      "keyword": [
        "widget",
        "manufacturing",
        "factory"
      ],
      "landingPage": "http://agency.gov/widgets/data",
      "language": [
        "en-US"
      ],
      "license": null,
      "modified": "2011-11-19T12:00:00Z",
      "primaryITInvestmentUII": "021-006227212",
      "programCode": [
        "018:001"
      ],
      "publisher": {
        "@type": "org:Organization",
        "name": "Widget Services",
        "subOrganizationOf": {
          "@type": "org:Organization",
          "name": "Office of Widget Statistics"
        }
      },
      "references": [
        "https://agency.gov/docs/widgets-1.html",
        "https://agency.gov/docs/widgets-2.html"
      ],
      "rights": "This dataset cannot be made public because it includes trade secrets and commercial or financial information obtained from a person and is privileged or confidential.",
      "spatial": "United States",
      "systemOfRecords": "http://www.agency.gov/widgets/sorn/",
      "temporal": "2009-09-01T12:00:00Z/2010-05-31T12:00:00Z",
      "theme": [
        "manufacturing"
      ],
      "title": "U.S. Widget Statistics for [[REDACTED-EX B4]]"
    }
  ]
}
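To make the "fields that might never be redacted" idea concrete, here is a rough Python sketch of how a consumer could locate the markers in a record like the one above and flag any that land in a protected field. The marker regex, the NEVER_REDACT set, and both function names are assumptions made for illustration; none of this comes from the schema or the dashboard validator.

```python
import re

# Assumption: markers look like the ones in the example, e.g. [[REDACTED-EX B4]].
REDACTION = re.compile(r"\[\[REDACTED-EX (B\d)\]\]")

# Assumption: the fields suggested in this thread as candidates to never redact.
NEVER_REDACT = {"accessLevel", "identifier", "isPartOf", "bureauCode", "programCode"}


def find_redactions(node, path=""):
    """Yield (path, exemption) for each marker found in any string under node."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_redactions(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_redactions(value, f"{path}[{index}]")
    elif isinstance(node, str):
        for match in REDACTION.finditer(node):
            yield path, match.group(1)


def protected_field_violations(dataset):
    """Return the NEVER_REDACT fields of one dataset that contain a marker."""
    return sorted(
        field for field in NEVER_REDACT
        if field in dataset and any(find_redactions(dataset[field]))
    )


# A trimmed-down version of the dataset entry from the example.
dataset = {
    "identifier": "https://metadata.agency.gov/10.7927/H4PZ56R2",
    "title": "U.S. Widget Statistics for [[REDACTED-EX B4]]",
    "distribution": [{"downloadURL": "[[REDACTED-EX B4]]"}],
}
print(list(find_redactions(dataset)))
print(protected_field_violations(dataset))
```

Because the walk is recursive, markers nested inside distribution entries are reported with a path, which might also help when reviewing individualized determinations field by field.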
I think all fields should be redacted with a presumption of openness. This is in line with the federal FOIA policy. An example that reflects this would be useful too.
Including the presumption of openness language (above) and DOT's PDL as a best practice would be good additions to this guidance as well.
Thanks, guys, and great example @philipashlock. I'd also note that certain parts of a field can be redacted rather than the whole field, if only certain words are subject to FOIA exemption. Agree with @rebeccawilliams on the presumption of openness. Think agencies should not redact entire metadata records and that there may be some fields that would never make sense to be redacted.
Greetings all -- As a foreign assistance agency, USAID is exempt from releasing data per the seven principled exceptions outlined in OMB 12-01 (see Attachment 1, page 4).
When we issued our open data policy, this is the guidance we provided to our staff for justifying exemptions. Our FOIA office agrees that these do not conflict with the FOIA act, but I wanted to flag this issue so that we can adopt an approach that keeps both documents in mind. Thanks.
@bpushed I believe that still means USAID needs to express those exemptions in the form of redacted JSON, with individualized determinations for each field and catalog entry.
Thanks. That is essentially our plan. For the Sunlight Foundation FOIA request, we were asked specifically to use FOIA exemptions but would plan to revert to OMB 12-01 moving forward.
We are only planning to apply redactions (if any) to the PDL and to leave the EDI with the full description. This will become increasingly difficult to manage without some additional metadata tags to automate generating the PDL vs. the EDI. However, if you add additional metadata tags for redaction, the simplicity of the POD Schema would be lost.
Is there an equivalent way to do inline tags on text in a JSON field, like in XML? For example:
"description": "<Redacted type='exb4'>Non Public Title</Redacted> widgets data as a CSV file"
The only equivalent way I can think of to do this in JSON is:
"description_redacted": "[[REDACTED-EX B4]] Non Public Title widgets data as a CSV file"
"description": "Non Public Title widgets data as a CSV file"
This would needlessly complicate the schema. Could Agencies submit both the PDL and EDI redacted?
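One possible way to avoid extra schema fields is to keep the redaction list outside the metadata and derive the PDL record from the unredacted EDI record. The Python sketch below is only an illustration of that idea: redact_record and the shape of the redactions mapping are hypothetical, and it only handles top-level string fields.

```python
import copy


def redact_record(record, redactions):
    """Return a redacted copy of record, leaving the original untouched.

    redactions maps a top-level field name to a list of
    (sensitive_text, exemption_code) pairs. This is a hypothetical
    side-car format, not part of the POD schema.
    """
    redacted = copy.deepcopy(record)
    for field, spans in redactions.items():
        value = redacted.get(field)
        if not isinstance(value, str):
            continue  # this sketch skips missing and non-string fields
        for sensitive_text, exemption in spans:
            # Replace only the sensitive words, not the whole field.
            value = value.replace(sensitive_text, f"[[REDACTED-EX {exemption}]]")
        redacted[field] = value
    return redacted


# Unredacted (EDI) record and the separately maintained redaction list.
edi_record = {
    "identifier": "https://metadata.agency.gov/10.7927/H4PZ56R2",
    "description": "Non Public Title widgets data as a CSV file",
}
redactions = {"description": [("Non Public Title", "B4")]}

pdl_record = redact_record(edi_record, redactions)
print(pdl_record["description"])
```

With this arrangement the EDI stays the single source of truth, the PDL is regenerated from it, and the schema itself does not need a description_redacted field.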
@bbrotsos Following up on this thread -- PDLs @ /data.json should include non-public datasets including any required redactions. If redactions are present, an unredacted copy must also be submitted to OMB Max.
I think that was clear, but wanted to record that in this issue. Closing this issue as guidance is live: https://project-open-data.cio.gov/redactions/
New issues or pull requests to clarify that guidance are encouraged though.
@bbrotsos For what it's worth, this is what we're going to try for inventory.data.gov - https://github.com/GSA/enterprise-data-inventory/issues/182#issuecomment-128514823
The general guidance on redactions for federal agencies is as follows, but we need to provide examples of what this looks like as JSON.