project-open-data / project-open-data.github.io

Open Data Policy — Managing Information as an Asset
https://project-open-data.cio.gov/

Public Inventory #71

Closed by johnwonderlich 11 years ago

johnwonderlich commented 11 years ago

Section 2 of the implementation guide:

http://project-open-data.github.io/implementation-guide/

...lays out procedures for creating the public index of agency datasets. As currently written, agencies "are only required to list datasets with an “Access Level” value of “public”."

The schema defines "public" and "restricted" as follows:

http://project-open-data.github.io/schema/

"Choices: Public (is or could be made publicly available), Restricted (available under certain conditions),"

These definitions do not adequately define "public" or "restricted."

If restricted data can be made available under certain conditions, it should be able to be listed publicly. It may be that "restricted" means "able to be released to subsets of the public, like a qualified research community, under certain conditions." If this is the intent behind the "restricted" category, that should be made clear (it's unclear what "under certain conditions" means). Even if that's the intent of the "restricted" definition, that's a form of being made public (the "could" from the public category), and should result in those datasets being listed in the public data index.

In other words, "public" and "restricted" should be better defined, and the requirement that agencies list all of their data that "could" be made public should be applied to both "public" and "restricted" datasets, as they've been defined.

The guidance should provide details in the following form:

"Datasets are considered datasets that "could" be made publicly available if:

- certain information would need to be removed from the dataset before release
- significant resources would need to be allocated to digitize or prepare the information for release
- the data can only be released to a limited community due to privacy concerns
- an extraction process can create a new dataset on top of the current dataset to provide public value
- etc."

Additionally, data that is affirmatively marked as "private" should not be automatically withheld from public listing; even if an agency determines that a dataset cannot be released publicly, that is a different determination from deciding whether to publicly acknowledge the dataset's existence.

Ultimately, the labels that determine whether datasets get publicly listed should be based on whether it's possible to acknowledge the dataset's existence publicly. That is a different decision from whether it's possible to release the dataset, regardless of how much extraction, transformation, digitization, or anonymization a release would require. Since the current OMB directive says that the public data index should include all data that "could" be made public, that "could" should be defined clearly, in a way that empowers public oversight of agency information policy decisions.
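One way to picture the distinction is to treat "can the dataset's existence be acknowledged?" and "can the dataset be released?" as two independent flags. The sketch below is purely illustrative: the `Dataset` class, its field names, the example titles, and `should_list` are all hypothetical and are not part of the Project Open Data schema.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    title: str             # hypothetical example title
    can_acknowledge: bool  # may the agency publicly say the dataset exists?
    can_release: bool      # may the data itself be released in its current form?


def should_list(ds: Dataset) -> bool:
    """Listing depends only on acknowledgment, never on releasability."""
    return ds.can_acknowledge


# Made-up entries covering the three interesting combinations.
datasets = [
    Dataset("Example released dataset", can_acknowledge=True, can_release=True),
    Dataset("Example dataset needing anonymization", can_acknowledge=True, can_release=False),
    Dataset("Example dataset that cannot be acknowledged", can_acknowledge=False, can_release=False),
]

listed = [d.title for d in datasets if should_list(d)]
for title in listed:
    print(title)
```

Under this reading, a dataset that still needs anonymization or digitization would appear in the index even though it is not yet releasable.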

mhogeweg commented 11 years ago

+1

Rather than describing datasets as public/restricted, why not describe the conditions for use (‘use constraints’) or access (‘access constraints’) as defined by FGDC?

Access constraints: http://www.fgdc.gov/metadata/csdgm/01.html#Access%20Constraints

Use constraints: http://www.fgdc.gov/metadata/csdgm/01.html#Use%20Constraints

Both are free text with one suggested value ‘None’.

M

konklone commented 11 years ago

To pull out what I think is a main point of @johnwonderlich's: the OMB directive says the public inventory should explicitly include data that may not yet be marked as "public", but Project Open Data doesn't reflect this as explicitly as it probably should.

MarinaNitze commented 11 years ago

I held off on responding to this comment until the official new implementation guidance came out. It provides clearer guidance on the accessLevel field. It also updates the terminology slightly, making the three choices "public," "restricted public," and "non-public." There will also be an accessLevelComment field where you can (for "restricted public") explain how to gain access or (for "non-public") log the reason why the dataset cannot be released.

It's important to note that since the public inventory only requires an agency to include "public" (not "restricted public" or "non-public") datasets, this field can be used for internal purposes only or shared publicly as an agency sees fit.
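As a rough illustration, the sketch below models a few catalog entries with the accessLevel and accessLevelComment fields and selects the ones the public inventory requires. Only the two field names and the three access-level values come from the guidance described above; the dataset titles, the comment text, and the `public_inventory` helper are hypothetical.

```python
# The three accessLevel choices named in the updated guidance.
ACCESS_LEVELS = {"public", "restricted public", "non-public"}

# Hypothetical catalog entries; titles and comments are invented for illustration.
catalog = [
    {"title": "Example Dataset A", "accessLevel": "public"},
    {
        "title": "Example Dataset B",
        "accessLevel": "restricted public",
        "accessLevelComment": "Hypothetical: available to approved researchers on request.",
    },
    {
        "title": "Example Dataset C",
        "accessLevel": "non-public",
        "accessLevelComment": "Hypothetical: contains personally identifiable information.",
    },
]


def public_inventory(entries):
    """Return only the entries whose accessLevel is "public"."""
    for entry in entries:
        if entry["accessLevel"] not in ACCESS_LEVELS:
            raise ValueError(f"unknown accessLevel: {entry['accessLevel']!r}")
    return [entry for entry in entries if entry["accessLevel"] == "public"]


for entry in public_inventory(catalog):
    print(entry["title"])
```

An agency that chooses to share more could relax the filter to include "restricted public" entries along with their accessLevelComment text.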