project-open-data / project-open-data.github.io

Open Data Policy — Managing Information as an Asset
https://project-open-data.cio.gov/

identifier and isPartof fields #501

Closed AnasGhadieh closed 9 years ago

AnasGhadieh commented 9 years ago

@rebeccawilliams @philipashlock

My question is about the bullet below:

Each agency shall enrich their EDI and PDL by ensuring all data assets include the individual datasets within by using the identifier and isPartof fields. See examples and more details at: https://project-open-data.cio.gov/v1.1/collections/


Suppose we have a data set that has 15 years' worth of data (15 files). Following the above requirement, would those still count as 15 data sets on Data.gov, or just as 1 data set?

Please let me know if my inquiry isn't clear.

rebeccawilliams commented 9 years ago

@AnasGhadieh, for Data.gov metric purposes, this will be counted as a "dataset", a category that includes collections. Data.gov today has 157,899 datasets, a figure that includes collections. If the files within those collections were included in this count, there would be over 1 million records on Data.gov today. (For example, this is a large collection: http://catalog.data.gov/dataset?collection_package_id=da679621-ccd6-49aa-a6db-75fc791a4833)

For open data policy evaluation purposes, datasets and total distribution URLs are measured separately under Public Data Listing, among other things.

In other words, technically your "dataset" count will go down, but individual files are purposefully counted as well to get the full picture on data growth.
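To make the two counts concrete, here is a minimal Python sketch of the 15-file case from the question. It assumes a data.json-style catalog in which each yearly file is a child entry whose `isPartOf` value (spelled this way in the v1.1 schema) is the parent's `identifier`. The identifiers, titles, URLs, and both helper functions are hypothetical, and the counting rule is only an approximation of what is described above, not Data.gov's actual implementation.

```python
# Hypothetical catalog: one parent (collection) entry plus 15 yearly children.
# All identifiers, titles, and URLs below are made up for illustration.
parent = {"identifier": "example-survey", "title": "Example Survey (all years)"}

children = [
    {
        "identifier": f"example-survey-{year}",
        "title": f"Example Survey {year}",
        "isPartOf": parent["identifier"],  # points at the parent's identifier
        "distribution": [
            {
                "downloadURL": f"https://example.gov/data/survey-{year}.csv",
                "mediaType": "text/csv",
            }
        ],
    }
    for year in range(2001, 2016)  # 2001..2015, i.e. 15 years
]

catalog = {"dataset": [parent] + children}


def count_top_level_datasets(catalog):
    """Count entries that are not children of a collection (assumed rule)."""
    return sum(1 for d in catalog["dataset"] if "isPartOf" not in d)


def count_distribution_urls(catalog):
    """Count every downloadURL or accessURL across all entries."""
    return sum(
        1
        for d in catalog["dataset"]
        for dist in d.get("distribution", [])
        for key in ("downloadURL", "accessURL")
        if key in dist
    )


print(count_top_level_datasets(catalog))  # 1
print(count_distribution_urls(catalog))   # 15
```

Under these assumptions the collection contributes 1 to the dataset count, while all 15 files still show up in the distribution URL count.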

Let me know if you have any other questions!