project-open-data / project-open-data.github.io

Open Data Policy — Managing Information as an Asset
https://project-open-data.cio.gov/

For "modified" -- need way to indicate continually updated data assets #225

Closed mereastew closed 10 years ago

mereastew commented 10 years ago

In the old Data.gov we could enter things like "present" and "Monday through Friday, except federal holidays" to indicate resources that were continually updated. We need a solution for continually modified data assets and web resources that are updated frequently, even daily. Current guidance allows only YYYY-MM-DD, so the modified date is out of date as soon as we publish.

gbinal commented 10 years ago

This is also at issue with the Temporal field - I opened #244 to that end.

gbinal commented 10 years ago

As mentioned in #244, it would be helpful if ISO 8601 had an answer for how to represent 'today' or the like.

Otherwise, my suggestion would be for agencies to do the best they can and recognize that the public data listing is a snapshot and will only be as up to date as its most recent update, which should be ~monthly. One thing that helps here is the requirement to include a record for the catalog itself in the data catalog. Therefore, it'll be understandable for there not to be a more recent Modified date than the Modified date of that specific record.

gbinal commented 10 years ago

Okay - As I've been digging into the ISO 8601 lessons of issue #244, I wonder if the role of 'durations' is what's needed for these scenarios. Let's say that the issue is that a dataset is 'modified' daily and the agency is trying to come up with an appropriate way of documenting an accurate entry for the 'modified' field. Instead of holding strictly to the YYYY-MM-DDThh:mm:ss.sTZD format of ISO 8601, perhaps the correct entry here would just be P1D - to indicate that it's being updated daily.
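To make the idea concrete, here is a minimal sketch of how a consumer might tell the two kinds of 'modified' value apart. It is only an illustration under stated assumptions: `classify_modified` is a hypothetical helper, not part of any Project Open Data tooling, and the duration pattern is a simplified subset of ISO 8601 (no week form such as P1W, no fractional values).

```python
import re
from datetime import datetime

# Simplified ISO 8601 duration pattern (assumption: no week form, no fractions).
# The lookaheads reject a bare "P" and a trailing "T" with no time components.
_DURATION = re.compile(
    r"^P(?=\d|T\d)(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?$"
)


def classify_modified(value: str) -> str:
    """Return 'duration', 'timestamp', or 'invalid' for a 'modified' value.

    Hypothetical helper for illustration only.
    """
    if _DURATION.match(value):
        return "duration"  # e.g. P1D: updated daily
    try:
        # Accepts YYYY-MM-DD and YYYY-MM-DDThh:mm:ss style values.
        datetime.fromisoformat(value)
    except ValueError:
        return "invalid"
    return "timestamp"
```

With this sketch, P1D and PT1H would be treated as update frequencies, 2014-03-01 as a point in time, and free text such as "present" would be rejected.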

Any thoughts?

mhogeweg commented 10 years ago

These continuously updated datasets are also affected by the practice of harvesting the dcat.json from an agency at a set frequency. Whenever the agency catalog is harvested, the description of the dataset is already outdated, unless there is a way to state that the most current date of the dataset is 'now'.

The FGDC has an elegant way to deal with this in their metadata spec, where they combine 'free text' values like 'now' or 'continuous' with stricter formatting like that described by ISO 8601. See: http://www.fgdc.gov/metadata/csdgm/09.html#Ending%20Date.
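As a rough sketch of how a harvester could handle that kind of mixed field, the snippet below resolves an open-ended keyword to the harvest date and parses everything else as a strict date. The keyword set is an assumption taken from the words used in this thread, not the FGDC domain values, and `resolve_end_date` is a hypothetical name.

```python
from datetime import date

# Assumed keyword set, based on the wording in this thread (not the FGDC spec).
OPEN_ENDED = {"now", "present", "continuous"}


def resolve_end_date(value: str, harvested_on: date) -> date:
    """Map an open-ended keyword to the harvest date, else parse YYYY-MM-DD."""
    if value.strip().lower() in OPEN_ENDED:
        return harvested_on
    # Raises ValueError for anything that is neither a keyword nor a date.
    return date.fromisoformat(value)
```

Resolving the keyword at harvest time means the stored record never claims a fixed end date that goes stale between harvests.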

konklone commented 10 years ago

I guess this is late to suggest this, but if you're trying to represent start and end dates, and it's possible something may not have ended yet, it seems like this should really be two fields, start and end, and then if end is null, it hasn't ended.
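Something like the following is what the two-field approach could look like. This is a sketch only: the `TemporalCoverage` class and its field names are hypothetical and are not taken from the Project Open Data schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TemporalCoverage:
    """Hypothetical two-field representation of a date range."""

    start: date
    end: Optional[date] = None  # None means the range has not ended yet

    def is_ongoing(self) -> bool:
        return self.end is None

    def to_json_dict(self) -> dict:
        # An ongoing range serializes its end as None (null in JSON).
        return {
            "start": self.start.isoformat(),
            "end": self.end.isoformat() if self.end is not None else None,
        }
```

A null end keeps the record accurate no matter how long ago the catalog was harvested, since there is no placeholder date to go stale.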

gbinal commented 10 years ago

I've made a pull request to clarify this and give examples and am closing this issue.