Open cew821 opened 10 years ago
@cew821 Glad the JSON schema is useful. I just issued a pull request (https://github.com/project-open-data/project-open-data.github.io/pull/172) to Project Open Data to get the JSON schema in as the common format in which we'll express the Common Core Metadata requirements.
Just an FYI, in addition to automatically validating JSON (see http://dwcaraway.github.io/podschema/validate.html) the schema can be used to generate a form automatically (see http://dwcaraway.github.io/podschema/form.html) which can easily be hooked to a database and can pull in the latest JSON schema from project-open-data so it's always up-to-date.
Thanks. I'm also seeing this. I definitely think that this is a significant resource but I'm not sure if the best use of time is to fix each of these elements or focus on alternate paths like building off of Dave's schema.
@benbalter - any thoughts on this?
To update, below is a sample of the output. The issue of parsing into arrays comes into play with 'keyword', 'theme', and 'references', but there's also a related issue of how 'distribution' should work with this. I'm not sure whether the best move is to address them together or whether that mixes up too much logic.
[
{
"title": "data 1 ",
"description": "what it is",
"keyword": "key1, key2",
"modified": "2012-01-15",
"publisher": "GSA",
"contactPoint": "John Smith",
"mbox": "john.smith@gsa.gov",
"identifier": "gsa-1123",
"accessLevel": "public",
"accessLevelComment": "In order to access this dataset, visit 123 washington st. ",
"bureauCode": "011:22",
"programCode": "011:111",
"accessURL": "http://www.agency.gov/data.xml",
"webService": "http://www.agency.gov/data.json",
"format": "application/xml",
"license": "CC-0",
"spatial": "United States",
"temporal": "2011",
"theme": "energy, education",
"dataDictionary": "http://www.agency.gov/data/data.html",
"dataQuality": "true",
"accrualPeriodicity": "monthly",
"distribution": "notsurewhattoput?",
"landingPage": "http://www.agency.gov/data_this",
"language": "en-US",
"PrimaryITInvestmentUII": "12-121234121",
"references": "http://www.agency.gov/data.pdf, http://www.agency.gov/otherhub/data.doc",
"issued": "2012-01-22",
"systemOfRecords": "http://www.agency.gov/oira/data-record.html"
}
]
Charles, it seems to me that the only pressing problem with the file generation is the issue of comma-separated strings vs. arrays of strings for keywords, themes, and references. I have split that off as a specific issue, #21. Do you think I'm missing anything else crucial? [E.g., I think that converting the dates an end user enters into a proper date format would be good but is not essential.]
There are a few additional fields that need to be in arrays of strings, not strings, regardless of how many items are in the array. These include:
bureauCode
programCode
keywords
theme
references
language
Also, dataQuality needs to be a boolean, not a string, i.e. true, not "true".
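For illustration, the conversions listed above could be sketched in Python along these lines. This is not the generator's actual code (the tool itself is JavaScript); `ARRAY_FIELDS`, `split_list`, and `normalize_record` are hypothetical names, and the field names follow the sample output earlier in the thread (hence `keyword` rather than `keywords`).

```python
import json

# Fields the thread says must be arrays of strings.
# Names follow the sample output above; this tuple is an assumption, not the schema.
ARRAY_FIELDS = ("bureauCode", "programCode", "keyword", "theme", "references", "language")


def split_list(value):
    """Split a comma-separated string into a list of trimmed, non-empty strings."""
    if isinstance(value, list):
        return value  # already an array, leave it alone
    return [part.strip() for part in value.split(",") if part.strip()]


def normalize_record(record):
    """Return a copy of one dataset entry with the array and boolean fields fixed."""
    fixed = dict(record)
    for field in ARRAY_FIELDS:
        if field in fixed:
            # Always an array, even when there is only one item.
            fixed[field] = split_list(fixed[field])
    if isinstance(fixed.get("dataQuality"), str):
        # "true" (a string) becomes true (a JSON boolean).
        fixed["dataQuality"] = fixed["dataQuality"].strip().lower() == "true"
    return fixed


sample = {
    "keyword": "key1, key2",
    "theme": "energy, education",
    "bureauCode": "011:22",
    "dataQuality": "true",
}
print(json.dumps(normalize_record(sample), indent=2))
```

Splitting on commas assumes that no individual keyword or URL contains a comma, which may not hold for every entry.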
The generator emits every field as a plain string ("standard JSON"), i.e.
{ "keywords":"this, that, the other" }
This is not compliant with the standard, which has more specific requirements for how to represent the objects. For example:
{ "keywords": ["this", "that", "the other"] }
See @dwcaraway's helpful schema: https://github.com/dwcaraway/podschema/blob/master/schema/schema.json
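As a rough way to spot these problems in generated output, a hand-rolled check might look like the sketch below. It only covers the type issues raised in this thread and is not a substitute for validating against the linked schema; `ARRAY_FIELDS` and `find_type_problems` are hypothetical names, and the field list is an assumption based on the sample output in this thread.

```python
# Fields expected to be arrays of strings (an assumption drawn from this thread,
# not read from the schema file itself).
ARRAY_FIELDS = ("bureauCode", "programCode", "keyword", "theme", "references", "language")


def find_type_problems(record):
    """Return a list of human-readable type problems for one dataset entry."""
    problems = []
    for field in ARRAY_FIELDS:
        if field not in record:
            continue  # missing fields are out of scope for this check
        value = record[field]
        is_string_array = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if not is_string_array:
            problems.append(f"{field}: expected an array of strings")
    if "dataQuality" in record and not isinstance(record["dataQuality"], bool):
        problems.append("dataQuality: expected a boolean")
    return problems


# A record shaped like the generator's current output.
print(find_type_problems({"keyword": "this, that, the other", "dataQuality": "true"}))
```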
Because the JSON generated by this tool isn't in the right format, I'm not sure it will be that useful? I guess better than nothing.
I wonder if the generator could be made to make better output? Specifically:
I can try to help with this, but I'm having a hard time figuring where in the library this is done. I'm a little familiar with Backbone, but not enough to quickly identify where "the work" of processing the input into JSON is happening. Can you point me in the right direction?