Closed roll closed 5 years ago
Added the two Socrata sources named above. I've added publishers but they may not be exactly the ones wanted for production.
The Socrata API returns some domain-specific key/values in the classification property, including a domain_metadata key. The Socrata docs describe it as:
an array of domain metadata objects for public custom metadata
Each item is a key/value pair that can be added as custom metadata specific to the domain (https://socratadiscovery.docs.apiary.io/#reference/0/find-by-domain-specific-metadata).
Each domain has the ability to add custom metadata to datasets beyond Socrata’s default metadata. This custom metadata is different for every domain, but within a domain, all assets may be labeled with the metadata. The custom metadata is a named set of key-value pairs. For example, one domain might have a set named 'Publication Metadata' and have keys 'Publication Date' and 'Publication Cycle', while another domain has a set named 'Agency Ownership' having key 'Department'. The caller may restrict the results to a particular custom metadata pair by specifying the parameter name as a combination of the set's name and the key's name and the parameter value as the key's value. To construct the parameter name join the set's name to the key's name with an underscore and replace all spaces with dashes.
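The naming rule quoted above (join set name and key name with an underscore, replace spaces with dashes) can be sketched in a few lines. The function name `custom_metadata_param` is made up for illustration and is not part of any Socrata client library.

```python
def custom_metadata_param(set_name: str, key_name: str) -> str:
    """Build the query parameter name for a custom metadata pair.

    Follows the rule from the Socrata docs quoted above: join the set's
    name to the key's name with an underscore, then replace all spaces
    with dashes. (Hypothetical helper, not a Socrata API.)
    """
    return f"{set_name}_{key_name}".replace(" ", "-")


print(custom_metadata_param("Publication Metadata", "Publication Date"))
# Publication-Metadata_Publication-Date
```

This is also the shape of the keys that come back in domain_metadata, e.g. Data-Freshness_Time-Period.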
So, we can't rely on each Socrata instance having the same custom metadata that we can map to LA Counts data fields.
For now, all domain_metadata items are harvested and added as package extras without further processing.
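A rough sketch of what "added as package extras without further processing" could look like. It assumes each domain_metadata item is a dict with "key" and "value" entries and that extras use CKAN's usual list of key/value dicts; the function name is hypothetical and this is not the actual harvester code.

```python
from typing import Any, Dict, List


def domain_metadata_to_extras(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Copy every domain_metadata item into a CKAN-style extras list.

    Assumption: `result` is one entry of the Socrata response and each
    domain_metadata item looks like {"key": ..., "value": ...}.
    """
    classification = result.get("classification") or {}
    items = classification.get("domain_metadata") or []
    # No renaming or filtering: values are passed through unchanged.
    return [
        {"key": item["key"], "value": item.get("value", "")}
        for item in items
        if "key" in item
    ]


sample = {
    "classification": {
        "domain_metadata": [
            {"key": "Data-Freshness_Time-Period", "value": "2017"},
        ]
    }
}
print(domain_metadata_to_extras(sample))
```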
Similarly, the license property is available under the metadata key for Socrata datasets. It's not clear if the provided values are part of a controlled vocabulary. By default in CKAN, license options for a dataset are provided by entries to the Open Licenses Service for a controlled CKAN licenses group (https://licenses.opendefinition.org/licenses/groups/ckan.json). If the Socrata metadata.license value is free text, it may be difficult to map between these two properties.
For now, if a metadata.license is present in the Socrata data, it's added as a package extra.
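In the same spirit, a minimal sketch of the license handling. The extra key name "license" and the function name are assumptions for illustration; the value is stored as-is, with no attempt to match it against the CKAN license list.

```python
from typing import Any, Dict, List


def license_extra(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return a one-item extras list if metadata.license is present.

    Assumption: the extra is stored under the key "license". The value
    is kept as free text and not mapped to a CKAN license id.
    """
    license_value = (result.get("metadata") or {}).get("license")
    if not license_value:
        return []
    return [{"key": "license", "value": license_value}]


print(license_extra({"metadata": {"license": "Public Domain"}}))
print(license_extra({"metadata": {}}))
```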
Some queries
Frequency 1 (Controller)
Frequency 2 (Santa Monica)
Time period (Santa Monica)
Some fixes / tweaks on our side:
The harvest_dataset_url extra should contain the link to the specific dataset page in the source catalog.
Use the ["resource"]["owner"]["display_name"] value as our contact_name field.
Map the updatedAt and createdAt fields to our modified and issued fields.
@amercader Please check if we're good now to close this issue: https://lacounts-staging.l3.ckan.io/dataset/calendar-events
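For reference, the field tweaks listed above could be sketched roughly like this. It assumes updatedAt and createdAt live under "resource" next to "owner", and that the dataset page link is available as a top-level "permalink" value; both are assumptions about the response shape, and `map_basic_fields` is a hypothetical name, not the harvester's real method.

```python
from typing import Any, Dict


def map_basic_fields(result: Dict[str, Any]) -> Dict[str, Any]:
    """Map a few Socrata fields onto our dataset fields.

    Assumptions: `result` is one entry of the Socrata response;
    "permalink" holds the dataset page URL in the source catalog.
    """
    resource = result.get("resource") or {}
    owner = resource.get("owner") or {}

    candidates = {
        "contact_name": owner.get("display_name"),
        "modified": resource.get("updatedAt"),
        "issued": resource.get("createdAt"),
    }
    # Drop fields the source did not provide.
    package = {k: v for k, v in candidates.items() if v is not None}

    extras = []
    if result.get("permalink"):
        extras.append({"key": "harvest_dataset_url", "value": result["permalink"]})
    package["extras"] = extras
    return package


example = {
    "resource": {
        "owner": {"display_name": "Jane Doe"},
        "updatedAt": "2018-08-01T00:00:00.000Z",
        "createdAt": "2017-01-15T00:00:00.000Z",
    },
    "permalink": "https://example.org/d/abcd-1234",
}
print(map_basic_fields(example))
```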
Overview
Harvested metadata will obviously have a different schema than the LA Counts site one. We need a way to match fields in the remote metadata to the ones expected in our site.
Or in this Socrata dataset from this site, the value for
response['results']['classification']['domain_metadata']['key'] == 'Data-Freshness_Time-Period'
might be our temporal_extent_start / temporal_extent_end fields (need to check this).
Mapping cheat sheet - https://github.com/okfn/ckanext-lacounts/issues/51#issuecomment-418265966
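Since results and domain_metadata are both lists, the lookup written above as a single expression needs a small loop in practice. A sketch, assuming each domain_metadata item has "key" and "value" entries; the helper name is hypothetical, and how the returned value would split into temporal_extent_start / temporal_extent_end is left open because its format still needs checking.

```python
from typing import Any, Dict, Optional


def find_domain_metadata_value(result: Dict[str, Any], key: str) -> Optional[Any]:
    """Return the value of the first domain_metadata item with this key.

    Assumption: `result` is a single entry of response['results'].
    Returns None when the key is not present for this dataset.
    """
    classification = result.get("classification") or {}
    for item in classification.get("domain_metadata") or []:
        if item.get("key") == key:
            return item.get("value")
    return None


response = {
    "results": [
        {
            "classification": {
                "domain_metadata": [
                    {"key": "Data-Freshness_Time-Period", "value": "2015-2017"},
                ]
            }
        }
    ]
}
for result in response["results"]:
    print(find_domain_metadata_value(result, "Data-Freshness_Time-Period"))
```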
Tasks