ioos / ckanext-ioos-theme

IOOS Catalog as a CKAN extension
GNU Affero General Public License v3.0

Schema.org integration/Search engine indexers #160

Closed: mwengren closed this issue 5 years ago

mwengren commented 6 years ago

This is a grab-bag placeholder issue for researching two things: first, how we can integrate schema.org metadata into datasets in the IOOS Catalog, and second, how to enable and encourage Google and other search indexers to crawl the Catalog (via schema.org metadata or some other means).

Some links: Google Research blog on schema.org

An example Data.gov dataset that appears to have some schema.org metadata (view the page source and search for 'schema.org').

benjwadams commented 5 years ago

We discussed this earlier on our conference calls. Basically, ckanext-dcat should do most of the heavy lifting needed to translate between metadata formats.

PS: The Data.gov link above appears to return a 404.

benjwadams commented 5 years ago

I followed the basic install instructions for the ckanext-dcat extension, and we now have a number of endpoints available, including RDF:

Catalog: https://dev-catalog.ioos.us/catalog.rdf

Individual dataset: https://dev-catalog.ioos.us/dataset/noaa-pibhmc-10-m-bathymetry-cnmi-pagan.xml
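The two URLs above follow a simple pattern: a catalog-wide endpoint at `/catalog.<format>` and a per-dataset endpoint at `/dataset/<id>.<format>`. The Python sketch below only builds those URLs. The set of format extensions (`rdf`, `xml`, `ttl`, `n3`, `jsonld`) and the `page` query parameter on the catalog endpoint are assumptions based on my reading of the ckanext-dcat docs, so check them against the installed version; `catalog_url` and `dataset_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumed serialization extensions (per the ckanext-dcat docs); verify locally.
DCAT_FORMATS = {"rdf", "xml", "ttl", "n3", "jsonld"}


def catalog_url(base_url, fmt="rdf", page=None):
    """Build a catalog-level DCAT endpoint URL, e.g. <base>/catalog.rdf."""
    if fmt not in DCAT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    url = f"{base_url.rstrip('/')}/catalog.{fmt}"
    if page is not None:
        # Assumed: the catalog endpoint is paginated via a `page` parameter.
        url += "?" + urlencode({"page": page})
    return url


def dataset_url(base_url, dataset_id, fmt="xml"):
    """Build a per-dataset DCAT endpoint URL, e.g. <base>/dataset/<id>.xml."""
    if fmt not in DCAT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{base_url.rstrip('/')}/dataset/{dataset_id}.{fmt}"
```

For example, `dataset_url("https://dev-catalog.ioos.us", "noaa-pibhmc-10-m-bathymetry-cnmi-pagan", "ttl")` would give the Turtle serialization of the same dataset linked above, if that format is enabled.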

See the ckanext-dcat docs for the other formats exposed by the extension. Right now the temporal extent isn't mapping correctly: ckanext-dcat appears to expect extras fields named "temporal_start" and "temporal_end", whereas the ckanext-spatial ISO 19115 harvester populates extras named "temporal-extent-begin" and "temporal-extent-end". I'm looking into ways of mapping other elements to DCAT and hope to have a solution soon.
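One possible stopgap for the temporal mismatch is to copy the ckanext-spatial values onto the key names ckanext-dcat reads before the dataset is serialized. The sketch below works on a plain CKAN dataset dict, with `extras` as a list of `{"key": ..., "value": ...}` dicts as returned by `package_show`. Where to call it (for example from a CKAN plugin hook or a custom DCAT profile) is left open, and `alias_temporal_extras` is a hypothetical helper, not part of either extension.

```python
# Map ckanext-spatial extras keys to the keys ckanext-dcat looks for.
TEMPORAL_KEY_MAP = {
    "temporal-extent-begin": "temporal_start",
    "temporal-extent-end": "temporal_end",
}


def alias_temporal_extras(dataset_dict):
    """Return a copy of a CKAN dataset dict whose extras also carry
    temporal_start/temporal_end, copied from the ISO-harvested keys.

    Values already present under temporal_start/temporal_end are kept as-is,
    and the input dict is not modified.
    """
    extras = [dict(e) for e in dataset_dict.get("extras", [])]
    present = {e["key"]: e["value"] for e in extras}
    for src, dst in TEMPORAL_KEY_MAP.items():
        if src in present and dst not in present:
            extras.append({"key": dst, "value": present[src]})
    return {**dataset_dict, "extras": extras}
```

Aliasing (rather than renaming) keeps the original ISO keys intact, so anything else that reads "temporal-extent-begin" keeps working.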

benjwadams commented 5 years ago

I'm semi-blocked on this until we figure out a scalable way to generate a sitemap.
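The thread doesn't say how the cached sitemap was eventually built. For scale, the main constraint comes from the sitemaps.org protocol: a single sitemap file may list at most 50,000 URLs (and be at most 50 MB uncompressed), so a large catalog is split into several files referenced from a sitemap index. A minimal Python sketch of that chunking, standard library only; the `sitemap-<n>.xml` file names, the `/dataset/<name>` URL pattern, and `build_sitemaps` itself are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(base_url, dataset_names, chunk_size=MAX_URLS_PER_SITEMAP):
    """Split dataset URLs into sitemap files plus one sitemap index.

    Returns (index_xml, {filename: sitemap_xml}); file names are hypothetical.
    """
    base = base_url.rstrip("/")
    names = list(dataset_names)
    files = {}
    for i in range(0, len(names), chunk_size):
        # One <urlset> per chunk of at most chunk_size dataset URLs.
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for name in names[i:i + chunk_size]:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = f"{base}/dataset/{name}"
        filename = f"sitemap-{i // chunk_size}.xml"
        files[filename] = ET.tostring(urlset, encoding="unicode")

    # The index points crawlers at every chunk file.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for filename in files:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base}/{filename}"
    return ET.tostring(index, encoding="unicode"), files
```

Caching could then amount to regenerating these strings on a schedule and serving them as static UTF-8 files, instead of querying every dataset on each crawler request.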

benjwadams commented 5 years ago

Now that the cached sitemap is enabled, this should be doable.

benjwadams commented 5 years ago

This now appears to be working with Google Dataset Search's parser on the dev site: https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fdev-catalog.ioos.us%2Fdataset%2Fsurface-currents-from-a-diagnostic-model-scud-pacific

mwengren commented 5 years ago

@benjwadams Great! That is excellent.

Where is the Schema.org metadata encoded in the source page for that dataset? Is it using JSON-LD blocks? I can't tell where to find it. I remember hearing that JSON-LD is the best option for encoding and transmitting Schema.org metadata.
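On the JSON-LD question: the usual way to expose schema.org metadata to crawlers is a `<script type="application/ld+json">` block in the page HTML, which is the encoding Google's structured data documentation recommends. This thread doesn't say exactly how the IOOS Catalog page emits it. The Python sketch below only shows the general shape: map a few CKAN dataset fields to a schema.org `Dataset` and wrap the JSON in a script tag. The field mapping and both function names are assumptions, not the ckanext-dcat implementation; the temporal keys reuse the `temporal_start`/`temporal_end` extras names mentioned earlier in the thread.

```python
import json


def schema_org_dataset(pkg, dataset_url):
    """Map a few CKAN package fields to a schema.org Dataset (assumed mapping)."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": pkg.get("title") or pkg["name"],
        "description": pkg.get("notes", ""),
        "url": dataset_url,
    }
    extras = {e["key"]: e["value"] for e in pkg.get("extras", [])}
    start, end = extras.get("temporal_start"), extras.get("temporal_end")
    if start and end:
        # temporalCoverage accepts an ISO 8601 interval written "start/end".
        doc["temporalCoverage"] = f"{start}/{end}"
    return doc


def jsonld_script_tag(doc):
    """Serialize the document into the <script> block crawlers read."""
    body = json.dumps(doc, indent=2)
    # Escape "</" so a value containing "</script>" cannot end the block early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

In a CKAN page template, the resulting string would go somewhere in the dataset page's HTML (typically the `<head>`), where it is invisible to users but parsed by Google's structured-data tooling.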

benjwadams commented 5 years ago

https://github.com/ioos/catalog-ckan/pull/199 fixed masking of DCAT endpoints. Closing.