ioos / ckanext-ioos-theme

IOOS Catalog as a CKAN extension
GNU Affero General Public License v3.0

Add some logic to reorder CI_OnlineResource links #97

Closed mwengren closed 7 years ago

mwengren commented 7 years ago

It's possible to tweak the CKAN harvesting code so that OnlineResource links found at certain xpaths in the ISO XML appear before others in the 'Data and Resources' section on a dataset detail page. For example, the SOS GetCaps link is at the bottom of the list in this dataset. Ideally, it would be first, above all the others.

There's no cut-and-dried way to do this because links will appear in various xpath locations depending on the data provider, their interpretation of ISO, etc. But we can probably improve what we have. For example, prioritizing CI_OnlineResource links in SV_ServiceIdentification sections over others will help (and would fix the case above).
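A rough sketch of that idea, using only the Python standard library rather than the actual ckanext-spatial harvester code: collect the CI_OnlineResource elements nested under SV_ServiceIdentification first, then append the remaining ones in document order. The function names and the sample record are hypothetical, and the sample nesting is simplified compared to real ISO 19115-2 XML.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 / 19119 namespace URIs.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "srv": "http://www.isotc211.org/2005/srv",
}


def ordered_online_resources(root):
    """Return CI_OnlineResource elements, service-section ones first.

    Hypothetical helper: not part of CKAN or ckanext-spatial.
    """
    service_links = [
        res
        for svc in root.iterfind(".//srv:SV_ServiceIdentification", NS)
        for res in svc.iterfind(".//gmd:CI_OnlineResource", NS)
    ]
    seen = {id(res) for res in service_links}
    # Everything else keeps the order in which it appears in the document.
    other_links = [
        res
        for res in root.iterfind(".//gmd:CI_OnlineResource", NS)
        if id(res) not in seen
    ]
    return service_links + other_links


def link_url(res):
    """Extract the URL text from a CI_OnlineResource element."""
    return res.findtext("gmd:linkage/gmd:URL", default="", namespaces=NS)


# Simplified, made-up record: the informational link comes first in the XML.
SAMPLE = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:srv="http://www.isotc211.org/2005/srv">
  <gmd:distributionInfo>
    <gmd:CI_OnlineResource>
      <gmd:linkage><gmd:URL>http://example.org/about.html</gmd:URL></gmd:linkage>
    </gmd:CI_OnlineResource>
  </gmd:distributionInfo>
  <gmd:identificationInfo>
    <srv:SV_ServiceIdentification>
      <gmd:CI_OnlineResource>
        <gmd:linkage><gmd:URL>http://example.org/sos?request=GetCapabilities</gmd:URL></gmd:linkage>
      </gmd:CI_OnlineResource>
    </srv:SV_ServiceIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""

root = ET.fromstring(SAMPLE.strip())
urls = [link_url(res) for res in ordered_online_resources(root)]
```

With this sample, the SOS link nested in the service section ends up ahead of the informational HTML link, even though it appears later in the XML.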

Here's what I did for the NOAA Catalog:

https://github.com/mwengren/ckanext-spatial/blob/noaa_dev_18f/ckanext/spatial/harvesters/base.py#L410-L416

and https://github.com/mwengren/ckanext-spatial/blob/noaa_dev_18f/ckanext/spatial/model/harvested_metadata.py#L898-L945

It seemed to work pretty well for some metadata sampled from the NOAA Catalog. Can we look into doing something similar or just copying this?

mwengren commented 7 years ago

Forgot to include an example. For this dataset in the NOAA Catalog, it ensures the REST and WMS links are listed before the ancillary HTML informational links.

lukecampbell commented 7 years ago

Sure, looks easy. Thanks for the code!

lukecampbell commented 7 years ago

Do you have a preferred order? I found a place where I can update it, but it would be easier for me to make a small tweak here than to create several more ISO keys and integrate them. Also, having a preferred order is more predictable and consistent than XPaths, in my opinion.

mwengren commented 7 years ago

Actually, the order might depend on both position in the XML (via xpath) and link type (i.e., resource.format). Ordering by link type alone might work if our format parsing/detection is pretty good, which it seems to be. The position is a good fallback because web service endpoints are typically found within the SV_ServiceIdentification sections rather than in some of the other possible spots.

Basically, what I was looking to do is prioritize web service endpoints over general reference links (organization webpages, thesauri links, etc.: the extra stuff that's included in ISO but is less relevant to the data). Web service types could vary depending on the dataset, but we obviously have SOS, OPeNDAP, ERDDAP, WMS, WCS, WFS, Esri REST, etc. If you can code something along that order in that function (with 'other' stuff at the end of the list, in whatever order it was parsed), we can give that a shot to start and see how it works.
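A minimal sketch of that ordering, assuming each resource is a dict with a 'format' key as in CKAN resource dicts. The priority list, the helper name, and the exact format strings are assumptions; the values the harvester actually detects may differ. Python's sorted() is stable, so anything not in the list stays at the end in its parsed order.

```python
# Hypothetical priority list, following the order named in the comment above.
FORMAT_PRIORITY = ["SOS", "OPeNDAP", "ERDDAP", "WMS", "WCS", "WFS", "Esri REST"]
_RANK = {name.lower(): i for i, name in enumerate(FORMAT_PRIORITY)}


def sort_resources(resources):
    """Put known web service formats first; keep parse order for the rest."""

    def rank(resource):
        fmt = (resource.get("format") or "").strip().lower()
        # Unknown or missing formats all share the lowest priority.
        return _RANK.get(fmt, len(_RANK))

    # sorted() is stable, so ties keep their original relative order.
    return sorted(resources, key=rank)


# Made-up example resources in the order they were parsed.
parsed = [
    {"name": "Organization page", "format": "HTML"},
    {"name": "Map service", "format": "WMS"},
    {"name": "Thesaurus", "format": ""},
    {"name": "SOS GetCapabilities", "format": "SOS"},
]
ordered_names = [r["name"] for r in sort_resources(parsed)]
```

Here the SOS and WMS entries move to the front, and the two reference links follow in the order they were parsed. An xpath-based rank could be added as a secondary sort key if format detection alone turns out not to be enough.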

Another issue, though, is that the default CKAN code doesn't actually match all possible xpaths for OnlineResource links in the XML. That's part of the reason I changed that code in the NOAA Catalog; it's basically a bug. Here's the same record in Data.gov and the NOAA Catalog, showing the missing resources:

https://catalog.data.gov/dataset/noaa-national-hurricane-center-tropical-cyclone-forecasts-wms-wfs

https://data.noaa.gov/dataset/noaa-national-hurricane-center-tropical-cyclone-forecasts-wms-wfsf4982

So we may want to make some XPath changes at the same time. I'd have to look into what xpath expressions need to change to fix the matching problem.