ioos / ckanext-ioos-theme

IOOS Catalog as a CKAN extension
GNU Affero General Public License v3.0

IOOS Catalog should differentiate ERDDAP:griddap formats similarly to ERDDAP:tabledap #227

Closed mwengren closed 3 years ago

mwengren commented 4 years ago

IOOS Catalog includes filters for formats of type ERDDAP-TableDAP and ERDDAP currently.

We should add the ability to parse the equivalent ERDDAP-GridDAP format.

ERDDAP datasets include metadata with CI_OnlineResource elements of gmd:protocol=ERDDAP:tabledap, for example from PacIOOS:

https://data.ioos.us/dataset/aloha-cabled-observatory-aco-acoustic-doppler-current-profiler-adcp-velocity, corresponding to ISO XML file:

https://registry.ioos.us/waf/PacIOOS/da9a05ec60da11fc782909557ff5f926a73a14d6.xml

Similarly, this NERACOOS record has the equivalent gmd:protocol=ERDDAP:griddap:

https://data.ioos.us/dataset/bio-ww-iii-latest-forecasts-east-coast0e806 and XML:

https://registry.ioos.us/waf/NERACOOS/WW3_EastCoast_latest_iso19115.xml

Let's figure out where the code differs in treating each type and add the same parsing for the griddap type.

benjwadams commented 4 years ago

Differentiation between ERDDAP resource types appears to be done using the URL for most endpoints. There is an ERDDAP-griddap branch in the logic, but it doesn't look like it's ever being reached.

https://github.com/ioos/catalog-ckan/blob/master/ckanext/ioos_theme/harvesters/ioos_harvester.py#L189-L198

Is there some good ISO metadata that could be used to test the griddap case?

benjwadams commented 4 years ago

https://data.ioos.us/dataset/wavewatch-iii-ww3-mariana-regional-wave-model

If you click the link above and navigate to the ERDDAP resource, it actually is an ERDDAP griddap endpoint. Looking at the metadata, there's "ERDDAP-griddap" in there as well. I'll have to do a one-off harvest run of this dataset and see what's happening in the code.

mwengren commented 4 years ago

@benjwadams Yes, PacIOOS has a number of these. But they all appear as 'res_format=ERDDAP' in API queries.

Here's another example from PacIOOS, but I think they're pretty consistent: https://data.ioos.us/dataset/surface-currents-from-a-diagnostic-model-scud-pacific

Source record: https://registry.ioos.us/waf/PacIOOS/a9c2ee3bb3da2bfef898f295b29b6386966a81bd.xml
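For anyone wanting to reproduce the 'res_format=ERDDAP' observation, here is a minimal sketch that only builds a CKAN `package_search` request URL filtered on `res_format`. It assumes the Catalog exposes the standard CKAN action API under `/api/3/action`; the helper name `package_search_url` is made up for illustration.

```python
from urllib.parse import urlencode


def package_search_url(base_url, res_format, rows=10):
    """Build a CKAN package_search URL filtered on a resource format.

    Assumption: the site serves the standard CKAN action API at
    <base_url>/api/3/action. No request is sent here.
    """
    # fq is CKAN's filter query parameter; the value is quoted so that
    # formats containing hyphens are matched as a single term.
    query = urlencode({"fq": 'res_format:"%s"' % res_format, "rows": rows})
    return "%s/api/3/action/package_search?%s" % (base_url.rstrip("/"), query)


url = package_search_url("https://data.ioos.us", "ERDDAP-GridDAP")
print(url)
```

Comparing the result counts for `ERDDAP`, `ERDDAP-TableDAP` and `ERDDAP-GridDAP` before and after a reharvest would be one way to see whether the griddap records are being relabeled.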

mwengren commented 4 years ago

@benjwadams Both the PacIOOS and NERACOOS datasets have similar values for griddap services in the <gmd:protocol> tags, so we should be able to match on that the same way the code currently looks for ERDDAP:tabledap.

NERACOOS (https://data.ioos.us/dataset/bio-ww-iii-latest-forecasts-east-coast0e806) uses the default ERDDAP ISO metadata, whereas PacIOOS (https://data.ioos.us/dataset/surface-currents-from-a-diagnostic-model-scud-pacific) generates its own, but both contain identical elements like the following, which make it possible to distinguish griddap services:

    <gmd:protocol>
        <gco:CharacterString>ERDDAP:griddap</gco:CharacterString>
    </gmd:protocol>
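As a rough illustration of how that element can be read out of an ISO record, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample fragment, its example.org URL, and the helper name `online_resource_protocols` are assumptions for illustration; this is not how the harvester itself parses records.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces used by the gmd:/gco: prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical fragment; real records wrap this in a full MD/MI_Metadata document.
SAMPLE = """<gmd:CI_OnlineResource xmlns:gmd="http://www.isotc211.org/2005/gmd"
                       xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:linkage>
    <gmd:URL>https://example.org/erddap/griddap/sample_dataset</gmd:URL>
  </gmd:linkage>
  <gmd:protocol>
    <gco:CharacterString>ERDDAP:griddap</gco:CharacterString>
  </gmd:protocol>
</gmd:CI_OnlineResource>"""


def online_resource_protocols(xml_text):
    """Return (url, protocol) pairs for every CI_OnlineResource element."""
    root = ET.fromstring(xml_text)
    pairs = []
    # iter() also yields the root itself, so a bare fragment works too.
    for res in root.iter("{%s}CI_OnlineResource" % NS["gmd"]):
        url = res.findtext("gmd:linkage/gmd:URL", default="", namespaces=NS)
        proto = res.findtext(
            "gmd:protocol/gco:CharacterString", default="", namespaces=NS
        )
        pairs.append((url.strip(), proto.strip()))
    return pairs


print(online_resource_protocols(SAMPLE))
```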

I think we can simplify the whole block of code you linked to above to the following:

            if resource['resource_locator_protocol'] == 'OPeNDAP:OPeNDAP':
                resource['format'] = 'OPeNDAP'
            elif resource['resource_locator_protocol'] == 'ERDDAP:tabledap':
                resource['format'] = 'ERDDAP-TableDAP'
            elif resource['resource_locator_protocol'] == 'ERDDAP:griddap':
                resource['format'] = 'ERDDAP-GridDAP'

As it stands, the code is too greedy: it classifies ERDDAP resources with protocol OPeNDAP:OPeNDAP as format: ERDDAP rather than OPeNDAP.

I think it should work better this way, but we'll need to test on the dev Catalog instance first to see the effect of the changes.

benjwadams commented 4 years ago

Implemented in https://github.com/ioos/catalog-ckan/commit/ae3f87910b07432ba6dd961b8b52c100c510d18e . Going to see how this works on staging and then deploy to production.

mwengren commented 4 years ago

@benjwadams I think we will want to wipe all of these lines out as well:

https://github.com/ioos/catalog-ckan/blob/ae3f87910b07432ba6dd961b8b52c100c510d18e/ckanext/ioos_theme/harvesters/ioos_harvester.py#L193-L197

These are too greedy about labeling things as ERDDAP or ERDDAP-TableDAP. For ERDDAP-generated ISO records, resources with the OPeNDAP:OPeNDAP protocol should just be labeled OPeNDAP instead.

This NERACOOS record is a good example:

https://registry.ioos.us/waf/NERACOOS/WW3_EastCoast_latest_iso19115.xml

Both of the relevant OnlineResource elements actually point to the exact same URL, but one is labeled OPeNDAP:OPeNDAP and the other is ERDDAP:griddap in the <gmd:protocol> elements. Your code will label the first ERDDAP and the second ERDDAP-GridDAP. Can you take a look?
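To make the difference concrete, here is a small sketch contrasting a URL-based heuristic with protocol-based labeling for two resources that share one URL. The `format_from_url` function is a hypothetical stand-in written for this example, not the harvester's actual code, and the example.org URL is made up.

```python
# Protocol-to-format mapping taken from the labels discussed in this thread.
PROTOCOL_FORMATS = {
    "OPeNDAP:OPeNDAP": "OPeNDAP",
    "ERDDAP:tabledap": "ERDDAP-TableDAP",
    "ERDDAP:griddap": "ERDDAP-GridDAP",
}


def format_from_url(url):
    """Hypothetical URL-substring heuristic (illustration only)."""
    if "/erddap/tabledap/" in url:
        return "ERDDAP-TableDAP"
    if "/erddap/" in url:
        return "ERDDAP"
    return None


def format_from_protocol(protocol):
    """Label a resource purely from its gmd:protocol value."""
    return PROTOCOL_FORMATS.get(protocol)


# Two OnlineResource entries pointing at the same (made-up) URL.
shared_url = "https://example.org/erddap/griddap/WW3_sample"
resources = [
    {"url": shared_url, "resource_locator_protocol": "OPeNDAP:OPeNDAP"},
    {"url": shared_url, "resource_locator_protocol": "ERDDAP:griddap"},
]

by_url = [format_from_url(r["url"]) for r in resources]
by_protocol = [
    format_from_protocol(r["resource_locator_protocol"]) for r in resources
]
print(by_url)       # the URL alone cannot tell the two entries apart
print(by_protocol)  # the protocol value distinguishes them
```

Because the URL is identical for both entries, any URL-only rule has to give them the same label, which is why keying on the protocol value seems like the safer approach here.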

benjwadams commented 4 years ago

Added in feaef0364f8598ff3c8b549a14a47b369d0f1ad0

mwengren commented 3 years ago

Implemented