Closed mwengren closed 3 years ago
The IOOS Catalog UI for GCMD keywords looks great! Thanks for pointing out the metadata shortcoming in our two NANOOS models. Those should be easy fixes.
I'm pinging @cseaton and @crisien, who are responsible for the SELFE and OSU ROMS THREDDS servers, respectively. I'll follow up with them.
@mwengren Both THREDDS servers have now been corrected. The SELFE model was done yesterday, and per our ncISO > NANOOS WAF job, the WAF metadata XMLs were updated at 1am PT. The corresponding IOOS catalog record isn't updated yet, but I assume the change should fully propagate within 24 hours? The OSU ROMS model was done a couple of hours ago. (Thanks so much, @cseaton and @crisien!)
While we're on the topic of keywords and the IOOS Catalog, why are GCMD Keywords and CF standard names largely replicated in the "Freeform Tags" section? I see it both on the model records and the sample glider records you included. Is there a good reason for doing that?
I'll check back on Monday when the updates to both model records should be propagated to the catalog.
@emiliom we have an issue for that!
see: ioos/catalog-ckan#209
I suspect there may be issues with other aspects of the site if they're removed (like filtering in the faceted filter in the side navbar), but @benjwadams may know more about that.
So, it's at least on the list of improvements to consider. Hopefully it will be fixed, or closed with no action if it's not feasible, by the milestone due date at the end of January.
@emiliom I had a look this morning, and it looks like it's not quite there yet.
The ROMS record doesn't look to be changed (not sure if that's an issue somewhere in the harvest pipeline or the source ISO XML didn't receive an updated
For the SELFE model, it looks like the structure of the GCMD keywords section still isn't 100%. They need to be separated into individual gmd:keyword elements.
The SELFE record is still presenting them as a single element: https://registry.ioos.us/waf/NANOOS/d9fce8e42f73b586d5a4e388aeed533b4f5d0107.xml, which is why you see the strange hierarchy on the CKAN page: https://data.ioos.us/dataset/cmop-virtual-columbia-river-selfe-f3374ea3. Could be an ncISO issue, but more likely a syntax problem in ncML or netCDF.
Thanks @mwengren. @cseaton and I did more digging on the SELFE model. The ISO XML directly available from his THREDDS server does show the GCMD keywords as individual gmd:keyword elements; see the XML here
However, I feed the NANOOS WAF using stand-alone ncISO (vers 2.3.1), and the resulting ISO XML mushes all the keywords into one element.
I could switch to just grabbing from that thredds xml url, but I see there are other differences in the xml file, so I don't know what's best. I also see that my stand-alone ncISO is old, and I could easily try updating to the latest (2.3.5), though I don't see any references to this issue in the release notes.
What do you recommend?
Update: I just upgraded to the latest stand-alone ncISO (2.3.5) and reran it, but no cigar. GCMD keywords still get combined into a single gmd:keyword element. For what it's worth, ncISO 2.3.5 is nearly a year old, while @cseaton's THREDDS is pretty old, Version 4.3.23 - 20140826.1617.
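For anyone wanting to check a harvested record for this symptom, here is a minimal Python sketch that flags gmd:keyword elements whose text looks like several keywords joined together. The sample XML is a hypothetical, heavily trimmed fragment (not the actual NANOOS record), the function name is made up for illustration, and the comma test is only a heuristic.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs for the gmd and gco prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def find_merged_keywords(iso_xml: str) -> list[str]:
    """Return gmd:keyword strings that look like several keywords in one element."""
    root = ET.fromstring(iso_xml.strip())
    merged = []
    for node in root.iterfind(".//gmd:keyword/gco:CharacterString", NS):
        text = (node.text or "").strip()
        if "," in text:  # heuristic: a comma suggests an unsplit list
            merged.append(text)
    return merged


# Hypothetical fragment for illustration only.
SAMPLE = """
<gmd:MD_Keywords xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:keyword>
    <gco:CharacterString>Oceans &gt; Salinity/Density &gt; Salinity, Oceans &gt; Ocean Circulation &gt; Ocean Currents</gco:CharacterString>
  </gmd:keyword>
  <gmd:keyword>
    <gco:CharacterString>sea_water_salinity</gco:CharacterString>
  </gmd:keyword>
</gmd:MD_Keywords>
"""

for text in find_merged_keywords(SAMPLE):
    print("Possibly merged:", text)
```

Only the first element in the sample is reported; the single-keyword element passes the check.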
@kevin-obrien @noaaroland Can we ask for your help with understanding why the ncISO releases Emilio is using here are producing the garbled GCMD keywords?
Here's an example that shows the merged keywords we refer to in this issue: https://data.ioos.us/dataset/cmop-virtual-columbia-river-selfe-f3374ea3 (or see more above). Mostly wondering if there's an issue in the software or something in the source data/ncML formatting.
Thanks. On a side note (relative to the ask about ncISO), I just realized that the OSU ROMS model catalog record @mwengren originally pointed out is not from the THREDDS server @crisien manages. I'd forgotten that we have that model available via two servers, with two corresponding catalog records: THREDDS (Craig's) and Hyrax. The Hyrax one is managed by someone else for specific applications. I won't volunteer to look into the GCMD keywords issue on that one yet. The catalog record for Craig's THREDDS server is https://data.ioos.us/dataset/regional-ocean-modeling-system-roms-oregon-coast8447b. It, too, has GCMD keyword issues, but, if possible, let's keep this issue focused on just the CMOP SELFE THREDDS record (https://data.ioos.us/dataset/cmop-virtual-columbia-river-selfe-f3374ea3) for now, to minimize confusion & complexity.
As far as I can tell, this question boils down to which piece of software is responsible for splitting the comma-separated string of GCMD keywords that is stuffed into the netCDF attribute.
ERDDAP appears to do this, and maybe the software building the graphical display ought to do it as well, in case it encounters an ISO file with a keyword that is a comma-separated string.
That said, since the netCDF "keywords" attribute is always a comma-separated list, the XSLT template can be modified to split on the commas. See the attached file for an example output from a modified XSL file: example.zip
If this output is good, I will make a release using this proposed new template.
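The splitting step described above can be sketched in Python (rather than XSLT) as follows. This is not the ncISO template change itself, just an illustration of the intended result under the assumption that entries are separated by commas; the function name and the sample attribute value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def keywords_to_elements(keywords_attr: str) -> ET.Element:
    """Build gmd:MD_Keywords with one gmd:keyword per comma-separated entry."""
    md_keywords = ET.Element(f"{{{GMD}}}MD_Keywords")
    for entry in keywords_attr.split(","):
        entry = entry.strip()
        if not entry:  # skip empty entries left by stray commas
            continue
        keyword = ET.SubElement(md_keywords, f"{{{GMD}}}keyword")
        ET.SubElement(keyword, f"{{{GCO}}}CharacterString").text = entry
    return md_keywords


# Hypothetical value of a netCDF global "keywords" attribute.
attr = (
    "Oceans > Ocean Temperature > Potential Temperature, "
    "Oceans > Salinity/Density > Salinity"
)
element = keywords_to_elements(attr)
for child in element:
    print(child[0].text)
```

Each printed line corresponds to one gmd:keyword element, which is the structure the Catalog expects instead of a single combined string.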
Your sample output from the modified XSL looks like it's doing the job of splitting the GCMD keywords. I didn't check for anything else, though. Thanks!
Wow, thank you!! That was fast. I've downloaded 2.3.6 and changed my nciso script to use that version the next time it runs, overnight. I'll report back.
@emiliom Reviewing open Catalog issues - I still see some GCMD keyword hierarchy issues here: https://data.ioos.us/dataset/cmop-virtual-columbia-river-selfe-f3374ea3.
The source record from the NANOOS WAF looks like the keywords themselves may not be formed correctly:
<gco:CharacterString>
Oceans; Ocean Temperature; Potential Temperature, Oceans; Salinity/Density; Salinity, Oceans; Sea Surface Topography; Sea Surface Height, Oceans; sea_water_potential_temperature ;sea_water_temperature; sea_water_salinity; Ocean Circulation; Ocean Currents; x_sea_water_velocity; y_sea_water_velocity
</gco:CharacterString>
There should be > characters between the elements in the hierarchy rather than semicolons, correct? This looks like a separate problem from the ncISO issue, and I'm not sure whether that one is resolved, because the keywords don't appear to be split on the commas either. I wonder if the two are related.
Also, the ; vs > issue seems to affect the OSU ROMS record as well: http://data.nanoos.org/metadata/ioos/thredds/thredds_dodsC_NANOOS_OCOS.xml
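To make the separator problem concrete, here is a small Python sketch of how a consumer that splits a GCMD keyword on > would see the two forms. The function is hypothetical and is not the Catalog's actual parsing code.

```python
def split_hierarchy(keyword: str, sep: str = ">") -> list[str]:
    """Split one GCMD keyword into its hierarchy levels on the given separator."""
    return [level.strip() for level in keyword.split(sep) if level.strip()]


good = "Oceans > Ocean Temperature > Potential Temperature"
bad = "Oceans; Ocean Temperature; Potential Temperature"

print(split_hierarchy(good))  # ['Oceans', 'Ocean Temperature', 'Potential Temperature']
print(split_hierarchy(bad))   # ['Oceans; Ocean Temperature; Potential Temperature']
```

With semicolons, the whole string stays a single level, so no hierarchy can be recovered from it.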
This http://amb6400b.stccmop.org:8080/thredds/dodsC/model_data/forecast.html seems to have disappeared for the moment, so I can't check, but I believe the files use &gt; in the netCDF keywords attribute. At least in the example I have from when I made the change, the keywords in the netCDF source all have the entity &gt; in between instead of a > character, so they come out in the ISO XML as:
Thanks @mwengren and @noaaroland! The IOOS Catalog was down for several days a bit over a week ago, so that set me back as far as testing is concerned.
So, to keep the workflow sequence in order:
gcmd:keyword element
The issue of &gt; vs > encoding in the netCDF files seems like a promising avenue, but I'm still confused about why the XML produced by the THREDDS server (presumably by the ncISO plugin?) looks good but the one produced by the latest stand-alone ncISO doesn't.
Again, let's not bring up the ROMS dataset at this time. That'll just confuse things.
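As a general illustration of why an entity stored in the source can behave differently from a literal character, here is a short Python sketch of XML escaping. It shows one possible mechanism only; whether ncISO or THREDDS actually does this is not confirmed anywhere in this thread, and the keyword strings are examples.

```python
from xml.sax.saxutils import escape, unescape

# A keyword stored with a literal '>' character.
literal = "Oceans > Salinity/Density > Salinity"
# The same keyword stored with the entity already written out as text.
pre_escaped = "Oceans &gt; Salinity/Density &gt; Salinity"

# Serializing the literal form escapes '>' once, which is valid XML.
print(escape(literal))

# Serializing the pre-escaped form escapes the '&' again (double escaping).
print(escape(pre_escaped))

# A reader that unescapes once gets the text '&gt;' back, not '>'.
print(unescape(escape(pre_escaped)))
```

If something like this were happening, a hierarchy-aware display would find no > character to split on, even though the record looks nearly right when viewed as raw XML.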
@emiliom If you're able to revisit this issue again, I wanted to point out that we recently resolved the Registry -> Catalog harvesting issue (ioos/catalog-harvest-registry#134), so if you make changes to the TDS/ncISO configuration generating those XML records and issue a reharvest, you should be able to see more or less instantaneous updates on the Catalog side (let us know if you don't!).
cc @benjwadams
Thanks, @mwengren. That'll be helpful in getting to the bottom of this issue. I'll try to get back to it next week.
Following up on this issue.
For reference / follow-ups: IOOS Catalog dataset urls are not persistent in the long term (the catalog doesn't persist them). The urls referenced previously in this issue are no longer valid. The new urls are:
@emiliom This may not be in your purview any longer, but I'm trying to clean up dormant issues for Catalog.
Tracking this down a bit, I believe the records for these datasets originated from this WAF, which is now empty (or filled with zero length files to be specific):
http://data.nanoos.org/metadata/ioos/thredds/
As a result, the Catalog Registry harvests are failing, and one of these two datasets, OSU ROMS, still has a metadata date from 2020-02:
https://data.ioos.us/dataset/regional-ocean-modeling-system-roms-oregon-coast
The issue still exists, but there's no way to tell if it's been resolved somewhere in the pipeline without new metadata to harvest. Can someone at NANOOS look into this?
Related point, I'm assuming I should replace Craig as the POC for all NANOOS harvests in the Registry, correct?
I can work with @cseaton and @crisien to help resolve this once and for all. But only after our current nciso fatal issue is resolved.
This issue was resolved by upgrading NANOOS' ncISO execution environment to Java JDK 8.
@emiliom this is mostly an FYI, but we recently added some new UI capabilities for GCMD keywords display in the IOOS Catalog. I happened to be researching RA models in the Catalog and found the SELFE model record (https://data.ioos.us/dataset/cmop-virtual-columbia-river-selfe-f3374ea3).
However, the GCMD keywords in the source metadata don't use the proper syntax of '>' to show the hierarchy. If they do, the result should be more like this: https://data.ioos.us/dataset/uw157-20190916t0000.
Same for at least this other ROMS model: https://data.ioos.us/dataset/regional-ocean-modeling-system-roms-oregon-coast9e82d
If you can fix or cause them to be fixed, let me know and I'll close this out.