ioos / registry

Getting data services registered in the IOOS Service Registry
http://ioos.github.io/registry/

Register NANOOS OHSU-CMOP model forecasts from THREDDS #69

Closed emiliom closed 9 years ago

emiliom commented 9 years ago

We'd like to register a new service & dataset from NANOOS.

This service is based on the irregular-grid SELFE model. Please bear with us if we didn't set something up correctly on THREDDS for irregular grids or their associated metadata. We're hoping that issues will be readily exposed during the registration process, so we can use that information to correct first-order problems.

Thanks!

robragsdale commented 9 years ago

@emiliom Registered service today. I will follow-up if I see any issues in the harvesting process.

emiliom commented 9 years ago

Thanks, @robragsdale. The service is not in the Catalog yet, nor in the NANOOS collection on the NGDC GeoPortal, though I can see it among the NGDC submitted records for NANOOS. @amilan17, could you let us know if there are problems with our THREDDS service that we should address? For example, I can see that the "title" should probably be changed to something more meaningful, but I would imagine that's not a blocking problem. Thanks!

robragsdale commented 9 years ago

@emiliom I updated the status to "approve". It should start moving now and follow the harvesting schedule: it will be in the Geoportal tomorrow and in the Catalog by Thursday.

amilan17 commented 9 years ago

@emiliom The resulting ISO metadata currently does not pass ISO validation because the source netCDF file is missing resolution values. These are the ISO validation messages: http://www.ngdc.noaa.gov/docucomp/page?xml=test/NOAA/IOOS/NANOOS/iso/reports/IsoValidationReport.xml&view=isoValidationErrorsReport&custom=default&title=test/NOAA/IOOS/NANOOS%20Invalid%20Records

If you add the resolution attributes for the following variables, the resulting ISO will pass validation and get published to the catalog:

vert_coords -> geospatial_vertical_resolution
node_lat -> geospatial_lat_resolution
node_lon -> geospatial_lon_resolution
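A quick way to see which of these resolution attributes a dataset still lacks is to compare its global attributes against the mapping above. This is a minimal Python sketch that assumes the global attributes have already been read into a plain dict (for example, from the OPeNDAP response). The helper name `missing_resolution_attrs` is hypothetical and not part of ncISO.

```python
# Mapping from the comment above: coordinate variable -> resolution attribute.
RESOLUTION_ATTRS = {
    "vert_coords": "geospatial_vertical_resolution",
    "node_lat": "geospatial_lat_resolution",
    "node_lon": "geospatial_lon_resolution",
}


def missing_resolution_attrs(global_attrs):
    """Return (variable, attribute) pairs whose resolution attribute is absent or blank.

    `global_attrs` is assumed to be a dict of global attribute names to values.
    """
    missing = []
    for var, attr in RESOLUTION_ATTRS.items():
        value = global_attrs.get(attr)
        # Treat a missing key or an empty/whitespace string as "not documented".
        if value is None or str(value).strip() == "":
            missing.append((var, attr))
    return missing


# Illustrative input: only the vertical resolution has been set (to "Nil").
attrs = {"title": "model_data/forecast.nc", "geospatial_vertical_resolution": "Nil"}
print(missing_resolution_attrs(attrs))
```

Treating a blank string as missing is a deliberate choice here, since an attribute that exists but is empty would not give a validator anything to use either.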

emiliom commented 9 years ago

Thanks, @robragsdale and @amilan17. @cseaton, please follow up with Paul on Anna's feedback and let us know when the changes have been made. This would also be a good time to edit the "title" (identifiers) to something more meaningful and stand-alone (e.g., "CMOP SELFE model forecasts of Columbia Estuary"). Currently "model_data/forecast.nc" (or "Model data/forecast.nc") appears under gmd:fileIdentifier and gmd:identificationInfo/ ... /gmd:title. OSU ROMS is a good example to follow, since you have Craig's THREDDS catalog file.

emiliom commented 9 years ago

@cseaton has looked into @amilan17's comments, but couldn't figure out how to best address the issues Anna raised. Here are Charles' comments (in an email to me), followed by my own comments and updates at the bottom:

I can't find any good examples of what the correct values should be for the resolution attributes that Anna Milan pointed out. For vertical resolution, should the resolution be based on the raw model resolution (and if so, on the maximum or the minimum vertical resolution), or should it be based on the fact that we are only supplying surface and bottom values (in which case, I'm not sure what the appropriate value would be)? For lat and lon resolution, I'm not clear whether this should equal the minimum grid spacing or something else. Checking the SURA unstructured grid datasets in the IOOS catalog, these attributes seem to be absent. Checking the OSU structured grid values (from here: http://www.ngdc.noaa.gov/metadata/published/NOAA/IOOS/NANOOS/iso_u/xml/thredds_dodsC_NANOOS_OCOS.xml), vertical resolution is set to Nil, so we can follow that practice.

The best description I can find for these attributes is here: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/metadata/DataDiscoveryAttConvention.html#geospatial_lat_resolution_Attribute but that basically gives no information. I find nothing in the CF standard name table relating to _resolution, so I'm not sure what standard those are defined under.

I also don't see where the _resolution attributes are defined in the OSU ROMS files. I don't find it in the global attributes of the nc file, or in the catalog.xml or other files that I have from Craig.

Note that @cseaton refers to OSU ROMS b/c that's a NANOOS THREDDS-hosted model that's already registered and in the IOOS Catalog. The CMOP-OHSU SELFE model is irregular-grid, so it's different from OSU ROMS.

Also note that in the link Anna provided, the record links shown are now broken. However, the CMOP SELFE record has apparently moved into the production section of EMMA.

Anna, can you give us some guidance? Is there an expert in irregular-grid conventions we can consult for this, or is this a simpler issue? Thanks!
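Charles's question about whether lat/lon resolution should equal the minimum grid spacing can at least be made concrete for an unstructured mesh. The sketch below shows one possible interpretation, not a rule from ncISO or the attribute convention. It walks the unique edges of a triangular mesh and reports the smallest and largest nonzero longitude and latitude differences between connected nodes. It assumes `node_lon`, `node_lat`, and a list of zero-based triangles have already been read from the file. The function name `edge_spacing` and the toy mesh are made up.

```python
def edge_spacing(node_lon, node_lat, triangles):
    """Return min/max nonzero |dlon| and |dlat| (degrees) over unique mesh edges.

    `triangles` is a sequence of (a, b, c) zero-based node indices.
    Values are None if every edge has zero difference in that direction.
    """
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once

    dlons = [abs(node_lon[i] - node_lon[j]) for i, j in edges]
    dlats = [abs(node_lat[i] - node_lat[j]) for i, j in edges]
    dlons = [d for d in dlons if d > 0]
    dlats = [d for d in dlats if d > 0]
    return {
        "lon_min": min(dlons, default=None),
        "lon_max": max(dlons, default=None),
        "lat_min": min(dlats, default=None),
        "lat_max": max(dlats, default=None),
    }


# Toy mesh of two triangles (made-up coordinates).
lon = [0.0, 0.5, 0.0, 2.0]
lat = [0.0, 0.0, 1.0, 3.0]
print(edge_spacing(lon, lat, [(0, 1, 2), (1, 3, 2)]))
```

On a real SELFE mesh the minimum and maximum can differ by orders of magnitude. That gap is exactly why a single "resolution" number is ambiguous for irregular grids.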

emiliom commented 9 years ago

@amilan17, have you had a chance to digest this? Your advice would be really helpful to us in moving forward.

@rsignell-usgs: Sorry for (ab)using your well-rounded knowledge and bringing you into this. Any ideas on resolution values to use to address Anna's point regarding the CMOP SELFE irregular-grid THREDDS service?

Quoting @amilan17: "If you add the resolution attributes for the following variables, the resulting ISO will pass validation and get published to the catalog: vert_coords -> geospatial_vertical_resolution, node_lat -> geospatial_lat_resolution, node_lon -> geospatial_lon_resolution"

See my comments above, quoting @cseaton on where CMOP is stuck in trying to identify appropriate values.

Thanks all!

@robragsdale: I hope you're on paternity leave! Congrats on being a new dad! I don't expect to hear from you for a while longer.

amilan17 commented 9 years ago

@emiliom @cseaton @robragsdale I think the easy solution is for ncISO to recognize the fact that there is no resolution documented in the netCDF file (for whatever reason) for that dimension and to NOT try to include this information in the resulting ISO. The resolution field is optional in this ISO section. The trade-off is that the units for that dimension will also be removed.

    MD_Dimension
      dimensionName: MD_DimensionNameTypeCode: vertical
      dimensionSize: 18
      resolution: (units: m)
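Anna's proposed ncISO behavior can be sketched as follows. The optional `gmd:resolution` element is written only when a resolution value is actually known, instead of a measure that carries units but no value. This illustration uses Python's standard `xml.etree.ElementTree` and a simplified ISO 19115 `gmd`/`gco` encoding (the codeList attribute is omitted for brevity). It is not ncISO's actual implementation, and the function name `md_dimension` is hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def md_dimension(name, size, resolution=None, units=None):
    """Build a simplified gmd:MD_Dimension element.

    gmd:resolution is optional, so it is emitted only when a value is known.
    Without a value the units are dropped too (the trade-off noted above).
    """
    dim = ET.Element(f"{{{GMD}}}MD_Dimension")
    dim_name = ET.SubElement(dim, f"{{{GMD}}}dimensionName")
    code = ET.SubElement(dim_name, f"{{{GMD}}}MD_DimensionNameTypeCode")
    code.set("codeListValue", name)
    code.text = name
    dim_size = ET.SubElement(dim, f"{{{GMD}}}dimensionSize")
    ET.SubElement(dim_size, f"{{{GCO}}}Integer").text = str(size)
    if resolution is not None:
        res = ET.SubElement(dim, f"{{{GMD}}}resolution")
        measure = ET.SubElement(res, f"{{{GCO}}}Measure")
        if units:
            measure.set("uom", units)
        measure.text = str(resolution)
    return dim


# The vertical dimension from the report above: size 18, no documented resolution.
print(ET.tostring(md_dimension("vertical", 18), encoding="unicode"))
```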

emiliom commented 9 years ago

Thanks, @amilan17. Are you saying that ncISO should be updated? If so, that's not something we can do, so we (NANOOS) still have to find a solution that's within our grasp. Based on your comments and @cseaton's assessment, how about we just set the resolution attributes for the three coordinate variables (vert_coords, node_lat and node_lon) to Nil? I assume CMOP can do this either in the netCDF files or via THREDDS/NcML.

amilan17 commented 9 years ago

I created a pull request https://github.com/Unidata/threddsIso/pull/8 for NcISO to be updated.

rsignell-usgs commented 9 years ago

Just looking at this dataset for the first time (sorry!). I like to test by trying to load it with pyugrid, which only works if the UGRID conventions are followed. Here's an example: http://nbviewer.ipython.org/gist/rsignell-usgs/cf400cd3825f80a38890
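You can run a rough version of the same check without pyugrid by looking for the UGRID mesh-topology variable directly. This sketch assumes the variable attributes have been collected into a dict of dicts (variable name -> attributes). It checks only a few UGRID requirements for a 2D mesh: a variable with `cf_role = "mesh_topology"`, `topology_dimension = 2`, and `node_coordinates` / `face_node_connectivity` pointing at variables that exist. It is not a full validator. The function name and the variable names `mesh` and `ele` in the example are hypothetical.

```python
def ugrid_problems(variables):
    """Return a list of human-readable UGRID problems found in `variables`.

    `variables` maps variable names to dicts of their attributes.
    """
    meshes = [name for name, attrs in variables.items()
              if attrs.get("cf_role") == "mesh_topology"]
    if not meshes:
        return ["no variable with cf_role = 'mesh_topology'"]

    problems = []
    for mesh in meshes:
        attrs = variables[mesh]
        if str(attrs.get("topology_dimension")) != "2":
            problems.append(f"{mesh}: topology_dimension should be 2 for a 2D mesh")
        for key in ("node_coordinates", "face_node_connectivity"):
            if key not in attrs:
                problems.append(f"{mesh}: missing {key}")
                continue
            # Every variable named in the attribute must actually exist.
            for ref in str(attrs[key]).split():
                if ref not in variables:
                    problems.append(f"{mesh}: {key} refers to missing variable {ref}")
    return problems


variables = {
    "mesh": {"cf_role": "mesh_topology", "topology_dimension": 2,
             "node_coordinates": "node_lon node_lat",
             "face_node_connectivity": "ele"},
    "node_lon": {"standard_name": "longitude"},
    "node_lat": {"standard_name": "latitude"},
    "ele": {"cf_role": "face_node_connectivity", "start_index": 1},
}
print(ugrid_problems(variables))
```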

I can see there is a problem with the time units. Looking at http://amb6400b.stccmop.org:8080/thredds/dodsC/model_data/forecast.nc.html, you can see that the units of time are just "seconds", which is not CF compliant.

But if you just copied the value of base_date into units, you should be all set. While you are in there, why not add the global attribute Conventions=UGRID-0.9?
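Both of Rich's fixes could be applied without rewriting the files by wrapping the dataset in NcML on the THREDDS server. This minimal sketch generates such a wrapper with Python's standard library. It assumes the time variable is named `time` and that the base date is available as a Python datetime. Both are hypothetical here, and the real base_date attribute may be formatted differently. The NcML elements used (`netcdf`, `attribute`, `variable`) are standard NcML 2.2. The `location` value and the example date are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NCML = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML)  # serialize NcML as the default namespace


def cf_time_units(base):
    """Format a CF-style units string, e.g. 'seconds since 2015-02-18 00:00:00'."""
    return "seconds since " + base.strftime("%Y-%m-%d %H:%M:%S")


def ncml_wrapper(location, time_var, base):
    """Return an NcML document that adds Conventions and fixes the time units."""
    root = ET.Element(f"{{{NCML}}}netcdf", location=location)
    # Global attribute suggested above.
    ET.SubElement(root, f"{{{NCML}}}attribute", name="Conventions", value="UGRID-0.9")
    # Override the bare "seconds" units on the (hypothetically named) time variable.
    var = ET.SubElement(root, f"{{{NCML}}}variable", name=time_var)
    ET.SubElement(var, f"{{{NCML}}}attribute", name="units", value=cf_time_units(base))
    return ET.tostring(root, encoding="unicode")


print(ncml_wrapper("forecast.nc", "time", datetime(2015, 2, 18)))
```

Fixing the attributes in the files themselves (as CMOP later planned for SELFE output) avoids maintaining the wrapper, but NcML is handy for trying a change before touching the model code.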

rsignell-usgs commented 9 years ago

One more thing: nciso will calculate the lon/lat bounds (by reading the actual lon/lat data and computing the range) if you specify the global attribute cdm_data_type: ugrid
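Computing bounds the way Rich describes comes down to a min/max over the node coordinates. This sketch illustrates that idea; it is not ncISO's code. It produces the ACDD attribute names `geospatial_lat_min`, `geospatial_lat_max`, `geospatial_lon_min` and `geospatial_lon_max`. It assumes longitudes lie in a single range that does not cross the antimeridian, which should hold for a Columbia River estuary domain. The sample coordinates are made up.

```python
def geospatial_bounds(node_lon, node_lat):
    """Return ACDD-style bounding-box attributes from node coordinates (degrees)."""
    if not node_lon or not node_lat:
        raise ValueError("need at least one node")
    return {
        "geospatial_lat_min": min(node_lat),
        "geospatial_lat_max": max(node_lat),
        "geospatial_lon_min": min(node_lon),
        "geospatial_lon_max": max(node_lon),
    }


lons = [-124.5, -123.9, -122.7]
lats = [46.0, 46.3, 45.6]
print(geospatial_bounds(lons, lats))
```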

emiliom commented 9 years ago

Thanks, @rsignell-usgs! @cseaton, can you take Rich's suggestions and my earlier comment from last week, discuss with Paul, and try to implement the changes?

@amilan17, thanks for creating a pull request to ncISO! I'm assuming it will take quite a while before that loop is closed and we can benefit from it for this particular service we're trying to register. So we (NANOOS/CMOP) should try to implement the CF and metadata changes on our end regardless.

cseaton commented 9 years ago

Thanks, @rsignell-usgs, particularly for pointing out the error in how we are representing the time units. We will implement these changes.

rsignell-usgs commented 9 years ago

I just noticed a few more things. The depth and water level variables should not have the positive attribute defined. The positive attribute should only be used in the vertical coordinate variable that has the "formula_terms" attribute set (here vert_coords). vert_coords should have units of 1, not meters.
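Rich's two rules can be expressed as a small check over variable attributes. The first is that `positive` belongs only on the vertical coordinate that carries `formula_terms`. The second is that this coordinate should have units of "1". This sketch assumes attributes are available as a dict of dicts. It encodes only those two rules as stated for this dataset, not the full CF conventions. The variable names `depth` and `elev` and the attribute values in the example are illustrative.

```python
def vertical_attr_problems(variables):
    """Flag `positive` outside formula_terms coordinates and non-'1' units on them."""
    problems = []
    for name, attrs in variables.items():
        has_formula = "formula_terms" in attrs
        if "positive" in attrs and not has_formula:
            problems.append(f"{name}: remove 'positive' (no formula_terms)")
        if has_formula and str(attrs.get("units")) != "1":
            problems.append(f"{name}: units should be '1', not {attrs.get('units')!r}")
    return problems


variables = {
    "vert_coords": {"positive": "up", "units": "m", "formula_terms": "<terms omitted>"},
    "depth": {"positive": "down", "units": "m"},
    "elev": {"positive": "up", "units": "m"},
}
for problem in vertical_attr_problems(variables):
    print(problem)
```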

By the way, I would recommend not specifying the resolution. We don't specify resolution on other modeling datasets, and they pass the metadata validation AOK.

Check out, for example, this SELFE dataset that has no problem being harvested: http://comt.sura.org/thredds/dodsC/data/comt_1_archive/inundation_tropical/VIMS_SELFE/Hurricane_Ike_2D_final_run_with_waves.html

You could also use this as an example of a UGRID-compliant SELFE dataset.

emiliom commented 9 years ago

Thanks, @rsignell-usgs. BTW, after the first NGDC registration results, @amilan17 said that adding those resolution attributes would lead to passing ISO validation. Thanks so much for the VIMS SELFE example; that will be very helpful. @cseaton had looked at SURA irregular-grid examples in the catalog (possibly this one) and also hadn't found those attributes.

emiliom commented 9 years ago

FYI: The service is now showing on the IOOS Catalog. @cseaton and I now have some improvements to make so the service and dataset are more understandable (e.g., the title) and more standards compliant, but it looks like those incremental changes will just show up as part of the regular harvesting.

Thanks so much to all (@amilan17, @rsignell-usgs, @robragsdale) for your help! @robragsdale, you can close this issue if you want. We may reuse it later to ask for technical advice, but as the service is now registered and in the catalog, the core original goal has been met.

emiliom commented 9 years ago

@cseaton has addressed most of the remaining issues with this NANOOS THREDDS service. However, to do that he had to create a new catalog XML endpoint (service URL). That means we have to ask you to change the service URL from the old one, http://amb6400b.stccmop.org:8080/thredds/catalog/model_data/catalog.xml, to this one: http://amb6400b.stccmop.org:8080/thredds/forecast_model_data.xml

@robragsdale and @amilan17, I hope that change is not too onerous on your end in the EMMA > GeoPortal Registry > IOOS Catalog workflow.

Thanks for your help!

robragsdale commented 9 years ago

@emiliom @cseaton @amilan17 I have submitted the new service URL to the registry. I will check in the morning and change the status to "approve".

rsignell-usgs commented 9 years ago

@cseaton, nice job with the changes to make this forecast dataset UGRID compliant. It seems to load just fine now with pyugrid:

http://nbviewer.ipython.org/gist/rsignell-usgs/ed68c3daa55ece317afe

cseaton commented 9 years ago

@rsignell-usgs, thanks! That's nice to see. I haven't implemented the changes you mentioned about the positive attribute and the vert_coords units yet, but we plan to implement them as part of some in-progress changes to how SELFE writes netCDF output files, to bring us into better compliance. @robragsdale, thanks for submitting the new service URL for us.

emiliom commented 9 years ago

@robragsdale and @rsignell-usgs, thanks for your follow-ups. As of today, the new THREDDS catalog endpoint is on the IOOS Catalog, listed as a new service. The corresponding dataset listing is looking great, WAY better than before (richer metadata, more standards compliant, etc.); thanks, @cseaton!

The only obvious issue I see is that on its IOOS Catalog dataset page, the map doesn't show a bounding box. I don't know if that's a problem with the IOOS Catalog, or on the THREDDS end; it looks like it's the former, b/c a bounding box is shown on the map on the corresponding NGDC Geoportal page. BTW, that bounding box looks wrong (offset north, too much on land) on that map, but it's possible that's a projection issue on the NGDC Geoportal. @cseaton, could you double check if the bounding box values are correct on your end?

@robragsdale, given that a bounding box is shown on Geoportal, can you follow up with the IOOS Catalog gang to see if there's something funky on their end? Also, the old service endpoint is still listed under NANOOS. Can you go ahead and remove it from the registry?

Thanks, all.

dpsnowden commented 9 years ago

@emiliom the lack of a bounding box on the catalog is due to the issue described in the messages section of http://catalog.ioos.us/datasets/54f807a48c0db358a4d164c6. The Paegan library that does the featureType parsing apparently doesn't handle the unstructured grid data type. I'm not sure where that is on the development schedule, or whether we'll be abandoning Paegan in favor of something else for various reasons, including but not limited to UGRID support. I don't have a satisfying answer right now, but we should open an issue on the catalog repo to display UGRID bounding boxes or, better yet, bounding polygons.

@lukecampbell, comments? Correct me if I'm wrong.
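The simplest form of the bounding-polygon idea is the convex hull of the mesh nodes. This sketch computes one with Andrew's monotone-chain algorithm. It only illustrates the concept and is not something Paegan or the catalog implements. For an estuary mesh the convex hull will still take in a lot of land, so a polygon traced from the mesh's actual boundary edges would fit better. The function name and sample points are made up.

```python
def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain).

    `points` is an iterable of (lon, lat) pairs; duplicates are allowed.
    Collinear points on the hull edges are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other; drop it.
    return lower[:-1] + upper[:-1]


nodes = [(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 0)]
print(convex_hull(nodes))
```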

emiliom commented 9 years ago

Nice of you to drop by, @dpsnowden! I hope you're enjoying the weather; we'd love to have some of your snow for our mountains (yours too, Rich). And thanks for the useful comment.

If @lukecampbell can confirm your assessment, we'll conclude we're done with registering this service endpoint, and this github issue can be closed (once the previous, now deprecated endpoint is removed from the registry). There's always more polish to do, and @cseaton already has a list of a few improvements, but that'll happen incrementally.

emiliom commented 9 years ago

Just checking on the status of the removal of the old endpoint. On EMMA it's showing as flagged "FOR REMOVAL" (thanks, @robragsdale), but not "REMOVED". On the NGDC Geoportal, the record is still there, so consequently it's still there on the IOOS Catalog. @amilan17, I assume the remaining step is a manual one on your end?

We also haven't heard from @lukecampbell confirming that the lack of a bounding box on the IOOS Catalog dataset is due to Paegan limitations with UGRID handling. A brief confirmation would be helpful.

Thanks to all. Just want to round up the last steps so we can fully finish this endpoint registration and close this issue.

amilan17 commented 9 years ago

@emiliom You are correct - it is a manual process. I will proceed to set up the removal configurations. They should be cleared out by tomorrow afternoon. Let me know!

emiliom commented 9 years ago

Thanks, @amilan17. I'll confirm tomorrow late afternoon.

emiliom commented 9 years ago

OK, the old, deprecated service has been flushed out of the system (EMMA (set to REMOVED) > NGDC Geoportal > IOOS Catalog). The production service looks just fine on the IOOS Catalog, other than the bounding box issue already discussed, which is not a problem on the NANOOS end.

I'm closing this issue. Thanks again to everyone who helped out!