ioos / catalog

IOOS Catalog general repo for documentation and issues
https://ioos.github.io/catalog/
MIT License
2 stars 6 forks source link

NDBC station records, problems with CSW search filtering and SOS GetObservation scheme references #63

Closed emiliom closed 6 years ago

emiliom commented 6 years ago

NDBC station records are presenting several problems on IOOS Catalog CSW searches (and some of those problems are also apparent on the IOOS Catalog application). I suspect these problems are long standing, but I can't say for sure. There are 2 broad problems, with some overlap:

  1. CSW search filtering: a. A CSW search that includes an OGC FES datetime filter invariably fails to produce any NDBC station records. b. For observed properties or parameters, CF standard names are not exposed. A search using CF standard name literals fails in most or possibly all cases. A different, eyeball-friendly (and often machine unfriendly) set of parameter names is exposed via subject terms: Water Temperature, Air Temperature, Ocean Currents, etc. In addition, sensor urn's rather than observedProperty parameter names (which are largely CF standard names) are incorrectly exposed.
  2. The SOS GetObservation sample/template urls returned in the scheme references list has two mistakes that lead to errors when issuing requests: a. observedProperty should use CF standard names, not sensor urn's as shown (8/3/2018 correction: the NDBC SOS accepts parameter names or corresponding mmisw url's, but 4 out of the 9 parameters listed are actually not CF standard names.) b. eventTime iso8601 datetime strings must include a Z (UTC) suffix

See the NDBC SOS documentation page for reference and working examples.

I've prepared a Jupyter notebook illustrating all these problems and solutions (when known). No solution or workaround is known for problem 1.a.

cc @ocefpaf (we've been discussing aspects of this and other CSW querying problems -- eg, #62)

mwengren commented 6 years ago

@emiliom Here's my list of NDBC fixes and issues from ~1.5 years ago. Do any of these seem connected to the issues you've found in the output metadata? IIRC, I also had to add error handling functionality into pyoos when working with the NDBC service because ~100 or so DescribeSensor requests from their service (as listed in GetCapabilities) returned exceptions or HTTP errors - so couldn't even be parsed. At first I thought I would reach out to them to see about fixing it, but then other things happened...

From my notes:

NDBC fixes (review status - Jan 24)

emiliom commented 6 years ago

@mwengren your issue #9 is exactly the same as the my "2" above (which has two parts, just like your issue)! But relative to how you reported it, I'd add these clarifications:

'observedProperty' is a valid parameter, but sensorml2iso is requesting a stationID rather than a parameter - due to content of SensorML in NDBC service not being consistent with others.

FYI, that would be sensorID, not stationID. But this is a minor point. The main issue is that this is wrong. The NDBC GetObservation observedProperty argument takes parameter name strings that seem to be largely CF standard names (I haven't checked the list thoroughly).

'eventTime' is not supported by NDBC and need to handle as an edge case.

eventTime is supported by NDBC! But it seems to require a "Z" UTC suffix after each datetime string. (I don't remember if the i52nsos and ncsos servers are more forgiving and flexible; they probably are)

mwengren commented 6 years ago

@emiliom I took a look into the code again to try to sort out why #9 'observedProperty' parameter is wrong for the NDBC requests, and also compared the DescribeSensor Responses with the NANOOS service (which is of course what I used for reference when building out the module).

Since sensorml2iso is using pyoos and owslib to parse the SOS responses, the code block here:

https://github.com/ioos/sensorml2iso/blob/master/sensorml2iso/sensorml2iso.py#L434-L451

which generates those GetObservation requests, is iterating through the ds.variables (parsed by owslib), and assigning each to the 'observedProperty' value for the GetObs request. I didn't dig into exactly where owslib looks for the DescribeSensor variables but based on looking at one of the NANOOS DescribeSensor examples, it is either in the <sml:OutputList> or under <sml:component>.

Looking at an NDBC DescribeSensor example, it must be sml:component, since that's the only place the URN-style strings for observed params are appearing in their DescribeSensor response.

As to why I made the GetObs requests work that way, I don't really remember, but it would seem that either there's a more appropriate way to get there without using the ds.variables list for observedProperty, or perhaps there's a misconfiguration between NDBC's DescribeSensor and GetObs templates, where it should in fact be working. Will require more digging, or maybe you know the answer to that.

If there's a way to patch the code to work better for both i52N and NDBC to generate that parameter, we can definitely merge it.

Adding a 'Z' to the eventTime parameter seems like it will fix that in NDBC case and still work at least for NANOOS. Should probably test CO-OPS as well.

Also, looking at least at the one NDBC DescribeSensor example I linked above, it does not include time bounds, so this would be why sensorml2iso isn't generating them in the ISO record and you can't filter by time in CS-W. Perhaps there's another way to get that data, apparently I did not get that far...

emiliom commented 6 years ago

@mwengren this looks very promising! Thanks for digging. I'll try to follow in your footsteps early next week to see what I learn and help enhance sensorml2iso to support the NDBC SOS w/o degrading other services.

I believe the NDBC SOS never fully adopted the IOOS SOS profile. The NANOOS SOS being an IOOS 52N SOS (albeit an older version, not the latest), its SensorML conventions should reflect the IOOS SOS profile, not anything NANOOS is doing in a funky way. FYI, we inject data and metadata via sos-injector-db.

emiliom commented 6 years ago

I'm pasting here @mwengren's status summary from a few days ago (via email), b/c it's a very helpful summary. As the ball is on my court at this time, this is mainly a note to self as well as general documentation on the issue (easier to track information here vs email threads).

I think there's potential to make some fixes in sensorml2iso to resolve some of the NDBC SOS issues, but I honestly think some may be beyond reach. To list what you reported in this issue and what I was able to find last week:

  1. Unable to search by datetime filter for NDBC SOS records in Catalog:

    • I think in at least some cases (and perhaps all) this is due to NDBC not reporting eventTime in their DescribeSensor responses, or reporting it in a way that doesn't match the IOOS SOS profile. If they are reporting it in a non-standard way, I probably didn't add code to handle this for the first sensorml2iso release. This could be added pretty easily I think, it would just take an if else block to first try to parse according to the IOOS profile, and if no results, try the NDBC approach.
  2. CF standard names for observed parameters not exposed:

    • I think like you mentioned they are using URN names instead of URI links to the MMISW vocabulary page as specified in the IOOS SOS profile. Didn't fully get to dig into this one.
  3. SOS GetObservation URLs generated by sensorml2iso incorrect for NDBC:

    • This is what I mainly looked at Friday and commented on on the GH issue:
    • For 'observedPropery' parameter being incorrect, I think there's an inconsistency between what NDBC reports in DescribeSensor and what it is expecting in GetObservation. pyoos and owslib are used to parse out a list of 'variable' (s) available at each station/platform (code link here), and then those are passed into the GetObservation URL template, so it's not necessarily within sensorml2iso code to fix, we're just leveraging what pyoos/owslib take as standard SOS behavior. My guess is NDBC isn't consistent between DescribeSensor and GetObservation, but it would take more investigation to prove.
    • For 'eventTime' parameter being incorrect: this can be fixed with adding a 'Z' as you mentioned, hopefully that is a straightforward fix, but should be tested that it works for CO-OPS and other i52N SOSes.
emiliom commented 6 years ago

NDBC SOS DescribeSensor documentation, behavior and examples

Here's the description of an SOS DescribeSensor request, according to the NDBC SOS documentation page:

DescribeSensor - This function returns detailed sensor charactertistics. (Not fully implemented. Returns metadata for DART® stations, but only returns partial metadata for other stations.) Parameters include:

Here's a sample station procedure DescribeSensor response, for urn:ioos:station:wmo:plsf1. The information that's closest to parameters or observed properties is in the sml:components node:

<sml:components>
<sml:ComponentList>
    <sml:component name="urn:ioos:sensor:wmo:plsf1::anemometer1"
      xlink:href="https://sdf.ndbc.noaa.gov/sos/server.php?service=SOS&request=DescribeSensor&version=1.0.0&outputformat=text/xml;subtype="sensorML/1.0.1"&procedure=urn:ioos:sensor:wmo:plsf1::anemometer1"/>
    <sml:component name="urn:ioos:sensor:wmo:plsf1::baro1"
      xlink:href="https://sdf.ndbc.noaa.gov/sos/server.php?service=SOS&request=DescribeSensor&version=1.0.0&outputformat=text/xml;subtype="sensorML/1.0.1"&procedure=urn:ioos:sensor:wmo:plsf1::baro1"/>
    <sml:component name="urn:ioos:sensor:wmo:plsf1::airtemp1"
      xlink:href="https://sdf.ndbc.noaa.gov/sos/server.php?service=SOS&request=DescribeSensor&version=1.0.0&outputformat=text/xml;subtype="sensorML/1.0.1"&procedure=urn:ioos:sensor:wmo:plsf1::airtemp1"/>
</sml:ComponentList>
</sml:components>

However, it doesn't provide the CF standard names that an NDBC SOS GetObservation request requires for use with the observedProperty argument. Instead, what it provides is sensor urn's, such as urn:ioos:sensor:wmo:plsf1::airtemp1, and the SOS DescribeSensor request url that uses these urn's as sensor procedures. The response has no sml:components node, but it does have a parameter name under sml:identifier name="longName": AirTemperature. Still not a CF standard name, but much closer.

In the same NDBC SOS Documentation page, GetObservation requests are described as accepting the following observedProperty parameter values: air_pressure_at_sea_level, air_temperature, currents, sea_floor_depth_below_sea_surface, sea_water_electrical_conductivity, sea_water_salinity, sea_water_temperature, waves, winds. Most of these are CF standard names, except the ones I'm showing in italics.

So, DescribeSensor responses don't provide the observedProperty information needed to construct a GetObservation request. Instead, this information would have to be pulled from the GetCapabilities entry for the station procedure, specifically from the sos:observedProperty elements. For the urn:ioos:station:wmo:plsf1example, that's under the sos:ObservationOffering gml:id="station-plsf1 node and looks like this:

<sos:observedProperty xlink:href="http://mmisw.org/ont/cf/parameter/air_temperature"/>
<sos:observedProperty xlink:href="http://mmisw.org/ont/cf/parameter/air_pressure_at_sea_level"/>
<sos:observedProperty xlink:href="http://mmisw.org/ont/cf/parameter/winds"/>

Note that the observedProperty parameter in the GetObservation request actually accepts these url's as well as the simple (non-url) parameter name strings listed in the documentation, which are just the last component of these url's.

emiliom commented 6 years ago

After going over previous comments, the NDBC SOS docs and responses, and the sensorml2iso code, I think all these problems are resolvable within sensorml2iso. But it'll take a good chunk of effort.

Briefly, the least disruptive strategy will be to implement the sos_type argument for "ndbc", and use it to extract station['starting'], station['ending'] and station['variables'] from the GetCapabilities offering instead of SensorML. The GetCapabilities response is already available in sensorml2iso in the sosgc object.

For problem 2.b ("b. eventTime iso8601 datetime strings must include a Z (UTC) suffix"), we simply add the Z suffix in https://github.com/ioos/sensorml2iso/blob/master/sensorml2iso/sensorml2iso.py#L412 and https://github.com/ioos/sensorml2iso/blob/master/sensorml2iso/sensorml2iso.py#L422 if sos_type == "ndbc". It's very likely that a Z suffix won't impact other SOS types, but adding it only for ndbc would be the most cautious approach.

mwengren commented 6 years ago

@emiliom It's very out of date now, but I was looking at the ioos/sensorml2iso module a minute ago for something else and noticed I already had a branch with some commits towards adding sos_type param capability. I would start over with a new branch rather than try to update this one, but there may be a few code bits that could be extracted:

https://github.com/mwengren/sensorml2iso/tree/ndbc_sos_handling

Specific commit: https://github.com/mwengren/sensorml2iso/commit/51bea07141152ff79a926bb13eedb08514ded221

emiliom commented 6 years ago

@mwengren I've applied all the changes I was planning, and tested on my local dev code using this command (on one station):

sensorml2iso -s http://sdf.ndbc.noaa.gov/sos/server.php --stations=urn:ioos:station:wmo:plsf1 --sos_type=ndbc --response_formats='text/csv'

Here's the ISO XML it generated. Can you take a look to double check that this XML record would solve the problems we've discussed? I think it does.

I haven't looked at your old ndbc_sos_handling branch yet. I implemented a pretty simple sos_type handling scheme that just tests for "ndbc" to do a few special processing steps, and handles all other sos_type entries (including the default None) the same way as before.

emiliom commented 6 years ago

I've just pushed the code changes to the "ndbc" branch on my fork, https://github.com/emiliom/sensorml2iso/tree/ndbc. Take a look if you'd like. I plan to do a bit more testing before submitting a PR.

mwengren commented 6 years ago

@emiliom the record looks ok from a visual inspection and the GetObservation link appears to work, so hopefully it resolves all the issues you found? It looks to me to resolve the timeExtent and proper GetObs URL at least, didn't check the others.

Can you run against the full NDBC SOS and make sure it completes successfully and doesn't error out? We need the master branch to be able to produce a full WAF for Catalog, so it's essential to test all stations in the SOS in case there are edge cases your code might miss. There will be errors for stations that failed in the logs if you run it with --verbose flag, but some are expected because not all their DescribeSensor URLs are functional (errors internal to their service), but it should dump out a folder with ~1800 XML records if successful. It will print a list of failed stations at the end of the log output with --verbose on.

I looked at your code and looks good, but you may want to check out this commit to add the sos_type parameter read to command_line.py since I didn't see that in your branch yet - but maybe you just haven't pushed that part yet.

Let's make a PR once the full WAF test is done (as well as a test on another full non-NDBC service as well)!

emiliom commented 6 years ago

@mwengren Thanks for your review! Great.

Some quick follow ups:

so hopefully it resolves all the issues you found? It looks to me to resolve the timeExtent and proper GetObs URL at least, didn't check the others.

Yup, as far as I can tell it resolves all the issues I identified.

I looked at your code and looks good, but you may want to check out this commit to add the sos_type parameter read to command_line.py since I didn't see that in your branch yet - but maybe you just haven't pushed that part yet.

While I haven't looked at that commit yet, the sos_type parameter read was already in the code base master branch. It just wasn't being actively used. See https://github.com/ioos/sensorml2iso/blob/master/sensorml2iso/command_line.py#L51

I'll work on fuller testing next, hopefully today.

mwengren commented 6 years ago

Ah, ok, I didn't look at the master branch, assumed it was only in my NDBC test branch and hadn't already been folded into master. Looking at two year old code might as well be like you never wrote it!

emiliom commented 6 years ago

Can you run against the full NDBC SOS and make sure it completes successfully and doesn't error out? We need the master branch to be able to produce a full WAF for Catalog, so it's essential to test all stations in the SOS in case there are edge cases your code might miss. There will be errors for stations that failed in the logs if you run it with --verbose flag, but some are expected because not all their DescribeSensor URLs are functional (errors internal to their service), but it should dump out a folder with ~1800 XML records if successful. It will print a list of failed stations at the end of the log output with --verbose on.

I ran it on the NDBC SOS w/o selecting a specific set of stations, using this statement:

sensorml2iso --sos_type=ndbc -s http://sdf.ndbc.noaa.gov/sos/server.php --response_formats='text/csv' --verbose

It generated 1,008 station ISO XML files (not 1,800). The log file is pretty big (7.5 MB), mainly due to the DescribeSensor XML being copied there. Near the beginning, after listing all stations, it lists 23 stations that I assume did not generate an ISO XML, and had this error message:

SOS DescribeSensor error returned for: <station urn>, skipping. Error msg: No data found for this station

These are listed again towards the end, preceded by this message: "Stations in 'failures' list (should match DescribeSensor errors):"

I haven't looked at the XML files.

I also ran it against the NANOOS SOS, and everything seems fine. I only did a quick inspection of one of the XML files, and it looked fine.

mwengren commented 6 years ago

@emiliom Great, that sounds normal. Like I mentioned NDBC SOS has errors and invalid DescribeSensor URLs.

Actually the number you got is probably the right count. I had taken the dataset count for NDBC from the Catalog, which we've been having duplicated dataset problems with, so no doubt there are several duplicates for NDBC floating around in there. We're hoping an upgrade to CKAN resolves the problem.

I think we can merge your PR, I'll take a look.

mwengren commented 6 years ago

@emiliom I think we fixed all of these with your PR? Going to close this. If there's anything outstanding, let me know and will reopen.

emiliom commented 6 years ago

Yeah, I think everything in the scope of this issue was covered. Thanks.

emiliom commented 6 years ago

Copying from https://github.com/ioos/catalog/issues/69#issuecomment-416267475, b/c the fixes are still not visible in the catalog or its pyCSW endpoint.

Regarding the NDBC fixes from https://github.com/ioos/sensorml2iso/issues/26 and #63 (fixed on sensorml2iso by PR https://github.com/ioos/sensorml2iso/pull/27), @benjwadams said on 8/23:

I've updated the version of sensorml2iso running on the Catalog server to 1.0.5.

But I'm not seeing the fixes reflected in either the Catalog (eg, https://data.ioos.us/dataset/delaware-bay-26-nm-southeast-of-cape-may-nj1) or via the CSW query in my sample gist at https://gist.github.com/emiliom/ea4ae7e6fe051d8cac2788015e60e0cf#file-iooscatalog_csw_oldtemporalextent-ipynb

mwengren commented 6 years ago

@emiliom This is the timeExtent issue for NDBC services you are referring to here? I don't see a time extent in that record either, but it's possible the information isn't included in all the NDBC stations. But, if it's in fact widespread, I'm not sure why the timeExtent is not being displayed in Catalog, I thought that was fixed in the 1.0.4 version of sensorml2iso.

emiliom commented 6 years ago

I'm referring to all the NDBC-specific problems address addressed in https://github.com/ioos/sensorml2iso/pull/27:

I have a hunch why these fixes are not showing up: maybe @benjwadams didn't update the NDBC sensorml2iso execution statement to use the new(ish) argument --sos_type=ndbc?

mwengren commented 6 years ago

Ah yes, that would do it, I forgot to tell @benjwadams to do that part! Luckily, it should be an easy fix to add that to the Docker config.json file we're using to generate those WAFs, it's already parsed correctly.

emiliom commented 6 years ago

Great. Let me know once it's confirmed that the Docker config.json file has been changed and the changes are visible on the Catalog. Not that I have anything riding on the outcome; I'm just looking forward to the mental "done, check off" for this thing :wink:

mwengren commented 6 years ago

@benjwadams Have you changed the sernsorml2iso Docker config settings on data.ioos.us for the NDBC WAF to include --sos-type=ndbc for that job? If not, can we do that to hopefully close this issue out this week along with the new release? A quick check showed issues still with the NDBC records.

benjwadams commented 6 years ago

Hmm, thought I had commented on this issue. The sensorml2iso container is running with the updated version and the following crontab:

0 * * * * app sensorml2iso -s "http://sdf.ndbc.noaa.gov/sos/server.php" --response_formats "text/csv,text/xml;schema="ioos/0.6.1"" --output_dir "/srv/iso/ndbc" --sos_type ndbc 2>&1 | /usr/bin/logger -t sensorml2iso
10 * * * * app sensorml2iso -s "http://opendap.co-ops.nos.noaa.gov/ioos-dif-sos/SOS" --response_formats "text/csv,text/xml;subtype="om/1.0.0/profiles/ioos_sos/1.0"" --output_dir "/srv/iso/co-ops" 2>&1 | /usr/bin/logger -t sensorml2iso
20 * * * * app sensorml2iso -s "http://sos.glos.us/52n/sos/kvp" --output_dir "/srv/iso/glos" 2>&1 | /usr/bin/logger -t sensorml2iso

So it should be running with the updated code/flag for the sdf.ndbc.noaa.gov source.

emiliom commented 6 years ago

It looks like the problems with NDBC records are now resolved! At least on the IOOS Catalog, based the one example record I've been using. Woo-hoo!

@mwengren hopefully you can do more testing. And hopefully that'll confirm that NDBC records are now first class citizens in the IOOS Catalog, including CSW queries. In that case, you can close this issue.

mwengren commented 6 years ago

I think based on a quick inspection of some of the NDBC SOS datasets in the Catalog, this can be called resolved. I didn't dig into any CS-W results, but since there was a separate issue to fix the CS-W sync process I think we can assume that CKAN -> pycsw sync is working properly again. Closing this one.