Closed emiliom closed 6 years ago
@emiliom Here's my list of NDBC fixes and issues from ~1.5 years ago. Do any of these seem connected to the issues you've found in the output metadata? IIRC, I also had to add error handling functionality into pyoos when working with the NDBC service because ~100 or so DescribeSensor requests from their service (as listed in GetCapabilities) returned exceptions or HTTP errors - so couldn't even be parsed. At first I thought I would reach out to them to see about fixing it, but then other things happened...
From my notes:
NDBC fixes (review status - Jan 24)
NaN Geographic extent values passed from SensorML
Eventtime exception in GetObservation: Issue Link
<ows:Exception exceptionCode="InvalidParameterValue" locator="eventTime">
<ows:ExceptionText>2016-12-13T02:35:50/2016-12-13T04:35:50</ows:ExceptionText>
</ows:Exception>
Variables use URNs rather than mmisw URLs (which are parsed by sensorml2iso currently by ‘split(‘/’)’.
Alternative: check for “:” in string, split by that instead if found, otherwise default to ‘/’ for split
Some stations record position in SensorML via below, need to accommodate:
<sml:position name="stationPosition">
<swe:Position>
<swe:location>
<swe:Vector gml:id="stationLocation" definition="urn:ogc:def:property:OGC:location">
<swe:coordinate name="latitude">
<swe:Quantity axisID="Y">
<swe:uom code="deg"/>
<swe:value>50.17000</swe:value>
</swe:Quantity>
</swe:coordinate>
<swe:coordinate name="longitude">
<swe:Quantity axisID="X">
<swe:uom code="deg"/>
<swe:value>171.84000</swe:value>
</swe:Quantity>
</swe:coordinate>
<swe:coordinate name="altitude">
<swe:Quantity axisID="Z">
<swe:uom code="m"/>
<swe:value>0</swe:value>
</swe:Quantity>
</swe:coordinate>
</swe:Vector>
</swe:location>
</swe:Position>
@mwengren your issue #9 is exactly the same as the my "2" above (which has two parts, just like your issue)! But relative to how you reported it, I'd add these clarifications:
'observedProperty' is a valid parameter, but sensorml2iso is requesting a stationID rather than a parameter - due to content of SensorML in NDBC service not being consistent with others.
FYI, that would be sensorID, not stationID. But this is a minor point. The main issue is that this is wrong. The NDBC GetObservation observedProperty argument takes parameter name strings that seem to be largely CF standard names (I haven't checked the list thoroughly).
'eventTime' is not supported by NDBC and need to handle as an edge case.
eventTime
is supported by NDBC! But it seems to require a "Z" UTC suffix after each datetime string. (I don't remember if the i52nsos and ncsos servers are more forgiving and flexible; they probably are)
@emiliom I took a look into the code again to try to sort out why #9 'observedProperty' parameter is wrong for the NDBC requests, and also compared the DescribeSensor Responses with the NANOOS service (which is of course what I used for reference when building out the module).
Since sensorml2iso is using pyoos and owslib to parse the SOS responses, the code block here:
https://github.com/ioos/sensorml2iso/blob/master/sensorml2iso/sensorml2iso.py#L434-L451
which generates those GetObservation requests, is iterating through the ds.variables
(parsed by owslib), and assigning each to the 'observedProperty' value for the GetObs request. I didn't dig into exactly where owslib looks for the DescribeSensor variables
but based on looking at one of the NANOOS DescribeSensor examples, it is either in the <sml:OutputList>
or under <sml:component>
.
Looking at an NDBC DescribeSensor example, it must be sml:component, since that's the only place the URN-style strings for observed params are appearing in their DescribeSensor response.
As to why I made the GetObs requests work that way, I don't really remember, but it would seem that either there's a more appropriate way to get there without using the ds.variables
list for observedProperty
, or perhaps there's a misconfiguration between NDBC's DescribeSensor and GetObs templates, where it should in fact be working. Will require more digging, or maybe you know the answer to that.
If there's a way to patch the code to work better for both i52N and NDBC to generate that parameter, we can definitely merge it.
Adding a 'Z' to the eventTime
parameter seems like it will fix that in NDBC case and still work at least for NANOOS. Should probably test CO-OPS as well.
Also, looking at least at the one NDBC DescribeSensor example I linked above, it does not include time bounds, so this would be why sensorml2iso isn't generating them in the ISO record and you can't filter by time in CS-W. Perhaps there's another way to get that data, apparently I did not get that far...
@mwengren this looks very promising! Thanks for digging. I'll try to follow in your footsteps early next week to see what I learn and help enhance sensorml2iso to support the NDBC SOS w/o degrading other services.
I believe the NDBC SOS never fully adopted the IOOS SOS profile. The NANOOS SOS being an IOOS 52N SOS (albeit an older version, not the latest), its SensorML conventions should reflect the IOOS SOS profile, not anything NANOOS is doing in a funky way. FYI, we inject data and metadata via sos-injector-db.
I'm pasting here @mwengren's status summary from a few days ago (via email), b/c it's a very helpful summary. As the ball is on my court at this time, this is mainly a note to self as well as general documentation on the issue (easier to track information here vs email threads).
I think there's potential to make some fixes in sensorml2iso to resolve some of the NDBC SOS issues, but I honestly think some may be beyond reach. To list what you reported in this issue and what I was able to find last week:
Unable to search by datetime filter for NDBC SOS records in Catalog:
CF standard names for observed parameters not exposed:
SOS GetObservation URLs generated by sensorml2iso incorrect for NDBC:
Here's the description of an SOS DescribeSensor
request, according to the NDBC SOS documentation page:
DescribeSensor - This function returns detailed sensor charactertistics. (Not fully implemented. Returns metadata for DART® stations, but only returns partial metadata for other stations.) Parameters include:
Here's a sample station procedure DescribeSensor response, for urn:ioos:station:wmo:plsf1
. The information that's closest to parameters or observed properties is in the sml:components
node:
<sml:components>
<sml:ComponentList>
<sml:component name="urn:ioos:sensor:wmo:plsf1::anemometer1"
xlink:href="https://sdf.ndbc.noaa.gov/sos/server.php?service=SOS&request=DescribeSensor&version=1.0.0&outputformat=text/xml;subtype="sensorML/1.0.1"&procedure=urn:ioos:sensor:wmo:plsf1::anemometer1"/>
<sml:component name="urn:ioos:sensor:wmo:plsf1::baro1"
xlink:href="https://sdf.ndbc.noaa.gov/sos/server.php?service=SOS&request=DescribeSensor&version=1.0.0&outputformat=text/xml;subtype="sensorML/1.0.1"&procedure=urn:ioos:sensor:wmo:plsf1::baro1"/>
<sml:component name="urn:ioos:sensor:wmo:plsf1::airtemp1"
xlink:href="https://sdf.ndbc.noaa.gov/sos/server.php?service=SOS&request=DescribeSensor&version=1.0.0&outputformat=text/xml;subtype="sensorML/1.0.1"&procedure=urn:ioos:sensor:wmo:plsf1::airtemp1"/>
</sml:ComponentList>
</sml:components>
However, it doesn't provide the CF standard names that an NDBC SOS GetObservation request requires for use with the observedProperty
argument. Instead, what it provides is sensor urn's, such as urn:ioos:sensor:wmo:plsf1::airtemp1
, and the SOS DescribeSensor request url that uses these urn's as sensor procedures. The response has no sml:components
node, but it does have a parameter name under sml:identifier name="longName"
: AirTemperature. Still not a CF standard name, but much closer.
In the same NDBC SOS Documentation page, GetObservation requests are described as accepting the following observedProperty
parameter values: air_pressure_at_sea_level, air_temperature, currents, sea_floor_depth_below_sea_surface, sea_water_electrical_conductivity, sea_water_salinity, sea_water_temperature, waves, winds. Most of these are CF standard names, except the ones I'm showing in italics.
So, DescribeSensor responses don't provide the observedProperty information needed to construct a GetObservation request. Instead, this information would have to be pulled from the GetCapabilities entry for the station procedure, specifically from the sos:observedProperty
elements. For the urn:ioos:station:wmo:plsf1
example, that's under the sos:ObservationOffering gml:id="station-plsf1
node and looks like this:
<sos:observedProperty xlink:href="http://mmisw.org/ont/cf/parameter/air_temperature"/>
<sos:observedProperty xlink:href="http://mmisw.org/ont/cf/parameter/air_pressure_at_sea_level"/>
<sos:observedProperty xlink:href="http://mmisw.org/ont/cf/parameter/winds"/>
Note that the observedProperty parameter in the GetObservation request actually accepts these url's as well as the simple (non-url) parameter name strings listed in the documentation, which are just the last component of these url's.
After going over previous comments, the NDBC SOS docs and responses, and the sensorml2iso
code, I think all these problems are resolvable within sensorml2iso
. But it'll take a good chunk of effort.
Briefly, the least disruptive strategy will be to implement the sos_type
argument for "ndbc", and use it to extract station['starting'], station['ending'] and station['variables'] from the GetCapabilities
offering instead of SensorML. The GetCapabilities response is already available in sensorml2iso
in the sosgc
object.
For problem 2.b ("b. eventTime iso8601 datetime strings must include a Z (UTC) suffix"), we simply add the Z suffix in https://github.com/ioos/sensorml2iso/blob/master/sensorml2iso/sensorml2iso.py#L412 and https://github.com/ioos/sensorml2iso/blob/master/sensorml2iso/sensorml2iso.py#L422 if sos_type == "ndbc"
. It's very likely that a Z suffix won't impact other SOS types, but adding it only for ndbc would be the most cautious approach.
@emiliom It's very out of date now, but I was looking at the ioos/sensorml2iso module a minute ago for something else and noticed I already had a branch with some commits towards adding sos_type
param capability. I would start over with a new branch rather than try to update this one, but there may be a few code bits that could be extracted:
https://github.com/mwengren/sensorml2iso/tree/ndbc_sos_handling
Specific commit: https://github.com/mwengren/sensorml2iso/commit/51bea07141152ff79a926bb13eedb08514ded221
@mwengren I've applied all the changes I was planning, and tested on my local dev code using this command (on one station):
sensorml2iso -s http://sdf.ndbc.noaa.gov/sos/server.php --stations=urn:ioos:station:wmo:plsf1 --sos_type=ndbc --response_formats='text/csv'
Here's the ISO XML it generated. Can you take a look to double check that this XML record would solve the problems we've discussed? I think it does.
I haven't looked at your old ndbc_sos_handling
branch yet. I implemented a pretty simple sos_type
handling scheme that just tests for "ndbc" to do a few special processing steps, and handles all other sos_type
entries (including the default None) the same way as before.
I've just pushed the code changes to the "ndbc" branch on my fork, https://github.com/emiliom/sensorml2iso/tree/ndbc. Take a look if you'd like. I plan to do a bit more testing before submitting a PR.
@emiliom the record looks ok from a visual inspection and the GetObservation link appears to work, so hopefully it resolves all the issues you found? It looks to me to resolve the timeExtent and proper GetObs URL at least, didn't check the others.
Can you run against the full NDBC SOS and make sure it completes successfully and doesn't error out? We need the master branch to be able to produce a full WAF for Catalog, so it's essential to test all stations in the SOS in case there are edge cases your code might miss. There will be errors for stations that failed in the logs if you run it with --verbose
flag, but some are expected because not all their DescribeSensor
URLs are functional (errors internal to their service), but it should dump out a folder with ~1800 XML records if successful. It will print a list of failed stations at the end of the log output with --verbose
on.
I looked at your code and looks good, but you may want to check out this commit to add the sos_type
parameter read to command_line.py
since I didn't see that in your branch yet - but maybe you just haven't pushed that part yet.
Let's make a PR once the full WAF test is done (as well as a test on another full non-NDBC service as well)!
@mwengren Thanks for your review! Great.
Some quick follow ups:
so hopefully it resolves all the issues you found? It looks to me to resolve the timeExtent and proper GetObs URL at least, didn't check the others.
Yup, as far as I can tell it resolves all the issues I identified.
I looked at your code and looks good, but you may want to check out this commit to add the sos_type parameter read to command_line.py since I didn't see that in your branch yet - but maybe you just haven't pushed that part yet.
While I haven't looked at that commit yet, the sos_type
parameter read was already in the code base master branch. It just wasn't being actively used. See https://github.com/ioos/sensorml2iso/blob/master/sensorml2iso/command_line.py#L51
I'll work on fuller testing next, hopefully today.
Ah, ok, I didn't look at the master branch, assumed it was only in my NDBC test branch and hadn't already been folded into master. Looking at two year old code might as well be like you never wrote it!
Can you run against the full NDBC SOS and make sure it completes successfully and doesn't error out? We need the master branch to be able to produce a full WAF for Catalog, so it's essential to test all stations in the SOS in case there are edge cases your code might miss. There will be errors for stations that failed in the logs if you run it with --verbose flag, but some are expected because not all their DescribeSensor URLs are functional (errors internal to their service), but it should dump out a folder with ~1800 XML records if successful. It will print a list of failed stations at the end of the log output with --verbose on.
I ran it on the NDBC SOS w/o selecting a specific set of stations, using this statement:
sensorml2iso --sos_type=ndbc -s http://sdf.ndbc.noaa.gov/sos/server.php --response_formats='text/csv' --verbose
It generated 1,008 station ISO XML files (not 1,800). The log file is pretty big (7.5 MB), mainly due to the DescribeSensor XML being copied there. Near the beginning, after listing all stations, it lists 23 stations that I assume did not generate an ISO XML, and had this error message:
SOS DescribeSensor error returned for: <station urn>, skipping. Error msg: No data found for this station
These are listed again towards the end, preceded by this message: "Stations in 'failures' list (should match DescribeSensor errors):"
I haven't looked at the XML files.
I also ran it against the NANOOS SOS, and everything seems fine. I only did a quick inspection of one of the XML files, and it looked fine.
@emiliom Great, that sounds normal. Like I mentioned NDBC SOS has errors and invalid DescribeSensor URLs.
Actually the number you got is probably the right count. I had taken the dataset count for NDBC from the Catalog, which we've been having duplicated dataset problems with, so no doubt there are several duplicates for NDBC floating around in there. We're hoping an upgrade to CKAN resolves the problem.
I think we can merge your PR, I'll take a look.
@emiliom I think we fixed all of these with your PR? Going to close this. If there's anything outstanding, let me know and will reopen.
Yeah, I think everything in the scope of this issue was covered. Thanks.
Copying from https://github.com/ioos/catalog/issues/69#issuecomment-416267475, b/c the fixes are still not visible in the catalog or its pyCSW endpoint.
Regarding the NDBC fixes from https://github.com/ioos/sensorml2iso/issues/26 and #63 (fixed on sensorml2iso by PR https://github.com/ioos/sensorml2iso/pull/27), @benjwadams said on 8/23:
I've updated the version of sensorml2iso running on the Catalog server to 1.0.5.
But I'm not seeing the fixes reflected in either the Catalog (eg, https://data.ioos.us/dataset/delaware-bay-26-nm-southeast-of-cape-may-nj1) or via the CSW query in my sample gist at https://gist.github.com/emiliom/ea4ae7e6fe051d8cac2788015e60e0cf#file-iooscatalog_csw_oldtemporalextent-ipynb
@emiliom This is the timeExtent issue for NDBC services you are referring to here? I don't see a time extent in that record either, but it's possible the information isn't included in all the NDBC stations. But, if it's in fact widespread, I'm not sure why the timeExtent is not being displayed in Catalog, I thought that was fixed in the 1.0.4 version of sensorml2iso.
I'm referring to all the NDBC-specific problems address addressed in https://github.com/ioos/sensorml2iso/pull/27:
I have a hunch why these fixes are not showing up: maybe @benjwadams didn't update the NDBC sensorml2iso
execution statement to use the new(ish) argument --sos_type=ndbc
?
Ah yes, that would do it, I forgot to tell @benjwadams to do that part! Luckily, it should be an easy fix to add that to the Docker config.json file we're using to generate those WAFs, it's already parsed correctly.
Great. Let me know once it's confirmed that the Docker config.json file has been changed and the changes are visible on the Catalog. Not that I have anything riding on the outcome; I'm just looking forward to the mental "done, check off" for this thing :wink:
@benjwadams Have you changed the sernsorml2iso Docker config settings on data.ioos.us for the NDBC WAF to include --sos-type=ndbc
for that job? If not, can we do that to hopefully close this issue out this week along with the new release? A quick check showed issues still with the NDBC records.
Hmm, thought I had commented on this issue. The sensorml2iso container is running with the updated version and the following crontab:
0 * * * * app sensorml2iso -s "http://sdf.ndbc.noaa.gov/sos/server.php" --response_formats "text/csv,text/xml;schema="ioos/0.6.1"" --output_dir "/srv/iso/ndbc" --sos_type ndbc 2>&1 | /usr/bin/logger -t sensorml2iso
10 * * * * app sensorml2iso -s "http://opendap.co-ops.nos.noaa.gov/ioos-dif-sos/SOS" --response_formats "text/csv,text/xml;subtype="om/1.0.0/profiles/ioos_sos/1.0"" --output_dir "/srv/iso/co-ops" 2>&1 | /usr/bin/logger -t sensorml2iso
20 * * * * app sensorml2iso -s "http://sos.glos.us/52n/sos/kvp" --output_dir "/srv/iso/glos" 2>&1 | /usr/bin/logger -t sensorml2iso
So it should be running with the updated code/flag for the sdf.ndbc.noaa.gov source.
It looks like the problems with NDBC records are now resolved! At least on the IOOS Catalog, based the one example record I've been using. Woo-hoo!
@mwengren hopefully you can do more testing. And hopefully that'll confirm that NDBC records are now first class citizens in the IOOS Catalog, including CSW queries. In that case, you can close this issue.
I think based on a quick inspection of some of the NDBC SOS datasets in the Catalog, this can be called resolved. I didn't dig into any CS-W results, but since there was a separate issue to fix the CS-W sync process I think we can assume that CKAN -> pycsw sync is working properly again. Closing this one.
NDBC station records are presenting several problems on IOOS Catalog CSW searches (and some of those problems are also apparent on the IOOS Catalog application). I suspect these problems are long standing, but I can't say for sure. There are 2 broad problems, with some overlap:
See the NDBC SOS documentation page for reference and working examples.
I've prepared a Jupyter notebook illustrating all these problems and solutions (when known). No solution or workaround is known for problem 1.a.
cc @ocefpaf (we've been discussing aspects of this and other CSW querying problems -- eg, #62)