ioos / notebooks_demos

Notebook demonstrations and examples
https://ioos.github.io/notebooks_demos/
MIT License

Try reading CO-OPS data from ERDDAP for light swim notebook #200

Closed rsignell-usgs closed 6 years ago

rsignell-usgs commented 7 years ago

At some point we should try finding and reading data for the light swim notebook using the new CO-OPS ERDDAP endpoint.

Some pydap examples of reading the CO-OPS endpoint are here: https://github.com/dnowacki-usgs/notebooks/blob/master/noaa_coops_erddap.ipynb
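For reference, here is a rough sketch of how a tabledap CSV request URL can be assembled. Note this is not the pydap approach from the linked notebook; it uses ERDDAP's plain `tabledap` request syntax instead. The server address, dataset ID, and variable name below are placeholders, not the real CO-OPS values.

```python
from urllib.parse import quote


def tabledap_csv_url(server, dataset_id, variables, constraints):
    """Build an ERDDAP tabledap CSV request URL.

    constraints is a list of (variable, operator, value) tuples,
    e.g. ("time", ">=", "2017-08-01T00:00:00Z").
    """
    query = ",".join(variables)
    for name, op, value in constraints:
        # Percent-encode '<' and '>' so the URL survives strict HTTP clients.
        query += "&" + name + quote(op, safe="=") + quote(str(value), safe=":-.")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.csv?{query}"


# Placeholder server, dataset ID, and variable name (assumptions, not real endpoints).
url = tabledap_csv_url(
    "https://example.org/erddap",
    "hypothetical_water_temp",
    ["time", "water_temperature"],
    [
        ("time", ">=", "2017-08-01T00:00:00Z"),
        ("time", "<=", "2017-08-10T00:00:00Z"),
    ],
)
print(url)
```

The resulting URL could then be handed to something like `pandas.read_csv`, keeping in mind that ERDDAP's `.csv` output carries a second header row with units.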

ocefpaf commented 7 years ago

Here is an attempt to use an all-catalog workflow to find those ERDDAP endpoints:

https://gist.github.com/ocefpaf/be4f1605894a84254120881c9c84d0dc

Here is a summary of the endpoints found for a 9 day "Boston Light Swim"-like search:

URL: https://data.ioos.us/csw
URL: https://gamone.whoi.edu/csw

In [25]:
dap_urls
Out[25]:
['http://oos.soest.hawaii.edu/thredds/dodsC/pacioos/hycom/global',
 'http://thredds.secoora.org/thredds/dodsC/SECOORA_NCSU_CNAPS.nc',
 'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/204p1_rt.nc',
 'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/162p1_rt.nc',
 'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/166p1_rt.nc',
 'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_FVCOM_OCEAN_BOSTON_FORECAST.nc',
 'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_FVCOM_OCEAN_SCITUATE_FORECAST.nc',
 'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc',
 'http://geoport-dev.whoi.edu/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd',
 'http://www.neracoos.org/thredds/dodsC/UMO/DSG/SOS/A01/Accelerometer/HistoricRealtime/Agg.ncml',
 'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/179p1_rt.nc',
 'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_FVCOM_OCEAN_MASSBAY_FORECAST.nc',
 'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/231p1_rt.nc',
 'http://oos.soest.hawaii.edu/thredds/dodsC/hioos/satellite/dhw_5km',
 'http://thredds.secoora.org/thredds/dodsC/G1_SST_GLOBAL.nc',
 'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/201p1_rt.nc',
 'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/036p1_rt.nc']

In [26]:
sos_urls
Out[26]:
['http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/036p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
 'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/166p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
 'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/162p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
 'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/204p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
 'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/179p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
 'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/231p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
 'http://www.neracoos.org/thredds/sos/UMO/DSG/SOS/A01/Accelerometer/HistoricRealtime/Agg.ncml?service=SOS&version=1.0.0&request=GetCapabilities',
 'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/201p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities']

In [24]:
erddap_urls
Out[24]:
[]

We should find NDBC 44013 and NOS.CO-OPS 8443970 there, but all we got is NDBC 44029 (http://www.neracoos.org/thredds/sos/UMO/DSG/SOS/A01/Accelerometer/HistoricRealtime/Agg.ncml).

And many cdip.ucsd moorings too :-/

There is no temperature data in the endpoint for 44029 that we got from the catalog, but the buoy does record temperature data, see http://www.ndbc.noaa.gov/data/realtime2/44029.ocean

Also, something is not OK with the search b/c that buoy should be outside of the search bounding box. See cells [12]-[15] of http://nbviewer.jupyter.org/gist/ocefpaf/be4f1605894a84254120881c9c84d0dc
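Until the catalog-side bounding box filter can be trusted, one possible workaround is to re-check station positions on the client. A minimal sketch, assuming station longitude/latitude are already known; the box and the coordinates below are illustrative values, not the notebook's actual search box or real station positions.

```python
def in_bbox(lon, lat, bbox):
    """Return True if (lon, lat) is inside bbox = [min_lon, min_lat, max_lon, max_lat].

    Does not handle boxes that cross the antimeridian.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Illustrative values only (assumptions for the sake of the example).
bbox = [-71.3, 42.03, -70.57, 42.63]
stations = {
    "station_inside": (-70.9, 42.3),
    "station_outside": (-117.2, 32.9),
}

# Keep only the stations whose position really falls inside the search box.
kept = {name: pos for name, pos in stations.items() if in_bbox(*pos, bbox)}
print(sorted(kept))
```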

rsignell-usgs commented 7 years ago

@ocefpaf , so there are 4 or 5 issues here, right? @jbosch-noaa, who is our contact on catalog issues now that Luke has moved on?

ocefpaf commented 7 years ago

> @ocefpaf , so there are 4 or 5 issues here, right?

Yeah, I should've broken this down into smaller parts to make this issue more useful :grimacing:

Here are the issues with the https://data.ioos.us/csw catalog that we found in this search:

1) cdip.ucsd stations shouldn't be found (xref: https://github.com/SECOORA/skill_score/issues/197);
2) buoy NDBC 44013 is missing;
3) NDBC 44029 was found, but it is actually outside the search bounding box;
4) the OPeNDAP endpoint for NDBC 44029 does not have the temperature data;
5) no ERDDAP endpoint was found, but I am pretty sure there is at least one;
6) the NOS.CO-OPS 8443970 station cannot be found, even for the period when its temperature sensor was active (before Jun 17 2017 4:40PM).

Pinging @mwengren and @jbosch-noaa here for help.

ocefpaf commented 7 years ago

The notebook, http://nbviewer.jupyter.org/gist/ocefpaf/be4f1605894a84254120881c9c84d0dc, connects all the issues above with actual code.

Right now the notebook is useless b/c https://data.ioos.us/csw seems to be down (ping @mwengren), but I will re-run it as soon as the catalog is back online.

mwengren commented 7 years ago

@ocefpaf The CS-W service looks to be working now. Not sure if there was an outage earlier or why.

@ericmbernier and @benjwadams are the ASA contacts for Catalog issues and support going forward, and I'll still be managing it for IOOS.

At the moment, we're in a development freeze though until new funding works its way through the system.

But we should be able to troubleshoot critical issues if they are clearly identified. I haven't looked at all the notebook links above, but if the content doesn't seem current, the first thing ASA might be able to confirm is whether the CKAN-to-pycsw database sync process is running properly. If that's not it, then we might have to get into finer-grained content issues in the CS-W service, if you're saying it's not correct.

ocefpaf commented 7 years ago

Thanks for the info @mwengren! I updated the notebook above now that the catalog is back online.

ocefpaf commented 7 years ago

Just tried this one again, see the notebook here, and I have good news and bad news...

The good news is that the NOAA.NOS.CO-OPS stations are showing up again when looking for SOS :tada:

The bad news is that the extraneous cdip.ucsd stations are not gone, and a few new ones from edu_fau and edu_usf are now showing up, see cell [9]. Also, still no ERDDAP endpoints are found when using identifier='ERDDAP:tabledap' (cell [7]).

It is quite awkward to filter "valid" results from the catalog search. Note that this issue has been plaguing us for a few years now: https://github.com/SECOORA/skill_score/issues/197

Not sure if my filtering is completely off or if these issues need to be solved at the data provider metadata level.

ocefpaf commented 6 years ago

@jbosch-noaa we still do not get any ERDDAP endpoints correctly when filtering with the identifier 'ERDDAP:tabledap'. Here is a notebook demonstrating this:

http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/ocefpaf/7d3be0e99f6e990eda42e76a39ca171f/raw/b782f47e8a9bbb3ffa13779b0d9d506e1bdc4621/issue-200-ERDDAP.ipynb

@rsignell-usgs any suggestion or should we abandon the idea of looking for ERDDAP endpoints with the catalog?

jbosch-noaa commented 6 years ago

@ocefpaf - In talking to @mwengren , we postulated that SECOORA may not have their ERDDAP server in the catalog because they have over 2300 data sets!

Perhaps there is another RA that has model data we can use as an example instead of SECOORA? It doesn't mean we give up on this notebook. You can just hard code the path and find a different model to work with for another notebook.

Another option is to talk to @kwilcox about manually moving some ERDDAP files over to their WAF so they get registered in the catalog.

Thoughts?

mwengren commented 6 years ago

@ocefpaf There are many ERDDAP services registered in the Catalog, but none published by SECOORA per Jen's comment. See: https://data.ioos.us/dataset?res_format=ERDDAP-TableDAP. There may be some disconnect between how pycsw is filtering the 'ERDDAP-TableDAP' endpoints compared to how CKAN represents them.

We do attempt to identify and tag them at the CKAN level as in the above. I am not sure what pycsw is doing wrt ERDDAP.

ocefpaf commented 6 years ago

> @ocefpaf There are many ERDDAP services registered in the Catalog, but none published by SECOORA per Jen's comment.

The search in the example is for the Boston area, so SECOORA data availability is not an issue there. However, b/c we are planning more model-observation comparison notebooks, it is nice to know that SECOORA ERDDAP data won't show up.

> See: https://data.ioos.us/dataset?res_format=ERDDAP-TableDAP. There may be some disconnect between how pycsw is filtering the 'ERDDAP-TableDAP' endpoints compared to how CKAN represents them.

Not sure what the implications of that are. Sorry, I am not an expert on pycsw and CKAN.

> We do attempt to identify and tag them at the CKAN level as in the above. I am not sure what pycsw is doing wrt ERDDAP.

I'll try to do a "reverse" search example using https://data.ioos.us/dataset?res_format=ERDDAP-TableDAP so we can have control of the endpoints that should show up. More on this soon...

kwilcox commented 6 years ago

Is there a way to register an ERDDAP server with the catalog?

mwengren commented 6 years ago

Yes, you can register the ERDDAP WAF directly in the Harvest Registry (there is an 'ERDDAP WAF' type, as it differs from a plain old HTTP WAF). I assumed you were holding off on registering the SECOORA ERDDAP due to the proliferation of 'datasets' in there. But register away, the more the merrier I suppose.

mwengren commented 6 years ago

@ocefpaf I guess my point is that there are many ERDDAP services in the Catalog; they just don't seem to be obtainable via pycsw. We'll have to investigate more.

ocefpaf commented 6 years ago

> they just don't seem to be obtainable via pycsw

Sorry. I am confused. Do you mean OWSLib's CatalogueServiceWeb class I used in the example to obtain the endpoints? pycsw is a CSW server implementation, right?

Note that I did find some ERDDAP endpoints but the scheme metadata is not listed as 'ERDDAP:tabledap', making it hard to identify ERDDAP endpoints programmatically (see cell 13 of this notebook).
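One way to cope with the inconsistent scheme metadata might be to fall back on the URL pattern when the scheme does not say 'ERDDAP:tabledap'. A minimal sketch, assuming each record's references are a list of dicts with 'scheme' and 'url' keys (the shape OWSLib exposes on its CSW records, as far as I can tell); the sample URLs and the non-ERDDAP scheme strings are made up for illustration.

```python
def erddap_tabledap_urls(references):
    """Pick ERDDAP tabledap URLs from a list of {'scheme': ..., 'url': ...} dicts.

    A reference is kept if its scheme mentions 'ERDDAP:tabledap' or,
    failing that, if its URL contains '/erddap/tabledap/'.
    """
    found = []
    for ref in references:
        scheme = (ref.get("scheme") or "").lower()
        url = ref.get("url") or ""
        if "erddap:tabledap" in scheme or "/erddap/tabledap/" in url.lower():
            if url not in found:  # drop duplicates, keep first-seen order
                found.append(url)
    return found


# Made-up references for illustration; none of these are real endpoints.
references = [
    {"scheme": "ERDDAP:tabledap", "url": "https://example.org/erddap/tabledap/ds_a"},
    {"scheme": "WWW:LINK", "url": "https://example.org/erddap/tabledap/ds_b.html"},
    {"scheme": "OPeNDAP:OPeNDAP", "url": "https://example.org/thredds/dodsC/ds_c.nc"},
    {"scheme": None, "url": "https://example.org/erddap/tabledap/ds_a"},
]
print(erddap_tabledap_urls(references))
```

The URL fallback is a heuristic: it will miss ERDDAP servers deployed under a different path, so it complements rather than replaces correct scheme metadata.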

ocefpaf commented 6 years ago

Closing this due to inactivity. We worked around most of the issues here with stricter filtering, using propertyname='apiso:Subject' rather than propertyname='apiso:AnyText'. This does not mean the bad selection of cdip.ucsd is fixed! Those should not be there due to the bbox alone, without extra filtering.