Closed rsignell-usgs closed 6 years ago
Here is an attempt to use an all-catalog workflow to find those ERDDAP endpoints:
https://gist.github.com/ocefpaf/be4f1605894a84254120881c9c84d0dc
Here is a summary of the endpoints found for a 9 day "Boston Light Swim"-like search:
URL: https://data.ioos.us/csw
URL: https://gamone.whoi.edu/csw
In [25]:
dap_urls
Out[25]:
['http://oos.soest.hawaii.edu/thredds/dodsC/pacioos/hycom/global',
'http://thredds.secoora.org/thredds/dodsC/SECOORA_NCSU_CNAPS.nc',
'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/204p1_rt.nc',
'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/162p1_rt.nc',
'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/166p1_rt.nc',
'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_FVCOM_OCEAN_BOSTON_FORECAST.nc',
'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_FVCOM_OCEAN_SCITUATE_FORECAST.nc',
'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc',
'http://geoport-dev.whoi.edu/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd',
'http://www.neracoos.org/thredds/dodsC/UMO/DSG/SOS/A01/Accelerometer/HistoricRealtime/Agg.ncml',
'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/179p1_rt.nc',
'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_FVCOM_OCEAN_MASSBAY_FORECAST.nc',
'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/231p1_rt.nc',
'http://oos.soest.hawaii.edu/thredds/dodsC/hioos/satellite/dhw_5km',
'http://thredds.secoora.org/thredds/dodsC/G1_SST_GLOBAL.nc',
'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/201p1_rt.nc',
'http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/036p1_rt.nc']
In [26]:
sos_urls
Out[26]:
['http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/036p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/166p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/162p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/204p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/179p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/231p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities',
'http://www.neracoos.org/thredds/sos/UMO/DSG/SOS/A01/Accelerometer/HistoricRealtime/Agg.ncml?service=SOS&version=1.0.0&request=GetCapabilities',
'http://thredds.cdip.ucsd.edu/thredds/sos/cdip/realtime/201p1_rt.nc?service=SOS&version=1.0.0&request=GetCapabilities']
In [24]:
erddap_urls
Out[24]:
[]
We should find NDBC 44013 and NOS.CO-OPS 8443970 there, but all we got is NDBC 44029 (http://www.neracoos.org/thredds/sos/UMO/DSG/SOS/A01/Accelerometer/HistoricRealtime/Agg.ncml).
And many cdip.ucsd moorings too :-/
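For context, lists like the three above can be built by grouping each catalog record's references by their scheme. Here is a minimal sketch, assuming each reference is a dict with 'scheme' and 'url' keys (the shape OWSLib exposes on a record's `references`). The function name, the scheme strings for OPeNDAP and SOS, and the sample references are assumptions for illustration, not taken from the notebook:

```python
def urls_for_scheme(references, identifier):
    """Return sorted unique URLs whose scheme contains `identifier` (case-insensitive)."""
    wanted = identifier.lower()
    urls = {
        ref["url"]
        for ref in references
        if ref.get("url") and wanted in (ref.get("scheme") or "").lower()
    }
    return sorted(urls)


# Hypothetical references, for illustration only.
refs = [
    {"scheme": "OPeNDAP:OPeNDAP", "url": "http://example.org/thredds/dodsC/a.nc"},
    {"scheme": "OGC:SOS", "url": "http://example.org/thredds/sos/a.nc"},
    {"scheme": None, "url": "http://example.org/erddap/tabledap/a"},
]

dap_urls = urls_for_scheme(refs, "OPeNDAP:OPeNDAP")
sos_urls = urls_for_scheme(refs, "OGC:SOS")
# Empty here because the ERDDAP reference carries no matching scheme.
erddap_urls = urls_for_scheme(refs, "ERDDAP:tabledap")
```

A reference with a missing or unexpected scheme is silently dropped by this kind of grouping, which is one way an empty ERDDAP list can come about even when ERDDAP URLs are present in the records.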
There is no temperature data in the endpoint for 44029 that we got from the catalog, but the buoy does record temperature data, see http://www.ndbc.noaa.gov/data/realtime2/44029.ocean
Also, something is not OK with the search b/c that buoy should be outside of the search bounding box. See cells [12]-[15] of http://nbviewer.jupyter.org/gist/ocefpaf/be4f1605894a84254120881c9c84d0dc
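One way to catch out-of-box hits like this on the client side is to test each station position against the search box before using it. A minimal sketch, assuming the box is ordered [min_lon, min_lat, max_lon, max_lat] and does not cross the antimeridian. The coordinates below are placeholders, not the notebook's values or the real buoy position:

```python
def point_in_bbox(lon, lat, bbox):
    """True if (lon, lat) lies inside bbox = [min_lon, min_lat, max_lon, max_lat]."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Placeholder search box and station positions, for illustration only.
search_bbox = [-71.0, 42.0, -70.5, 42.5]
inside = point_in_bbox(-70.8, 42.3, search_bbox)
outside = point_in_bbox(-70.2, 42.3, search_bbox)
```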
@ocefpaf , so there are 4 or 5 issues here, right? @jbosch-noaa, who is our contact on catalog issues now that Luke has moved on?
> @ocefpaf , so there are 4 or 5 issues here, right?
Yeah, I should've broken down into smaller parts to make this issue more useful :grimacing:
Here are the issues with the https://data.ioos.us/csw catalog that we found in this search:
1) cdip.ucsd stations shouldn't be found (xref: https://github.com/SECOORA/skill_score/issues/197);
2) buoy NDBC 44013 is missing;
3) NDBC 44029 was found, but it is actually outside the search bounding box;
4) the OPeNDAP endpoint for NDBC 44029 does not have the temperature data;
5) could not find any ERDDAP endpoint, but I am pretty sure there is at least one;
6) cannot find the NOS.CO-OPS 8443970 station, even for the period when the temperature sensor was active (before Jun 17 2017 4:40PM).
Pinging @mwengren and @jbosch-noaa here for help.
The notebook, http://nbviewer.jupyter.org/gist/ocefpaf/be4f1605894a84254120881c9c84d0dc, connects all the issues above with actual code.
Right now the notebook is useless b/c https://data.ioos.us/csw seems to be down (ping @mwengren), but I will re-run it as soon as the catalog is back online.
@ocefpaf The CS-W service looks to be working now. Not sure if there was an outage earlier or why.
@ericmbernier and @benjwadams are the ASA contacts for Catalog issues and support going forward, and I'll still be managing it for IOOS.
At the moment, we're in a development freeze though until new funding works its way through the system.
But, we should be able to troubleshoot critical issues if they are clearly identified. I haven't looked at all the notebook links above, but the first thing ASA might be able to confirm is whether the CKAN to pycsw database sync process is running properly, in case the content isn't current. If that is not the cause, then we might have to get into finer-grained content issues in the CS-W service, if you're saying it's not correct.
Thanks for the info @mwengren! I updated the notebook above now that the catalog is back online.
Just tried this one again, see the notebook here, and I have good news and bad news...
The good news is that the NOAA.NOS.CO-OPS stations are showing up again when looking for SOS :tada:
The bad news is that the extraneous cdip.ucsd stations are not gone, and a few new ones from edu_fau and edu_usf are now showing up, see cell [9]. Also, still no ERDDAP endpoints are found when using identifier='ERDDAP:tabledap' (cell [7]).
It is quite awkward to filter "valid" results from the catalog search. Note that this issue has been plaguing us for a few years now: https://github.com/SECOORA/skill_score/issues/197
Not sure if my filtering is completely off or if these issues need to be solved at the data provider metadata level.
@jbosch-noaa we still do not get any ERDDAP endpoints correctly when filtering with the identifier 'ERDDAP:tabledap'. Here is a notebook demonstrating this:
@rsignell-usgs any suggestion or should we abandon the idea of looking for ERDDAP endpoints with the catalog?
@ocefpaf - In talking to @mwengren , we postulated that SECOORA may not have their ERDDAP server in the catalog because they have over 2300 data sets!
Perhaps there is another RA that has model data we can use as an example instead of SECOORA? It doesn't mean we give up on this notebook. You can just hard code the path and find a different model to work with for another notebook.
Another option is to talk to @kwilcox about manually moving some ERDDAP files over to their WAF so they get registered in the catalog.
Thoughts?
@ocefpaf There are many ERDDAP services registered in the Catalog, but none published by SECOORA per Jen's comment. See: https://data.ioos.us/dataset?res_format=ERDDAP-TableDAP. There may be some disconnect between how pycsw is filtering the 'ERDDAP-TableDAP' endpoints compared to how CKAN represents them.
We do attempt to identify and tag them at the CKAN level as in the above. I am not sure what pycsw is doing wrt ERDDAP.
> @ocefpaf There are many ERDDAP services registered in the Catalog, but none published by SECOORA per Jen's comment.

The search in the example is for the Boston area, so SECOORA data availability is not an issue there. However, b/c we are planning more model-observation comparison notebooks, it is nice to know that SECOORA ERDDAP data won't show up.

> See: https://data.ioos.us/dataset?res_format=ERDDAP-TableDAP. There may be some disconnect between how pycsw is filtering the 'ERDDAP-TableDAP' endpoints compared to how CKAN represents them.

Not sure what the implications of that are, sorry. I am not an expert on pycsw and CKAN.

> We do attempt to identify and tag them at the CKAN level as in the above. I am not sure what pycsw is doing wrt ERDDAP.

I'll try to do a "reverse" search example using https://data.ioos.us/dataset?res_format=ERDDAP-TableDAP so we can have control of the endpoints that should show up. More on this soon...
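A rough sketch of what such a "reverse" search could look like, assuming data.ioos.us exposes the standard CKAN action API (`/api/3/action/package_search`) and that the `res_format` facet from the link above can be used as a filter query. Both are assumptions that have not been checked against the live catalog, and the helper names are hypothetical:

```python
from urllib.parse import urlencode


def ckan_search_url(base, res_format, rows=10, start=0):
    """Build a CKAN package_search URL filtered by resource format."""
    params = {"fq": f'res_format:"{res_format}"', "rows": rows, "start": start}
    return f"{base.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


def resource_urls(payload, res_format):
    """Collect resource URLs of the given format from a package_search response."""
    urls = []
    for dataset in payload.get("result", {}).get("results", []):
        for resource in dataset.get("resources", []):
            if resource.get("format") == res_format and resource.get("url"):
                urls.append(resource["url"])
    return urls


url = ckan_search_url("https://data.ioos.us", "ERDDAP-TableDAP")
```

The JSON returned from `url` (fetched with any HTTP client) would then be passed to `resource_urls` to get the list of ERDDAP endpoints that the CSW search is expected to reproduce.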
Is there a way to register an ERDDAP server with the catalog?
Yes, you can register the ERDDAP WAF directly in the Harvest Registry (there is an 'ERDDAP WAF' type, as it differs from a plain old HTTP WAF). I assumed you were holding off on registering the SECOORA ERDDAP due to the proliferation of 'datasets' in there. But, register away, the more the merrier I suppose.
@ocefpaf I guess my point is that there are many ERDDAP services in the Catalog, they just don't seem to be obtainable via pycsw, I guess. We'll have to investigate more.
> they just don't seem to be obtainable via pycsw

Sorry. I am confused. Do you mean OWSLib's CatalogueServiceWeb class I used in the example to obtain the endpoints? pycsw is a CSW server implementation, right?
Note that I did find some ERDDAP endpoints but the scheme metadata is not listed as 'ERDDAP:tabledap', making it hard to identify ERDDAP endpoints programmatically (see cell 13 of this notebook).
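When the scheme cannot be trusted, one fallback is to look at the URL itself. A minimal sketch, assuming the server is deployed under the usual `/erddap/` path so that tabledap datasets live under `/erddap/tabledap/`. A server behind a rewriting proxy would be missed, and the function name and sample URLs are hypothetical:

```python
from urllib.parse import urlparse


def is_erddap_tabledap(url):
    """Guess from the path whether a URL points at an ERDDAP tabledap dataset."""
    path = urlparse(url).path.lower()
    return "/erddap/tabledap/" in path


# Hypothetical URLs, for illustration only.
candidates = [
    "http://example.org/erddap/tabledap/some_dataset.html",
    "http://example.org/thredds/dodsC/some_dataset.nc",
]
erddap_only = [u for u in candidates if is_erddap_tabledap(u)]
```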
Closing this due to inactivity. We worked around most of the issues here with stricter filtering, using propertyname='apiso:Subject' rather than propertyname='apiso:AnyText'. This does not mean the bad selection of cdip.ucsd is fixed! Those should not be there due to the bbox alone, without extra filtering.
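A sketch of what the stricter filter could look like with OWSLib's `fes` module. This is a reconstruction under assumptions, not the notebook's exact code: the helper names are hypothetical, the CRS string is the CRS84 URN commonly used with `fes.BBox`, and OWSLib is imported lazily because it is a third-party package:

```python
def like_kwargs(value, propertyname="apiso:Subject"):
    """Keyword arguments for a wildcard PropertyIsLike match on one term."""
    return {
        "propertyname": propertyname,
        "literal": f"*{value}*",
        "escapeChar": "\\",
        "wildCard": "*",
        "singleChar": "?",
    }


def build_constraints(names, bbox, propertyname="apiso:Subject"):
    """Combine a bounding box with an OR over `names` into CSW constraints."""
    from owslib import fes  # third-party; imported lazily

    likes = [fes.PropertyIsLike(**like_kwargs(n, propertyname)) for n in names]
    # fes.Or needs at least two operands, so pass a single term through as-is.
    name_filter = fes.Or(likes) if len(likes) > 1 else likes[0]
    box = fes.BBox(bbox, crs="urn:ogc:def:crs:OGC:1.3:CRS84")
    return [fes.And([box, name_filter])]
```

The returned list is meant to be passed as `constraints=` to `CatalogueServiceWeb.getrecords2`. Switching `propertyname` back to 'apiso:AnyText' reproduces the looser search, which makes it easy to compare the two result sets.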
At some point we should try finding and reading data for the light swim notebook using the new CO-OPS ERDDAP endpoint.
Some pydap examples of reading the CO-OPS endpoint are here: https://github.com/dnowacki-usgs/notebooks/blob/master/noaa_coops_erddap.ipynb
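As an alternative to pydap, a tabledap request can also be written as a plain URL. A minimal sketch of the URL layout (dataset ID, file type, comma-separated variables, then percent-encoded constraints). The server address, dataset ID, and variable names below are placeholders, not the real CO-OPS ones:

```python
from urllib.parse import quote


def tabledap_url(server, dataset_id, variables, constraints=(), fmt="csv"):
    """Build an ERDDAP tabledap request URL with percent-encoded constraints."""
    query = ",".join(variables)
    for constraint in constraints:
        # Encode characters such as '>' and '"', but keep '=' readable.
        query += "&" + quote(constraint, safe="=")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{fmt}?{query}"


# Placeholder server, dataset ID, and variables, for illustration only.
url = tabledap_url(
    "https://example.org/erddap",
    "HYPOTHETICAL_DATASET_ID",
    ["time", "water_temperature"],
    ["time>=2017-06-01T00:00:00Z"],
)
```

With the `.csv` file type the second row of the response holds the units, so something like `pandas.read_csv(url, skiprows=[1])` is a common way to load it.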