ioos / colocate

Co-locate oceanographic data by establishing constraints
MIT License

Return a column containing cdm_data_type for each dataset_id in the .… #19

Closed justjacqueline closed 3 years ago

justjacqueline commented 3 years ago

Return a column containing cdm_data_type for each dataset_id in the .get_coordinates() function.

mwengren commented 3 years ago

Looks good to me, thanks!

MathewBiddle commented 3 years ago

I don't think this is related to the changes proposed here, but I came across it while reviewing the pull request.

Something odd is happening in the get_coordinates function. When I run it as follows, I don't get any data back, although I should.

In[6]: from colocate import erddap_query
  ...: # identify single server you want to ping
  ...: url = 'https://erddap.bco-dmo.org/erddap/' 
  ...: # set up the keywords
  ...: kw = {
  ...:       'min_lon': -123.628173,
  ...:       'max_lon': -122.02382599999999,
  ...:       'min_lat': 47.25972200000001,
  ...:       'max_lat': 48.32253399999999,
  ...:       'min_time': '2012-01-27T00:00:00Z',
  ...:       'max_time': '2019-12-31T00:00:00Z',
  ...:       }
  ...: # search for datasets
  ...: df = erddap_query.query(url,**kw)
  ...: # get coordinates
  ...: df_coords = erddap_query.get_coordinates(df, **kw)
Testing ERDDAP https://erddap.bco-dmo.org/erddap
ERDDAP https://erddap.bco-dmo.org/erddap returned results from URL: https://erddap.bco-dmo.org/erddap/search/advanced.csv?page=1&itemsPerPage=1000&protocol=tabledap&cdm_data_type=(ANY)&institution=(ANY)&ioos_category=(ANY)&keywords=(ANY)&long_name=(ANY)&standard_name=(ANY)&variableName=(ANY)&minLon=-123.628173&maxLon=-122.02382599999999&minLat=47.25972200000001&maxLat=48.32253399999999&minTime=1327622400.0&maxTime=1577750400.0
index_random: [2, 4, 1, 3, 0]
datasets_found: 0
Download URL: https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_743054.csvp?latitude,longitude&time>=1327622400.0&time<=1577750400.0&longitude>=-123.628173&longitude<=-122.02382599999999&latitude>=47.25972200000001&latitude<=48.32253399999999&distinct()
HTTP Error 400: 
datasets_found: 0
Download URL: https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_682074.csvp?latitude,longitude&time>=1327622400.0&time<=1577750400.0&longitude>=-123.628173&longitude<=-122.02382599999999&latitude>=47.25972200000001&latitude<=48.32253399999999&distinct()
HTTP Error 400: 
datasets_found: 0
Download URL: https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_743320.csvp?latitude,longitude&time>=1327622400.0&time<=1577750400.0&longitude>=-123.628173&longitude<=-122.02382599999999&latitude>=47.25972200000001&latitude<=48.32253399999999&distinct()
HTTP Error 400: 
datasets_found: 0
Download URL: https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_743274.csvp?latitude,longitude&time>=1327622400.0&time<=1577750400.0&longitude>=-123.628173&longitude<=-122.02382599999999&latitude>=47.25972200000001&latitude<=48.32253399999999&distinct()
HTTP Error 400: 
datasets_found: 0
Download URL: https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_743224.csvp?latitude,longitude&time>=1327622400.0&time<=1577750400.0&longitude>=-123.628173&longitude<=-122.02382599999999&latitude>=47.25972200000001&latitude<=48.32253399999999&distinct()
HTTP Error 400: 

Picking out one of those datasets and looking at [one of the htmlTable responses](https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_682074.htmlTable?latitude,longitude&time%3E=1327622400.0&time%3C=1577750400.0&longitude%3E=-123.628173&longitude%3C=-122.02382599999999&latitude%3E=47.25972200000001&latitude%3C=48.32253399999999&distinct()), with the query verbatim as it's printed to the screen, we do get valid data back. Not sure why I'm getting 400 errors.

Any thoughts?

mwengren commented 3 years ago

Yes, those are just what ERDDAP responds with when a query doesn't return any results. 4XX errors can happen for other reasons, of course, but in this case I'm pretty sure that's just the server saying 'no data matches the query parameters received'. Different codes might be returned depending on the ERDDAP version, so sometimes there are 5XX errors, sometimes 4XX.

It happens when the first bounding box query matches in query(), but there aren't actually any points within the bbox (sparse datasets, primarily).

We should probably ignore them silently, but for now the erddap_query.py code still just prints them to stdout. Basically, the loop in get_coordinates() moves on to the next dataset and tries the same request, until it reaches the limit of 10 datasets with coordinates, defined here.
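To make the skip-and-continue behavior concrete, here is a minimal sketch of that kind of loop. It is not the actual erddap_query.py code: `collect_coordinates`, `fetch_csv`, and the `fetch` parameter are hypothetical names introduced only for illustration, and the sketch assumes the "no matching data" responses surface as `urllib.error.HTTPError`.

```python
import urllib.error
import urllib.request


def fetch_csv(url):
    """Hypothetical default fetcher: return the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def collect_coordinates(download_urls, fetch=fetch_csv, limit=10):
    """Try each download URL in turn and keep the ones that return data.

    HTTP errors (for example a 4XX/5XX sent when no rows match the
    bounding box) are recorded and skipped instead of stopping the loop.
    Stops once `limit` datasets have returned coordinates.
    """
    results = {}   # url -> response text
    skipped = []   # (url, HTTP status code) for requests that failed
    for url in download_urls:
        if len(results) >= limit:
            break
        try:
            results[url] = fetch(url)
        except urllib.error.HTTPError as err:
            # Assumption: an HTTP error here usually means "no matching data".
            skipped.append((url, err.code))
            continue
    return results, skipped
```

Returning the skipped URLs alongside the results keeps stdout quiet while still letting a caller inspect which datasets responded with an error.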

I'll go ahead and merge this PR so we can move on and add other things.

After looking at it some, I think we may eventually want to create a separate function that hits the ERDDAP 'info' response and stores all of the dataset metadata, rather than just the one field, but for now this is a good start. We can use this code for that later on.
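As a rough sketch of what such a function might look like: the helpers below are hypothetical (`info_url` and `parse_global_attributes` are not part of colocate). They assume the ERDDAP info response is requested as `info/<dataset_id>/index.csv` and that the CSV has the columns `Row Type`, `Variable Name`, `Attribute Name`, `Data Type`, and `Value`, with global attributes listed under the variable name `NC_GLOBAL`.

```python
import csv
import io


def info_url(server, dataset_id):
    """Build the info CSV URL for one dataset (assumed URL layout)."""
    return f"{server.rstrip('/')}/info/{dataset_id}/index.csv"


def parse_global_attributes(info_csv_text):
    """Collect every global attribute from an info CSV into a dict.

    Assumes global attributes are rows with Row Type 'attribute' and
    Variable Name 'NC_GLOBAL'; per-variable rows are ignored.
    """
    reader = csv.DictReader(io.StringIO(info_csv_text))
    attrs = {}
    for row in reader:
        if row["Row Type"] == "attribute" and row["Variable Name"] == "NC_GLOBAL":
            attrs[row["Attribute Name"]] = row["Value"]
    return attrs
```

With the text of an info response in hand, `parse_global_attributes(text).get("cdm_data_type")` would give the single field this PR returns, while the rest of the dictionary keeps the remaining metadata available for later use.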