ulmo-dev / ulmo

clean, simple and fast access to public hydrology and climatology data.
http://ulmo.readthedocs.org

nwis get_site_data with service='iv' and period='all' sets an inappropriately recent start date #175

Closed emiliom closed 4 years ago

emiliom commented 4 years ago

@erekalper pointed out this inappropriate (I think) hard-wired behavior for nwis.get_site_data.

The period argument accepts a value of 'all'. When that's used, hard-wired start dates are used, as defined here. For service='iv', the hard-wired start date is set to datetime.datetime(2007, 10, 1). I don't know where this start date came from, but it's misleading and can lead to wrong results. No data prior to 2007-10-1 would be returned without the user being aware of that imposed cutoff. In the case of a time series that ends before 2007-10-1, no data at all are returned. For example:

data = ulmo.usgs.nwis.get_site_data('09111500', service='iv', period='all')

returns no data because this time series runs from 1993 to 2006-09-30 23:45. The data can still be obtained by passing an appropriate start date instead of using period='all', e.g.:

data = ulmo.usgs.nwis.get_site_data('09111500', service='iv', start='1993-01-01')

It looks like the start date for 'iv' service should be changed to a much older date, probably the same as the one used for 'dv', to be safe (1851-1-1). Or we should follow up with someone from USGS to learn more about this.
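To make the failure mode concrete, here is a minimal sketch of how a hard-wired start date hides part or all of a record. It does not call ulmo or NWIS; the names `PERIOD_ALL_START` and `visible_range` are hypothetical, and the exact first day of the 1993 record is an assumption. Only the two cutoff dates and the 2006-09-30 23:45 end time come from the description above.

```python
import datetime

# Hard-wired period='all' start dates as described in this issue.
# The dict name is hypothetical, not ulmo's actual identifier.
PERIOD_ALL_START = {
    "iv": datetime.datetime(2007, 10, 1),
    "dv": datetime.datetime(1851, 1, 1),
}


def visible_range(service, series_start, series_end):
    """Return the (start, end) part of a record that a period='all'
    request would cover, or None if the cutoff excludes all of it."""
    cutoff = PERIOD_ALL_START[service]
    if series_end < cutoff:
        return None  # the whole record predates the cutoff
    return (max(series_start, cutoff), series_end)


# Site 09111500: the record runs from 1993 to 2006-09-30 23:45.
series_start = datetime.datetime(1993, 1, 1)  # assumed first day of the record
series_end = datetime.datetime(2006, 9, 30, 23, 45)

print(visible_range("iv", series_start, series_end))  # nothing is visible
print(visible_range("dv", series_start, series_end))  # full record is visible
```

With the 'iv' cutoff the function returns `None`, which mirrors the empty result from the `period='all'` call; with the older 'dv' cutoff the whole record is covered.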

erekalper commented 4 years ago

I also just found out via trial and error that the earliest date allowable for ivs is 1900-01-01, and for dvs is 1600-01-01. NWIS yells at you if it's earlier than either of those. Just so you know!

erekalper commented 4 years ago

Final note: NWIS itself has a bad equality somewhere. The start date actually needs to be 1900-01-02 at the earliest for iv. Checked, and 1600-01-01 is fine for dv.

dharhas commented 4 years ago

If I recall correctly, back when ulmo was started the IV data only went back to 2007 and request failed if you asked for earlier dates.

emiliom commented 4 years ago

> If I recall correctly, back when ulmo was started the IV data only went back to 2007 and request failed if you asked for earlier dates.

Thanks. I too have a fuzzy recollection that IV data availability was more limited back then.

> I also just found out via trial and error that the earliest date allowable for iv is 1900-01-01, and for dv is 1600-01-01. NWIS yells at you if it's earlier than either of those. Just so you know!

> Final note: NWIS itself has a bad equality somewhere. The start date actually needs to be 1900-01-02 at the earliest for iv. Checked, and 1600-01-01 is fine for dv.

Thanks, @erekalper. Great to know.

I think we have our answers. The easiest approach for a fix will be to change the start date for IV to 1900-01-01. I seriously doubt there's any data prior to 1850 in NWIS, but we can change the DV start date to 1600-01-01. I can also ping an NWIS USGS contact about this; I was just on the phone with one an hour ago.

erekalper commented 4 years ago

Just to be clear, the iv service will error out with 1900-01-01; it oddly needs to be 1900-01-02.

solomon-negusse commented 4 years ago

> Just to be clear, the iv service will error out with 1900-01-01; it oddly needs to be 1900-01-02.

Hi @erekalper, I tested this out and I'm getting valid responses with a start date of 1900-01-01. I tried a handful of gauges. Here's an example:

In [30]: data = ulmo.usgs.nwis.get_site_data('08031290', service='iv', start='1900-01-01')              
processing data from request: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00

In [31]: data[list(data.keys())[0]]['values'][:2]                                                       
Out[31]: 
[{'value': '439.96',
  'qualifiers': 'A',
  'datetime': '2007-10-01T01:00:00-05:00'},
 {'value': '439.96',
  'qualifiers': 'A',
  'datetime': '2007-10-01T01:15:00-05:00'}]

Would be interesting to know if you hit a corner case with the service.

erekalper commented 4 years ago

It looks like I did! I wasn't testing a case without an end date or without a full timestamp. In light of that, I tried all edge cases that I could think of below:

- :heavy_check_mark: No end date, start has no timestamp: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01
- :heavy_check_mark: No end date, start has timestamp: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00
- :heavy_check_mark: End date, start has no timestamp, end has no timestamp: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01&endDT=2019-01-01
- :heavy_check_mark: End date, start has no timestamp, end has timestamp: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01&endDT=2019-01-01T00%3A00%3A00
- :x: End date, start has timestamp, end has no timestamp: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00&endDT=2019-01-01
- :x: End date, start has timestamp, end has timestamp: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00&endDT=2019-01-01T00%3A00%3A00

It looks like the call fails whenever there's an end date, and the start date also has a timestamp. I tested this in a Jupyter Notebook as well, and those two calls returned an empty dictionary.
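For anyone who wants to repeat these checks, the six request URLs can be generated rather than typed by hand. This is a rough sketch using only the standard library; `build_iv_url` and `edge_case_urls` are hypothetical helper names, and the code only builds the URLs (it does not send any requests or judge the responses).

```python
from urllib.parse import urlencode

BASE_URL = "https://nwis.waterservices.usgs.gov/nwis/iv/"


def build_iv_url(site, start, end=None):
    """Build an IV request URL with the same parameters used in this thread."""
    params = {"format": "waterml", "site": site, "startDT": start}
    if end is not None:
        params["endDT"] = end
    # urlencode percent-encodes the colons in timestamps (':' becomes '%3A')
    return BASE_URL + "?" + urlencode(params)


def edge_case_urls(site, start_day="1900-01-01", end_day="2019-01-01"):
    """Return (label, url) pairs for the six start/end combinations."""
    midnight = "T00:00:00"
    cases = [
        ("no end, start date only", start_day, None),
        ("no end, start timestamp", start_day + midnight, None),
        ("end, start date only, end date only", start_day, end_day),
        ("end, start date only, end timestamp", start_day, end_day + midnight),
        ("end, start timestamp, end date only", start_day + midnight, end_day),
        ("end, start timestamp, end timestamp", start_day + midnight, end_day + midnight),
    ]
    return [(label, build_iv_url(site, s, e)) for label, s, e in cases]


for label, url in edge_case_urls("08031290"):
    print(label, url)
```

Each URL can then be opened in a browser or fetched with any HTTP client to see which combinations the service accepts.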

erekalper commented 4 years ago

From those last two, the website returns the following: image

I get the same error for every starting timestamp up to and including 1900-01-01T04:59:59: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T04%3A59%3A59&endDT=2019-01-01T00%3A00%3A00

Any time after that, starting at 1900-01-01T05:00:00, I get an actual return. I wonder if it's somehow assuming a local time vs. UTC? I'm on EST, which is five hours behind UTC, and that's really the only thing I could think of for why we'd see that difference. Is it the same for others here in different local time zones?

emiliom commented 4 years ago

Thanks, @erekalper and @solomon-negusse ! Nice sleuthing.

I don't have anything to add, except to bring in @jkreft-usgs to see if we can interest him in chiming in about these NWIS web service issues and start-datetime limits. Jim or someone on his NWIS team are the ones who can provide definitive answers.

solomon-negusse commented 4 years ago

> Any time after that, starting at 1900-01-01T05:00:00, I get an actual return. I wonder if it's somehow assuming a local time vs. UTC? I'm on EST, which is five hours behind UTC, and that's really the only thing I could think of for why we'd see that difference. Is it the same for others here in different local time zones?

I'm on CST (UTC - 6 hrs) and I'm getting a valid response with 1900-01-01T05:00:00. I'd have expected it to fail up to 1900-01-01T05:59:00 if it was localizing.
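The arithmetic behind that expectation can be checked with a small sketch. It assumes, purely for illustration, that the limit is midnight on 1900-01-01 in some zone and that request timestamps are compared against it in UTC; that is a model of the hypothesis being discussed, not a statement about how NWIS actually works. The function name `limit_in_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def limit_in_utc(offset_hours):
    """Express midnight on 1900-01-01 at a fixed UTC offset as a UTC time."""
    zone = timezone(timedelta(hours=offset_hours))
    local_midnight = datetime(1900, 1, 1, tzinfo=zone)
    return local_midnight.astimezone(timezone.utc)


# If the limit followed the client's zone, the first accepted timestamp
# would differ between a UTC-5 client and a UTC-6 client.
for name, offset in [("UTC-5", -5), ("UTC-6", -6)]:
    print(name, limit_in_utc(offset).isoformat())
```

Under this model a UTC-5 client would see the threshold at 05:00:00 and a UTC-6 client at 06:00:00. Both testers saw 05:00:00, which argues against the limit being tied to the client's local zone.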

jkreft-usgs commented 4 years ago

Time is hard! There are differences between the different services, as well as extremely confusing time zone rules. I think that the earliest instantaneous data we have goes back to the 30s or so, so choosing a date somewhere in the 1910s will be fine. 2007 used to be a hard cut-off, but there was an effort some number of years ago to back-load data from an offline archive so that it could be available via public web services. Worth noting that we are also planning on building and rolling out new services over the course of the coming months and years that should be much more reasonable, use UTC by default, etc.
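One way the suggestion above could translate into per-service defaults is sketched below. This is not ulmo's actual code: `DEFAULT_START` and `start_for_period_all` are hypothetical names, the 'iv' value of 1910-01-01 is just one assumed pick from "somewhere in the 1910s", and the 'dv' value is the 1851-1-1 date cited earlier in this issue.

```python
import datetime

# Hypothetical per-service defaults for period='all'.
DEFAULT_START = {
    "iv": datetime.date(1910, 1, 1),  # assumption: a date in the 1910s
    "dv": datetime.date(1851, 1, 1),  # dv value cited earlier in this issue
}


def start_for_period_all(service):
    """Return the default start date for a service as a YYYY-MM-DD string."""
    if service not in DEFAULT_START:
        raise ValueError("unknown service: %r" % (service,))
    return DEFAULT_START[service].isoformat()


print(start_for_period_all("iv"))
print(start_for_period_all("dv"))
```

A date-only start value also sidesteps the failing combination reported earlier in the thread, where a start timestamp together with an end date was rejected at 1900-01-01.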

emiliom commented 4 years ago

Thanks @jkreft-usgs ! I think we have everything we need to update the ulmo nwis reader.