Closed emiliom closed 4 years ago
If I recall correctly, back when ulmo was started the IV data only went back to 2007, and requests failed if you asked for earlier dates.
Thanks. I too have a fuzzy recollection that IV data availability was more limited back then.
I also just found out via trial and error that the earliest date allowable for iv is 1900-01-01, and for dv is 1600-01-01. NWIS yells at you if it's earlier than either of those. Just so you know!
Final note: NWIS itself has a bad equality somewhere. The start date actually needs to be 1900-01-02 at the earliest for iv. Checked, and 1600-01-01 is fine for dv.
Thanks, @erekalper. Great to know.
I think we have our answers. The easiest approach for a fix will be to change the start date for IV to 1900-01-01. I seriously doubt there's any data prior to 1850 in NWIS, but we can change the DV start date to 1600-01-01. I can also ping an NWIS USGS contact about this; I was just on the phone with one an hour ago.
Just to be clear, the iv service will error out with 1900-01-01; it oddly needs to be 1900-01-02.
Hi @erekalper, I tested this out and I'm getting valid responses with a start date of 1900-01-01. I tried a handful of gauges. Here's an example:
In [30]: data = ulmo.usgs.nwis.get_site_data('08031290', service='iv', start='1900-01-01')
processing data from request: https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00
In [31]: data[list(data.keys())[0]]['values'][:2]
Out[31]:
[{'value': '439.96',
'qualifiers': 'A',
'datetime': '2007-10-01T01:00:00-05:00'},
{'value': '439.96',
'qualifiers': 'A',
'datetime': '2007-10-01T01:15:00-05:00'}]
Would be interesting to know if you hit a corner case with the service.
It looks like I did! I wasn't testing a case without an end date or without a full timestamp. In light of that, I tried all edge cases that I could think of below:
:heavy_check_mark: No end date, start has no timestamp https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01
:heavy_check_mark: No end date, start has timestamp https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00
:heavy_check_mark: End date, start has no timestamp, end has no timestamp https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01&endDT=2019-01-01
:heavy_check_mark: End date, start has no timestamp, end has timestamp https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01&endDT=2019-01-01T00%3A00%3A00
:x: End date, start has timestamp, end has no timestamp https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00&endDT=2019-01-01
:x: End date, start has timestamp, end has timestamp https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T00%3A00%3A00&endDT=2019-01-01T00%3A00%3A00
It looks like the call fails whenever there's an end date, and the start date also has a timestamp. I tested this in a Jupyter Notebook as well, and those two calls returned an empty dictionary.
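For anyone who wants to re-run that matrix, here is a minimal sketch that rebuilds the six request URLs with the standard library. The helper names (`build_url`, `edge_case_urls`) are made up for illustration and are not part of ulmo. The sketch only constructs the URLs and does not contact the service.

```python
from itertools import product
from urllib.parse import urlencode

# Endpoint taken from the URLs listed above.
BASE = "https://nwis.waterservices.usgs.gov/nwis/iv/"


def build_url(site, start, end=None):
    """Build an iv request URL (hypothetical helper, not ulmo's API)."""
    params = {"format": "waterml", "site": site, "startDT": start}
    if end is not None:
        params["endDT"] = end
    # urlencode percent-escapes the colons in timestamps (":" becomes "%3A").
    return BASE + "?" + urlencode(params)


def edge_case_urls(site="08031290", start_day="1900-01-01", end_day="2019-01-01"):
    """Return all combinations of start/end with and without a time part."""
    starts = [start_day, start_day + "T00:00:00"]
    ends = [None, end_day, end_day + "T00:00:00"]
    return [build_url(site, s, e) for e, s in product(ends, starts)]


if __name__ == "__main__":
    for url in edge_case_urls():
        print(url)
```

Each printed URL can be pasted into a browser or fetched with any HTTP client to compare results across machines and time zones.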
From those last two, the website returns the following:
I get the same error through a starting timestamp of 1900-01-01T04:59:59:
https://nwis.waterservices.usgs.gov/nwis/iv/?format=waterml&site=08031290&startDT=1900-01-01T04%3A59%3A59&endDT=2019-01-01T00%3A00%3A00
Any time after that, starting at 1900-01-01T05:00:00, I get an actual return. I wonder if it's somehow assuming a local time vs. UTC? I'm on EST, which is five hours behind UTC, and that's really the only thing I could think of for why we'd see that difference. Is it the same for others here in different local time zones?
Thanks, @erekalper and @solomon-negusse ! Nice sleuthing.
I don't have anything to add, except to bring in @jkreft-usgs to see if we can interest him in chiming in about these NWIS web service issues and start-datetime limits. Jim or someone on his NWIS team are the ones who can provide definitive answers.
I'm in the CST time zone (UTC - 6 hrs) and I'm getting a valid response with 1900-01-01T05:00:00. I'd have expected it to fail up to 1900-01-01T05:59:00 if it were localizing to my time zone.
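Since the threshold is the same from two different client time zones, one guess that fits is a fixed UTC-5 offset applied on the server side before comparing against 1900-01-01. That is purely an assumption; nothing in this thread confirms it, and it would not explain why the failure only shows up when an end date is also given. The arithmetic itself can be checked with a short sketch (the names `ASSUMED_OFFSET`, `CUTOFF`, and `passes_cutoff` are hypothetical):

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: a fixed UTC-5 offset on the server side. Not confirmed by USGS.
ASSUMED_OFFSET = timezone(timedelta(hours=-5))

# Midnight on 1900-01-01 expressed in the assumed offset.
CUTOFF = datetime(1900, 1, 1, tzinfo=ASSUMED_OFFSET)


def passes_cutoff(timestamp):
    """Treat the timestamp as UTC and compare it with the assumed cutoff."""
    as_utc = datetime.fromisoformat(timestamp).replace(tzinfo=timezone.utc)
    return as_utc >= CUTOFF


if __name__ == "__main__":
    for ts in ("1900-01-01T04:59:59", "1900-01-01T05:00:00"):
        print(ts, passes_cutoff(ts))
```

Under that assumption, 04:59:59 falls on 1899-12-31 in the shifted clock and is rejected, while 05:00:00 lands exactly on the cutoff, which matches the behavior reported above.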
Time is hard! There are differences between the different services, as well as extremely confusing time zone rules. I think that the earliest instantaneous data we have goes back to the 30s or so, so choosing a date somewhere in the 1910s will be fine. 2007 used to be a hard cut-off, but there was an effort some number of years ago to back-load data from an offline archive so that it could be available via public web services. Worth noting that we are also planning on building and rolling out new services over the course of the coming months and years that should be much more reasonable, use UTC by default, etc.
Thanks @jkreft-usgs ! I think we have everything we need to update the ulmo nwis reader.
@erekalper pointed out this inappropriate (I think) hard-wired behavior for nwis.get_site_data.

The period argument accepts a value of 'all'. When that's used, hard-wired start dates are used, as defined here. For service='iv', the hard-wired start date is set to datetime.datetime(2007, 10, 1). I don't know where this start date came from, but it's misleading and can lead to wrong results. No data prior to 2007-10-1 would be returned without the user being aware of that imposed cutoff. In the case of a time series that ends before 2007-10-1, no data at all are returned. For example:

returns no data because this is a time series that runs from 1993 to 2006-09-30 23:45. The data can still be obtained by passing an appropriate start date instead of using period='all', eg:
It looks like the start date for 'iv' service should be changed to a much older date, probably the same as the one used for 'dv', to be safe (1851-1-1). Or we should follow up with someone from USGS to learn more about this.
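To make the failure mode concrete, here is a minimal sketch of how a per-service default start date can hide a whole series when period='all'. The table and function names are hypothetical and are not ulmo's actual code. The "old" values are the ones mentioned in this issue (2007-10-01 for iv, 1851-01-01 for dv), and the "proposed" values are the earliest dates reported in the discussion as accepted by NWIS.

```python
import datetime

# Hypothetical tables, not ulmo's actual code.
# Defaults as described in this issue.
OLD_DEFAULT_START = {
    "dv": datetime.date(1851, 1, 1),
    "iv": datetime.date(2007, 10, 1),
}

# Earliest dates reported in the discussion as accepted by NWIS.
PROPOSED_DEFAULT_START = {
    "dv": datetime.date(1600, 1, 1),
    "iv": datetime.date(1900, 1, 1),
}


def misses_series(series_end, service, defaults):
    """Return True if the default start date falls after the series' last day."""
    return defaults[service] > series_end


if __name__ == "__main__":
    # A series like the one described above, ending on 2006-09-30.
    series_end = datetime.date(2006, 9, 30)
    print(misses_series(series_end, "iv", OLD_DEFAULT_START))
    print(misses_series(series_end, "iv", PROPOSED_DEFAULT_START))
```

With the old iv default, the requested window starts after the series has already ended, so nothing comes back. With an earlier default the series falls inside the window.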