ulmo-dev / ulmo

clean, simple and fast access to public hydrology and climatology data.
http://ulmo.readthedocs.org

CUAHSI SNOTEL response timezone and start-end request times #171

Closed: emiliom closed this issue 4 years ago

emiliom commented 5 years ago

Reported by @erekalper in email to @emiliom:

ulmo can access SNOTEL data. (I wasn't immediately aware of this, actually, but discovered it through your tutorial here: https://github.com/uwescience/Python-for-geosciences/tree/master/20170307.) Note from Emilio: This access is via the CUAHSI HIS service.

I found an up-to-date WSDL URL and got going, only to be a bit confused by the timezone in the returns. That is, there is none. Is it meant to be UTC? I see in your example that it looks like you're assuming that's the case, but I wanted to make absolutely sure it's that and not local time from the site in question. (I couldn't find information in SNOTEL's documentation about what it actually is, though it's entirely possible I missed that). I don't want to be a bother, but I only ask because you've got some experience here.

I previously emailed someone at USDA supposedly in charge of some of the SNOTEL stuff (Deb Harms - I found her contact info buried in some of the metadata returned from a get site info call), though I've not heard back from her yet. I figured I'd let you know, though, in case she does reply at some point.

@dshean Since you worked recently on direct SNOTEL access (after testing on ulmo), can you help us figure out the timezone for SNOTEL data response, from the data service you're using? Having that info, we can compare it against sample SNOTEL responses from the ulmo CUAHSI HIS reader, to figure out -- and document -- what's being returned there. Thanks!

erekalper commented 5 years ago

The USDA contact got back to me, though I'm honestly still a bit unclear after a back and forth with her.

The gist of what she said follows:

I had already looked at reports, and saw that the timezone seemed to be uniformly reported as PST. However, I attributed this to whatever server was generating the report, and not necessarily to the timezone of the data itself. This gets less clear, though, if you pull up a report from Alaska: https://wcc.sc.egov.usda.gov/reportGenerator/view/customSingleStationReport/hourly/1191:AK:SNTL|id%3D%22%22|name/-167%2C0/WTEQ%3A%3Avalue%2CSNWD%3A%3Avalue%2CPREC%3A%3Avalue%2CTOBS%3A%3Avalue?fitToScreen=false

The timezone is still PST, which goes against the second thing she told me.

I'm ready to assume all SNOTEL timezones are reported in PST, but I'll leave it to those with more knowledge in this area to make a final call.

Thanks for your time and help with this!

emiliom commented 5 years ago

Thanks for reporting back, @erekalper. I don't use SNOTEL myself. I'll wait for @dshean (or anyone else in the know) to chime in before closing this issue and updating the ulmo docs and SNOTEL docstrings.

dshean commented 5 years ago

Hi all. I'm afraid I don't have time to look into this right now due to other commitments and looming deadlines. @jmichellehu maybe you can look into this, building on the GDA SNOTEL notebook? I also had some communications with CUAHSI devs about the hourly vs daily SNOTEL queries. I need to follow up on those.

emiliom commented 5 years ago

Understood, @dshean. Thanks for letting me know. @jmichellehu, your help would be great! I'm not asking about ulmo specifically. I know @dshean ended up using the data service (or file download?) directly from SNOTEL. If you can share an example from that, or point us to the corresponding documentation, I could compare it to the ulmo results for the same station.

jmichellehu commented 5 years ago

Hey all, sorry for the delay. David wrote a function using ulmo to directly fetch data from SNOTEL -- no file download necessary. I've included the function below, but note that a recent run of it did throw an exception, so I'm not sure whether it still works.

```python
from datetime import datetime

import numpy as np
import pandas as pd
import ulmo

# CUAHSI HIS WaterOneFlow endpoint for SNOTEL (the URL used later in this thread)
wsdlurl = 'http://hydroportal.cuahsi.org/Snotel/cuahsi_1_1.asmx?WSDL'

# Get current date as YYYY-MM-DD
today = datetime.today().strftime('%Y-%m-%d')

def fetch(sitecode, variablecode='SNOTEL:SNWD_D', start_date='1950-10-01', end_date=today):
    print(sitecode, variablecode, start_date, end_date)
    values_df = None
    try:
        # Request data from the server
        site_values = ulmo.cuahsi.wof.get_values(wsdlurl, sitecode, variablecode, start=start_date, end=end_date)
        # Convert to a pandas DataFrame
        values_df = pd.DataFrame.from_dict(site_values['values'])
        # Parse the datetime strings to pandas Timestamps. Note: utc=True labels
        # strings without a UTC offset as UTC, without shifting the clock time.
        values_df['datetime'] = pd.to_datetime(values_df['datetime'], utc=True)
        # Set the DataFrame index to the Timestamps
        values_df = values_df.set_index('datetime')
        # Convert values to float and replace -9999 nodata values with NaN
        values_df['value'] = pd.to_numeric(values_df['value']).replace(-9999, np.nan)
        # Keep only records with quality control level code '1'
        values_df = values_df[values_df['quality_control_level_code'] == '1']
    except Exception as err:
        print("Unable to fetch %s: %s" % (variablecode, err))

    return values_df
```

A quick inspection of hourly data from the online report generator for sites in OR (651), CO (1100) and AK (1174, 1175) does seem to show that all data (even those in AK) are reported in PST. (Retrieved at 4:48 PM GMT-8: all sites outside AK report data up to 16:00 PST, while the AK sites have data up to 15:00 PST.)

So my intuition is that all SNOTEL sites report in "real-time" PST. It seems that sites in later time zones, like Colorado, report up to the current PST hour despite possessing "future" data, and sites in earlier time zones, like Alaska, also report up to the current PST hour despite not having "real-time" data.
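If that reading is right, naive timestamps in a SNOTEL response have to be localized to PST before they can be converted to UTC. A minimal standard-library sketch, assuming "PST" means a fixed UTC-8 offset all year (no daylight saving shift), which the reports above suggest but which no official documentation in this thread confirms; `snotel_to_utc` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# Assumption (from this thread, not an official spec): SNOTEL timestamps are
# Pacific Standard Time all year, i.e. a fixed UTC-8 offset with no DST shift.
SNOTEL_TZ = timezone(timedelta(hours=-8), name="PST")


def snotel_to_utc(naive_timestamp: str) -> datetime:
    """Interpret a naive ISO 8601 SNOTEL timestamp as PST and convert it to UTC."""
    local = datetime.fromisoformat(naive_timestamp)
    if local.tzinfo is not None:
        # The string already carries an offset, so the PST assumption does not apply.
        raise ValueError("expected a naive timestamp without a UTC offset")
    return local.replace(tzinfo=SNOTEL_TZ).astimezone(timezone.utc)
```

For example, the 16:00 PST hour reported for the OR and CO sites would map to 00:00 UTC on the following day.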

emiliom commented 5 years ago

Thanks so much, @jmichellehu!

emiliom commented 5 years ago

From @erekalper:

I have one more question, this time about passing start and end times for SNOTEL data. I have the following code:

```python
import ulmo

wsdl_url = 'http://hydroportal.cuahsi.org/Snotel/cuahsi_1_1.asmx?WSDL'
site_code = 'SNOTEL:380_CO_SNTL'
variable_code = 'SNOTEL:SNWD_H'
# No start or end given, to request the full period of record
site_values = ulmo.cuahsi.wof.get_values(wsdl_url, site_code, variable_code)
```

The ulmo docs say that to get the full available range of data, you can simply omit the "start" and "end" parameters in the get_values call. However, I get the following error when I do so:

WebFault: Server raised fault: 'System.Web.Services.Protocols.SoapException: String reference not set to an instance of a String. Parameter name: s at WaterOneFlow.odws.v1_1.Service.GetValuesObject(String location, String variable, String startDate, String endDate, String authToken) in c:\inetpub\wwwroot\Snotel\App_Code\Service_1_1.cs:line 193 at WaterOneFlow.odws.v1_1.Service.GetValues(String location, String variable, String startDate, String endDate, String authToken) in c:\inetpub\wwwroot\Snotel\App_Code\Service_1_1.cs:line 176'

To "bypass" this error, I started toying with comically early timestamps, like 01/01/1500. For those, it told me that such dates are out of bounds:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1605-01-01 00:00:00

However, by the time I got to 01/01/1700, I instead got the following:

WebFault: Server raised fault: 'System.Web.Services.Protocols.SoapException: Failed to get hourly data; nested exception is: java.sql.SQLException: Only dates between January 1, 1753 and December 31, 9999 are accepted. at WaterOneFlow.odws.v1_1.Service.GetValuesObject(String location, String variable, String startDate, String endDate, String authToken) in c:\inetpub\wwwroot\Snotel\App_Code\Service_1_1.cs:line 193 at WaterOneFlow.odws.v1_1.Service.GetValues(String location, String variable, String startDate, String endDate, String authToken) in c:\inetpub\wwwroot\Snotel\App_Code\Service_1_1.cs:line 176'

Through some trial and error, it looks like 01/01/1753 is the earliest "start" date that still works. So for now I'm using that, with datetime.datetime.now() as the "end" value, which gives the same result as requesting the full range. But still, I'm wondering if you know why this error is cropping up. Thanks!

emiliom commented 5 years ago

@erekalper The documentation needs to be updated. CUAHSI changed its web service a couple of years ago, to disallow empty start or end parameters.

What you found about the earliest start time is interesting! We may be able to use it to hardwire a start time if blank is passed, for a CUAHSI request.

I'll update that documentation when I get around to updating the time zone documentation.

erekalper commented 5 years ago

Glad I could help a bit! Similarly, it looks like you could hard-code "12/31/9999" as the end date when none is passed, for the same result.
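The workaround discussed here (fill in a start of 1753-01-01 when none is given) could be wrapped in a small helper. This is a sketch, not ulmo's API: `default_request_window` is a hypothetical name, and the 1753 bound comes only from the trial and error reported above. It defaults the end to the current PST time rather than 12/31/9999, because the `OutOfBoundsDatetime` error for 1605 was raised by pandas, whose nanosecond timestamps end at 2262-04-11, so a far-future end date may fail client-side in the same way before it ever reaches the server.

```python
from datetime import datetime, timedelta, timezone

# Earliest start date the CUAHSI SNOTEL service accepted in the trial and error
# reported in this thread; its backend rejected dates before January 1, 1753.
EARLIEST_START = datetime(1753, 1, 1)

# Assumption from this thread: the service works in PST, a fixed UTC-8 offset.
PST = timezone(timedelta(hours=-8), name="PST")


def default_request_window(start=None, end=None, now=None):
    """Return a (start, end) pair of naive datetimes for a get_values request.

    Hypothetical helper: a missing start becomes EARLIEST_START, and a missing
    end becomes `now` (or, if `now` is None, the current time in PST).
    """
    if start is None:
        start = EARLIEST_START
    if end is None:
        end = now if now is not None else datetime.now(PST).replace(tzinfo=None)
    if start < EARLIEST_START:
        raise ValueError(
            f"start {start.date().isoformat()} is before {EARLIEST_START.date().isoformat()}"
        )
    if start > end:
        raise ValueError("start must not be later than end")
    return start, end
```

The returned pair would then be passed as the `start` and `end` arguments of `ulmo.cuahsi.wof.get_values`, for instance formatted with `strftime('%Y-%m-%d')` as in the fetch function.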

emiliom commented 4 years ago

@erekalper and @jmichellehu I'm writing up ulmo documentation for requesting SNOTEL data via the ulmo CUAHSI WaterOneFlow services. I have a question. It's clear from your findings that the data returned are in PST. Got it. I assume that means that the start and end timestamps specified in ulmo.cuahsi.wof.get_values also must be in PST. In your usage, have you noticed if that's the case?

Thanks!

jmichellehu commented 4 years ago

@emiliom that would be the logic I would follow. I can't say that I've noticed it in my use, but if the reporting is in PST, I can't imagine the requests would be transformed into a different time zone: they're just asking for whatever date and time values exist, which all seem to be PST.
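Following that logic, a request time defined in another time zone would be converted to PST before being passed to `ulmo.cuahsi.wof.get_values`. A minimal sketch, assuming (as concluded in this thread, not from official documentation) a fixed UTC-8 offset; `to_snotel_request_time` is a hypothetical helper, and the ISO 8601 output format is also an assumption, since the fetch function passes plain 'YYYY-MM-DD' dates.

```python
from datetime import datetime, timedelta, timezone

# Assumption from this thread: SNOTEL request times are read as PST (fixed UTC-8).
PST = timezone(timedelta(hours=-8), name="PST")


def to_snotel_request_time(moment: datetime) -> str:
    """Convert a timezone-aware datetime to a naive PST timestamp string."""
    if moment.tzinfo is None:
        # A naive datetime could be in any zone, so refuse to guess.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(PST).replace(tzinfo=None).isoformat(timespec="seconds")
```

For example, `to_snotel_request_time(datetime(2019, 1, 16, 0, 0, tzinfo=timezone.utc))` returns `'2019-01-15T16:00:00'`.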

emiliom commented 4 years ago

Thanks for chiming in @jmichellehu ! I've updated the documentation to reflect that understanding. That'll be in a new ulmo release I hope to put out by next week.

dharhas commented 4 years ago

@emiliom something to check is whether Pandas is being used anywhere. It's been a long time, and I'm reasonably sure that the wof module just returns dicts, but one thing I do remember encountering a long time ago is that Pandas used to implicitly cast timestamps to UTC if a timezone was present in the timestamp string. I'm not sure what the current behavior in Pandas is.
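One way to check is to compare how `pandas.to_datetime` treats timestamp strings with and without a UTC offset. A small sketch using made-up example timestamps (not real SNOTEL values); it reflects current pandas behavior, which may differ from the older releases dharhas remembers. `Etc/GMT+8` is the IANA name for a fixed UTC-8 zone (the sign is inverted by convention), used here under the same PST assumption as the rest of the thread.

```python
import pandas as pd

# Example strings (not real SNOTEL output): one naive, one with a -08:00 offset.
naive = pd.Series(["2019-01-15T16:00:00"])
with_offset = pd.Series(["2019-01-15T16:00:00-08:00"])

# utc=True on a naive string attaches UTC without shifting the clock time.
labeled = pd.to_datetime(naive, utc=True)

# utc=True on an offset-bearing string converts the instant to UTC.
converted = pd.to_datetime(with_offset, utc=True)

# Treat the naive string as PST (fixed UTC-8) first, then convert to UTC.
localized = pd.to_datetime(naive).dt.tz_localize("Etc/GMT+8").dt.tz_convert("UTC")

print(labeled.iloc[0])    # 2019-01-15 16:00:00+00:00
print(converted.iloc[0])  # 2019-01-16 00:00:00+00:00
print(localized.iloc[0])  # 2019-01-16 00:00:00+00:00
```

The first case is the one to watch in the fetch function: if ulmo's `datetime` strings carry no offset, `utc=True` would label PST clock times as UTC, eight hours off.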