Closed dnowacki-usgs closed 7 years ago
Thanks for reporting this. With longer timeseries requests, you likely hit our duration limit, and we are aware that some of our error messages do not come out as proper JSON. Without an example query, I can't be sure, but in any case, if you submitted a pull request to more gracefully catch erroneous JSON output, we would definitely consider integrating it.
Thanks,
Joe
Thanks for the reply. Here are two example queries showing the 2 year threshold.
This one works (1 Jan 2013 00:01–1 Jan 2015 00:00, i.e. one minute less than two years of data):
from MesoPy import Meso
m = Meso(token='my token')
ts = m.timeseries(stid='kwal', start='201301010001', end='201501010000', units='METRIC')
This one fails (1 Jan 2013 00:00–1 Jan 2015 00:00, i.e. exactly two years):
from MesoPy import Meso
m = Meso(token='my token')
ts = m.timeseries(stid='kwal', start='201301010000', end='201501010000', units='METRIC')
and results in the following error:
ValueError  Traceback (most recent call last)
<ipython-input-89-24027e4e9447> in <module>()
----> 1 ts = m.timeseries(stid='astm2', start='201301010000', end='201501010000', units='METRIC')
/Users/dnowacki/anaconda/lib/python2.7/site-packages/MesoPy.pyc in timeseries(self, start, end, **kwargs)
482 kwargs['token'] = self.token
483
--> 484 return self._get_response('stations/timeseries', kwargs)
485
486 def climatology(self, startclim, endclim, **kwargs):
/Users/dnowacki/anaconda/lib/python2.7/site-packages/MesoPy.pyc in _get_response(self, endpoint, request_dict)
160 except urllib.error.URLError:
161 raise MesoPyError(http_error)
--> 162 return self._checkresponse(json.loads(resp.decode('utf-8')))
163
164 def _check_geo_param(self, arg_list):
/Users/dnowacki/anaconda/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
337 parse_int is None and parse_float is None and
338 parse_constant is None and object_pairs_hook is None and not kw):
--> 339 return _default_decoder.decode(s)
340 if cls is None:
341 cls = JSONDecoder
/Users/dnowacki/anaconda/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
362
363 """
--> 364 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
365 end = _w(s, end).end()
366 if end != len(s):
/Users/dnowacki/anaconda/lib/python2.7/json/decoder.pyc in raw_decode(self, s, idx)
380 obj, end = self.scan_once(s, idx)
381 except StopIteration:
--> 382 raise ValueError("No JSON object could be decoded")
383 return obj, end
ValueError: No JSON object could be decoded
I have a working fix (an additional try:/except ValueError: in _get_response()) that I'll make into a PR.
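For context, a minimal sketch of what such a catch might look like (the function name and error message below are illustrative only, not MesoPy's actual _get_response() code):

```python
import json


def parse_json_response(raw_bytes):
    """Decode an API response, raising a clearer error when the payload
    is not valid JSON. Illustrative sketch only, not MesoPy's code."""
    try:
        return json.loads(raw_bytes.decode('utf-8'))
    except ValueError:
        # The API sometimes returns non-JSON bodies for long requests.
        raise ValueError(
            'The API response could not be decoded as JSON. '
            'Try requesting a shorter time range.')
```

The same try/except ValueError pattern dropped into _get_response() turns the opaque "No JSON object could be decoded" into an actionable message.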
@dnowacki-usgs Any chance you put this into a PR (or branch) you can share?
I have been using a workaround, but it would be nice to get the fix!
@NicWayand I created a PR; it doesn't really fix the issue, but it does catch the error. I wonder if the 2 year limit is a hard limit imposed by the MesoWest API.
Ah I see. Well the catch is appreciated! Guess that is what @joeyoun9 meant by their "duration limit"? I'll just stick with repeat calls then I guess.
Nic-
An even better option may be to bypass MesoPy and rely on the broader functionality now offered through the api services of synopticlabs. If there are key features in MesoPy that are not there, then let us know. We've been kicking around deprecating MesoPy as it has been pretty much overtaken by the api capabilities.
Regards
John
Hi John, thanks for the suggestion. Calling the API directly through my browser lets me grab multiple years. But isn't this what MesoPy is doing in python, wrapping the API url calls with urllib? I don't understand where the limit on number of years in MesoPy is coming from. I would like to continue to work in python (end goal is to download X stations over Y extent and convert to a netcdf file), but don't want to reinvent the wheel (MesoPy). Thanks!
Yep, I suspect the only thing MesoPy is really providing you is the urllib wrapper. I'll let one of the python gurus here comment further. I think when MesoPy was developed we did throttle it, as the api server at the time was constrained during development. Those limitations are not imposed by the api and you should be able to do what you want to do very efficiently.
Regards
john
Ok thanks John. Here is a test case of urllib working for one year but not multiple years. Note that the multiple-year request works when pasted into my browser!
import requests
url_one = 'https://api.mesowest.net/v2/stations/timeseries?token=TOKENHERE&stid=kslc%20&start=201401010000&end=201506020000&vars=wind_speed'
url_all = 'https://api.mesowest.net/v2/stations/timeseries?token=TOKENHERE&stid=kslc%20&start=199701010000&end=201506020000&vars=wind_speed'
# This works
oneYear = requests.get(url_one).json()
print(oneYear.keys())
# This doesn't
allYears = requests.get(url_all).json()
# Spits out: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Interesting, but here it is directly and I get all years in my browser
If you haven't done so already (you probably have), install jsonview in a chrome browser to see it all.
We've had some internal discussions about improving the python/api linkages in the docs.
john
Sorry for getting in this late, I've been lounging around Martha's Vineyard for the last week taking in all the local seafood.
The issue here is not anyone's code or MesoPy; it's that when the API returns a very large timeseries response, the response gets converted to CSV to allow applications to buffer and parse it in chunks. By the nature of JSON it's impossible to stream or chunk the data, and extremely large JSON payloads require some serious overhead to parse.
This code example demonstrates the usual pattern I use to pull data from the Mesonet API, and it has parsing logic to catch a CSV response. If you wanted to automate converting the CSV to whatever format you'd like to store the data in, I would add that code here:
try:
    payload = json.loads(response.read())
except ValueError:
    # Add CSV handler here.
    print('JSON decode error. More than likely a CSV response.')
    return
or you could break apart the fetch and parsing process of this function into two separate functions to handle this case.
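As a concrete illustration of that split, here is a sketch of a standalone parse step that falls back to CSV rows when the payload is not JSON (the function name and return shapes are my own choices, not part of the Mesonet API or MesoPy):

```python
import csv
import io
import json


def parse_payload(text):
    """Parse an API payload that is usually JSON but may arrive as CSV
    for very large requests. Sketch only; the return types here are a
    design choice, not anything the real API guarantees."""
    try:
        # Normal case: the payload is a JSON document.
        return json.loads(text)
    except ValueError:
        # Fallback: treat the payload as CSV and return its rows.
        return list(csv.reader(io.StringIO(text)))
```

A fetch function would then only be responsible for the HTTP request, handing its raw text to parse_payload(), which keeps the two failure modes (network vs. format) separate.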
I hope this helps a little bit.
Hi @adamabernathy, sounds like the best option for large station/time downloads is to save to csv first, then load into xarray. Below is a simple bash script using curl with your API. This works fine for my purposes, although it would be nice to have the download and save option in mesopy (is this what you are suggesting?).
#!/bin/bash
# Bash Downloader for mesowest API
# file {stations.txt} must be in run dir, containing a list of station ids to download.
# Replace with your token
base_url='https://api.mesowest.net/v2/stations/timeseries?token=demotoken&stid='
data_folder='test'
var='wind_speed'
d_start='201701010000'
d_end='201708180000'
mkdir -p $data_folder
for i in $(cat stations.txt); do
curl "${base_url}${i}%20&start=${d_start}&end=${d_end}&vars=${var}" > "${data_folder}/${i}.txt"
done
echo "Finished"
@NicWayand, the rollover to CSV is a special case and, for the purposes of MesoPy, should be considered an invalid response. What I was referring to earlier is that if you were to get a CSV response back from the API, an appropriate solution would be to create a routine that breaks up the request into smaller time segments and then appends them to the NetCDF (or disk) in an iterative process.
From an engineering standpoint reading extremely large blocks of data into memory can be dangerous. If you wanted to persist this data to a NetCDF or HDF file, you can easily append the data, rather than loading the entire dataset into memory before the write process. This keeps the overall overhead pretty low.
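One way to sketch that iterative approach is to split the requested window into spans small enough to avoid the CSV rollover, then fetch and append one span at a time. The helper below is a sketch: the chunk size is an assumption, and only the YYYYMMDDhhmm date format matches the API examples in this thread.

```python
from datetime import datetime, timedelta


def time_chunks(start, end, days=365):
    """Split [start, end) into spans of at most `days` days so each API
    request stays small. Sketch only; the chunk size is arbitrary and
    the date strings use the API's YYYYMMDDhhmm format."""
    fmt = '%Y%m%d%H%M'
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    step = timedelta(days=days)
    chunks = []
    while t0 < t1:
        nxt = min(t0 + step, t1)
        chunks.append((t0.strftime(fmt), nxt.strftime(fmt)))
        t0 = nxt
    return chunks
```

Each (start, end) pair can then be passed to m.timeseries() or requests.get(), writing each chunk out to the NetCDF file before fetching the next, so the full dataset is never held in memory at once.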
For this ticket, I'm going to close it out; commit fd1b006e58d67e866845ea31ee4e2e2bab9e9d09 addresses the JSON loads failure.
It seems that the MesoWest API does not return valid JSON data for long timeseries requests (apparently those longer than 2 years). Currently MesoPy does not catch the JSON error and returns the somewhat opaque ValueError: No JSON object could be decoded. A more user-friendly error message that suggests shortening the time range would be nice. I'm happy to submit a pull request implementing this change if this sounds useful.