Thanks for the suggestion!
Regarding the elevation, I would highly recommend Py3DEP; it's trivial to get elevations for thousands of points via either its CLI or by using it as a library. Two neat features of all HyRiver packages are asynchronous requests and persistent caching: all requests and responses are stored in a SQL database, so repeated requests are, under the hood, read from the database instead of hitting the server again.
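For example, getting elevations for a batch of points is a short script with py3dep.elevation_bycoords; a minimal sketch (assuming it takes (lon, lat) tuples in EPSG:4326 by default and returns elevations in meters):

import py3dep

# A few sample (lat, lon) points; elevation_bycoords expects (x, y) = (lon, lat)
coords = [(42.69513, -71.030437), (42.694901, -71.027653)]
elevations = py3dep.elevation_bycoords([(lon, lat) for lat, lon in coords])
print(elevations)  # one elevation (in meters) per input point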
About your NLCD suggestion, the main idea behind HyRiver is to avoid using local copies of databases by relying on features such as server-side operations and persistent caching. I'll work on a suitable solution.
If we can get everything we need in one place, with a minimum of server load, that'd be superb. We need elevation and land use at the moment. I suspect that, once this works, we can find value in other aspects of the data set. PyDaymet may be useful, for example.
I implemented two new functions, one of which is for getting data based on coordinates. Take a look at this example notebook. You can give it a try by installing the dependencies from GitHub until I release an official new version. You can do so with pip like this:
pip install git+https://github.com/cheginit/async_retriever.git git+https://github.com/cheginit/pygeoogc.git git+https://github.com/cheginit/pygeoutils.git git+https://github.com/cheginit/pynhd.git git+https://github.com/cheginit/pygeohydro.git
Looks good!
import osmnx as ox
import pygeohydro as gh
from pynhd import NLDI
# geometry = NLDI().get_basins(["01031450", "01031500", "01031510"])
# lulc = gh.nlcd_bygeom(geometry, 100, years={"cover": [2016, 2019]})
coords = [(42.69513, -71.030437),
(42.694901, -71.027653),
(42.695388, -71.026931),
(42.695383, -71.026942),
(42.696471, -71.023837),
(42.697136, -71.023545),
(42.699233, -71.024387),
(42.698356, -71.021488),
(42.696643, -71.023499),
(42.694305, -71.030054),
(42.693343, -71.03474),
(42.693349, -71.034757),
(42.694002, -71.035491),
(42.693452, -71.033743)]
x, y = zip(*coords)
lulc = gh.nlcd_bycoords(list(zip(y, x)), years={"cover": [2019], "canopy": [2016]})
lulc
Produces (columns: point geometry, 2016 tree canopy percentage, 2019 land cover class):
0 POINT (-71.03044 42.69513) 83.0 43
1 POINT (-71.02765 42.69490) 83.0 90
2 POINT (-71.02693 42.69539) 61.0 90
3 POINT (-71.02694 42.69538) 61.0 90
4 POINT (-71.02384 42.69647) 65.0 90
5 POINT (-71.02354 42.69714) 66.0 90
6 POINT (-71.02439 42.69923) 70.0 90
7 POINT (-71.02149 42.69836) 65.0 90
8 POINT (-71.02350 42.69664) 64.0 90
9 POINT (-71.03005 42.69430) 67.0 41
10 POINT (-71.03474 42.69334) 86.0 43
11 POINT (-71.03476 42.69335) 86.0 43
12 POINT (-71.03549 42.69400) 79.0 43
13 POINT (-71.03374 42.69345) 52.0 21
Two tangential questions, if I may:
1) Since I am using nlcd_bycoords, I do not need to load the watershed geometries, correct?
2) I need string values for the cover. Is this the appropriate mapping from number to text? - https://www.mrlc.gov/data/legends/national-land-cover-database-2019-nlcd2019-legend
Thank you -very- much, this is a huge help.
Sure! You can use gh.helpers.nlcd_helper()["classes"] to get the classes and their descriptions.
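For example, to attach the descriptions to the dataframe above; a sketch (assuming the cover column ends up named cover_2019 for years={"cover": [2019]}, and that the "classes" dict is keyed by the class code as a string):

import pygeohydro as gh

# Map numeric NLCD cover classes (e.g., 43) to their text descriptions.
classes = gh.helpers.nlcd_helper()["classes"]
lulc["cover_2019_desc"] = lulc["cover_2019"].astype(int).astype(str).map(classes)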
Is there a way to increase the timeout period?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/connectionpool.py", line 447, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.mrlc.gov', port=443): Read timed out. (read timeout=30)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/kovar/work/a50_utils/./db_utils.py", line 497, in <module>
main()
File "/Users/kovar/work/a50_utils/./db_utils.py", line 490, in main
upload_as(engine, args.file, args.tz, args.wxdir, args.a50)
File "/Users/kovar/work/a50_utils/./db_utils.py", line 218, in upload_as
do_objects(s, engine, df, 'Home', wxclear)
File "/Users/kovar/work/a50_utils/./db_utils.py", line 278, in do_objects
do_tracks(s, engine, df, object_type, objectID[0], wxclear)
File "/Users/kovar/work/a50_utils/./db_utils.py", line 382, in do_tracks
trackstats_df = ursastats.single_flightid_stats(df, object_type)
File "/Users/kovar/work/a50_utils/../a50/tracklib/ursastats.py", line 242, in single_flightid_stats
stats['LandType'] = cover = geospatial.get_land_use_coords\
File "/Users/kovar/work/a50_utils/../a50/tracklib/geospatial.py", line 511, in get_land_use_coords
lulc = gh.nlcd_bycoords([(lon, lat)], years={"cover": [2019]})
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeohydro/pygeohydro.py", line 398, in nlcd_bycoords
nlcd = _NLCD(
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeohydro/pygeohydro.py", line 203, in __init__
self.wms = WMS(
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeoogc/pygeoogc.py", line 312, in __init__
self.validate_wms()
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeoogc/core.py", line 533, in validate_wms
wms = WebMapService(self.url, version=self.version)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/owslib/wms.py", line 54, in WebMapService
return wms130.WebMapService_1_3_0(
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/owslib/map/wms130.py", line 75, in __init__
self._capabilities = reader.read(self.url, timeout=self.timeout)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/owslib/map/common.py", line 65, in read
u = openURL(spliturl[0], spliturl[1], method='Get',
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/owslib/util.py", line 208, in openURL
req = requests.request(method.upper(), url_base, headers=headers, **rkwargs)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.mrlc.gov', port=443): Read timed out. (read timeout=30)
2021-12-23 10:32:34.264 ERROR sqlalchemy.pool.impl.QueuePool: Exception during reset or similar
Traceback (most recent call last):
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 682, in _finalize_fairy
fairy._reset(pool)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 887, in _reset
pool._dialect.do_rollback(self)
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 667, in do_rollback
dbapi_connection.rollback()
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
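In the meantime I'm working around the intermittent timeouts with a plain client-side retry; a rough sketch (nlcd_with_retry is just my own wrapper, not a pygeohydro feature):

import time
import requests
import pygeohydro as gh

def nlcd_with_retry(coords, years, attempts=3, wait=10):
    # Retry gh.nlcd_bycoords when the MRLC server times out.
    for attempt in range(attempts):
        try:
            return gh.nlcd_bycoords(coords, years=years)
        except requests.exceptions.ReadTimeout:
            if attempt == attempts - 1:
                raise
            time.sleep(wait)  # back off before trying again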
Update the packages from GitHub and give it another try. The request calls should be more performant and stable now.
I ran this again against a much larger dataset and did so with Jupyter. It produced a very large number of INFO warnings that mentioned 'chaos'. I unfortunately didn't preserve the error messages and cannot recreate them.
I've integrated the code into our app and it does what it should. Thanks again!
Report the error with logs next time so I can figure out the issue. I have tested the code with a large number of requests, and as long as your system has enough memory it should work fine.
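If memory does become a constraint with very large coordinate lists, one option is to split the coordinates into chunks and concatenate the results; a minimal sketch (the function name is just illustrative):

import pandas as pd
import pygeohydro as gh

def nlcd_bycoords_chunked(coords, years, chunk_size=1000):
    # Query NLCD in manageable chunks to keep memory use bounded.
    frames = [
        gh.nlcd_bycoords(coords[i : i + chunk_size], years=years)
        for i in range(0, len(coords), chunk_size)
    ]
    return pd.concat(frames, ignore_index=True)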
This one may be due to my environment and a need to rebuild it.
File "/Users/kovar/work/a50_utils/../a50/tracklib/geospatial.py", line 23, in <module>
import pygeohydro as gh
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeohydro/__init__.py", line 11, in <module>
from .pygeohydro import (
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeohydro/pygeohydro.py", line 24, in <module>
from pynhd import NLDI, AGRBase, WaterData
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pynhd/__init__.py", line 2, in <module>
from .core import AGRBase
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pynhd/core.py", line 18, in <module>
from .exceptions import InvalidInputValue
File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pynhd/exceptions.py", line 7, in <module>
class ServiceError(ar.ServiceError):
AttributeError: module 'async_retriever' has no attribute 'ServiceError'
You haven't updated all the packages from GitHub. You need to run the same pip command that I mentioned earlier to update all of them. The next versions of the HyRiver packages will be a minor release (0.11.x to 0.12.x) with some breaking changes, so for now you need to install all of them from GitHub.
I did:
pip install git+https://github.com/cheginit/async_retriever.git git+https://github.com/cheginit/pygeoogc.git git+https://github.com/cheginit/pygeoutils.git git+https://github.com/cheginit/pynhd.git git+https://github.com/cheginit/pygeohydro.git
and the issue persists. Did I miss a package?
I see. In that case it should work. I just gave it a try in a new env and it works. Create a new Python environment and give it a try.
Here's what I did: created a new env using mamba (or conda) with rasterio and fiona installed, then used the same pip command.
mamba create -n hyriver rasterio fiona
conda activate hyriver
pip install git+https://github.com/cheginit/async_retriever.git git+https://github.com/cheginit/pygeoogc.git git+https://github.com/cheginit/pygeoutils.git git+https://github.com/cheginit/pynhd.git git+https://github.com/cheginit/pygeohydro.git
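To double-check that the GitHub dev versions are the ones actually in use, you can print the installed versions (these are the distribution names pip uses; the dev installs should show a ".dev" suffix):

from importlib.metadata import version

for pkg in ["async-retriever", "pygeoogc", "pygeoutils", "pynhd", "pygeohydro"]:
    print(pkg, version(pkg))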
I did a pip uninstall on everything related to the stack and then reinstalled as above. The problem reappeared when I installed pygeos and that brought in async-timeout, async_generator, async_retriever, and pygeoogc.
Conda installed:
async_generator conda-forge/noarch::async_generator-1.10-py_0
async_retriever conda-forge/noarch::async_retriever-0.2.0-pyhd8ed1ab_0
pygeoogc conda-forge/noarch::pygeoogc-0.11.7-pyhd8ed1ab_0
pygeos conda-forge/osx-arm64::pygeos-0.12.0-py39hc7d1406_0
pygeoutils conda-forge/noarch::pygeoutils-0.11.7-pyhd8ed1ab_0
I installed from git again, picked up async-retriever 0.2.6.dev43+g0d211e0, and still have the problem.
I'll try building a new environment now.
Some combination of a new environment, conda, and pip got me an environment with everything working, but it took a few tries, even starting from scratch. pygeos seemed to be part of the problem. I'll close the ticket out; I think your code is fine.
I'll release the new versions of the packages soon. If you still have issues with the new official versions, re-open this issue so I can take a look.
For an aviation use case ("Where did the drone launch?"), I'd like to get the land use for a lot of points in the U.S.
A similar use case I have is "What is the elevation at a particular point?" To answer this, I run an https://open-elevation.com/ Docker instance and use its API to pass in thousands of lat/lon pairs. It returns the pairs with their elevations, which I add as a column to my dataframe.
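Roughly, that elevation step looks like the sketch below (the payload and response shapes follow the open-elevation lookup API as I understand it, and the lat/lon column names are just from my dataframe):

import requests

def add_elevation(df, url="http://localhost:8080/api/v1/lookup"):
    # POST all points to a local open-elevation instance and add the result as a column.
    payload = {
        "locations": [
            {"latitude": lat, "longitude": lon}
            for lat, lon in zip(df["lat"], df["lon"])
        ]
    }
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    df["elevation"] = [r["elevation"] for r in resp.json()["results"]]
    return df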
A good solution would be a function that operated on a local copy of the NLCD database, took lat/lon pairs, and returned the text description of the land use. The pairs could be distinct lat/lon values or possibly a geodataframe with one or more points.
If this function were vectorized and could process large numbers of points quickly, that would be a bonus but not necessary.
The best option I could come up with involved setting a bounding box around each point. (See Discussions for details.)