hyriver / pygeohydro

A part of HyRiver software stack for accessing hydrology data through web services
https://docs.hyriver.io
Other
68 stars 23 forks source link

NLCD by location #80

Closed kovar-ursa closed 2 years ago

kovar-ursa commented 2 years ago

For an aviation use case - "Where did the drone launch?' - I'd like to get the land use for a lot of points in the U.S.

A similar use case that I use is "What is the elevation at a particular point?" To answer this, I run an https://open-elevation.com/ docker instance and use the API to pass in thousands of lat/lon pairs. It returns the pairs with the elevation and I take the elevation and add it as a column to my dataframe.

A good solution would be a function that operated on a local copy of the NLCD database, took lat/lon pairs, and returned the text description of the land use. The pairs could be distinct lat / lon values or possibly a geodataframe with one or more points.

If this function was vectorized and could process large numbers of points quickly that would be a bonus but not necessary.

The best option I could come up with involved setting a bounding box around each point. (See Discussions for details.)

cheginit commented 2 years ago

Thanks for the suggestion!

Regarding the elevation, I would highly recommend using Py3DEP as it's trivial to get elevations of thousands of points via either its CLI or using it as a library. Two neat features of all HyRiver packages are asynchronous requests and their usage of persistent caching. With persistent caching all requests/responses are stored in an SQL database so repeated requests, under-the-hood, are read from the database.

About your NLCD suggestion, the main idea behind HyRiver is to avoid using local copies of databases by relying on features such as server-side operations and persistent caching. I'll work on a suitable solution.

kovar-ursa commented 2 years ago

If we can get everything we need in one place, with a minimum of server load, that'd be superb. We need elevation and land use at the moment. I suspect that, once this works, we can find value in other aspects of the data set. PyDaymet may be useful, for example.

cheginit commented 2 years ago

I implemented two new functions one of which is for getting data based on coordinates. Take a look at this example notebook. You can give it a try by installing the dependencies from GitHub until I release an official new version. You can do so using pip like this:

pip install git+https://github.com/cheginit/async_retriever.git git+https://github.com/cheginit/pygeoogc.git git+https://github.com/cheginit/pygeoutils.git git+https://github.com/cheginit/pynhd.git git+https://github.com/cheginit/pygeohydro.git
kovar-ursa commented 2 years ago

Looks good!

import osmnx as ox
import pygeohydro as gh
from pynhd import NLDI

# geometry = NLDI().get_basins(["01031450", "01031500", "01031510"])
lulc = gh.nlcd_bygeom(geometry, 100, years={"cover": [2016, 2019]})

coords = [(42.69513, -71.030437),
 (42.694901, -71.027653),
 (42.695388, -71.026931),
 (42.695383, -71.026942),
 (42.696471, -71.023837),
 (42.697136, -71.023545),
 (42.699233, -71.024387),
 (42.698356, -71.021488),
 (42.696643, -71.023499),
 (42.694305, -71.030054),
 (42.693343, -71.03474),
 (42.693349, -71.034757),
 (42.694002, -71.035491),
 (42.693452, -71.033743)]

x, y = zip(*coords)
lulc = gh.nlcd_bycoords(list(zip(y, x)), years={"cover": [2019], "canopy": [2016]})

lulc

Produces:

0   POINT (-71.03044 42.69513)  83.0    43
1   POINT (-71.02765 42.69490)  83.0    90
2   POINT (-71.02693 42.69539)  61.0    90
3   POINT (-71.02694 42.69538)  61.0    90
4   POINT (-71.02384 42.69647)  65.0    90
5   POINT (-71.02354 42.69714)  66.0    90
6   POINT (-71.02439 42.69923)  70.0    90
7   POINT (-71.02149 42.69836)  65.0    90
8   POINT (-71.02350 42.69664)  64.0    90
9   POINT (-71.03005 42.69430)  67.0    41
10  POINT (-71.03474 42.69334)  86.0    43
11  POINT (-71.03476 42.69335)  86.0    43
12  POINT (-71.03549 42.69400)  79.0    43
13  POINT (-71.03374 42.69345)  52.0    21

Two tangential questions, if I may:

1) Since I am using nlcd_bycoords, I do not need to load the watershed geometries, correct? 2) I need string values for the cover. Is this the appropriate mapping from number to text? - https://www.mrlc.gov/data/legends/national-land-cover-database-2019-nlcd2019-legend

Thank you -very- much, this is a huge help.

cheginit commented 2 years ago
  1. No, you only need (lon, lat) coordinates.
  2. Yes, mappings are based on official values. You can use gh.helpers.nlcd_helper()["classes"] to get classes and their description.

Sure!

kovar-ursa commented 2 years ago

I ran this again against a much larger dataset and did so with Jupyter. It produced a very large number of INFO warnings that mentioned 'chaos'. I unfortunately didn't preserve the error messages and cannot recreate them.

I've integrated the code into our app and it does what it should. Thanks again!

kovar-ursa commented 2 years ago

Is there a way to increase the timeout period?


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.mrlc.gov', port=443): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kovar/work/a50_utils/./db_utils.py", line 497, in <module>
    main()
  File "/Users/kovar/work/a50_utils/./db_utils.py", line 490, in main
    upload_as(engine, args.file, args.tz, args.wxdir, args.a50)
  File "/Users/kovar/work/a50_utils/./db_utils.py", line 218, in upload_as
    do_objects(s, engine, df, 'Home', wxclear)
  File "/Users/kovar/work/a50_utils/./db_utils.py", line 278, in do_objects
    do_tracks(s, engine, df, object_type, objectID[0], wxclear)
  File "/Users/kovar/work/a50_utils/./db_utils.py", line 382, in do_tracks
    trackstats_df = ursastats.single_flightid_stats(df, object_type)
  File "/Users/kovar/work/a50_utils/../a50/tracklib/ursastats.py", line 242, in single_flightid_stats
    stats['LandType'] = cover = geospatial.get_land_use_coords\
  File "/Users/kovar/work/a50_utils/../a50/tracklib/geospatial.py", line 511, in get_land_use_coords
    lulc = gh.nlcd_bycoords([(lon, lat)], years={"cover": [2019]})
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeohydro/pygeohydro.py", line 398, in nlcd_bycoords
    nlcd = _NLCD(
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeohydro/pygeohydro.py", line 203, in __init__
    self.wms = WMS(
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeoogc/pygeoogc.py", line 312, in __init__
    self.validate_wms()
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeoogc/core.py", line 533, in validate_wms
    wms = WebMapService(self.url, version=self.version)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/owslib/wms.py", line 54, in WebMapService
    return wms130.WebMapService_1_3_0(
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/owslib/map/wms130.py", line 75, in __init__
    self._capabilities = reader.read(self.url, timeout=self.timeout)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/owslib/map/common.py", line 65, in read
    u = openURL(spliturl[0], spliturl[1], method='Get',
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/owslib/util.py", line 208, in openURL
    req = requests.request(method.upper(), url_base, headers=headers, **rkwargs)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.mrlc.gov', port=443): Read timed out. (read timeout=30)
2021-12-23 10:32:34.264 ERROR   sqlalchemy.pool.impl.QueuePool: Exception during reset or similar
Traceback (most recent call last):
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 682, in _finalize_fairy
    fairy._reset(pool)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 887, in _reset
    pool._dialect.do_rollback(self)
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 667, in do_rollback
    dbapi_connection.rollback()
psycopg2.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
cheginit commented 2 years ago

Update the package from GitHub and give it a try again. The request calls should be more performant and stable now.

cheginit commented 2 years ago

I ran this again against a much larger dataset and did so with Jupyter. It produced a very large number of INFO warnings that mentioned 'chaos'. I unfortunately didn't preserve the error messages and cannot recreate them.

I've integrated the code into our app and it does what it should. Thanks again!

Report the error again with logs next time so I can figure out the issue. I have tested the code with large number of requests and as long as your system has enough memory, it should work fine.

kovar-ursa commented 2 years ago

This one may be due to my environment and a need to rebuild it.

  File "/Users/kovar/work/a50_utils/../a50/tracklib/geospatial.py", line 23, in <module>
    import pygeohydro as gh
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeohydro/__init__.py", line 11, in <module>
    from .pygeohydro import (
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pygeohydro/pygeohydro.py", line 24, in <module>
    from pynhd import NLDI, AGRBase, WaterData
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pynhd/__init__.py", line 2, in <module>
    from .core import AGRBase
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pynhd/core.py", line 18, in <module>
    from .exceptions import InvalidInputValue
  File "/Users/kovar/miniforge3/envs/a50-dev/lib/python3.9/site-packages/pynhd/exceptions.py", line 7, in <module>
    class ServiceError(ar.ServiceError):
AttributeError: module 'async_retriever' has no attribute 'ServiceError'
cheginit commented 2 years ago

You haven't updated all the packages from GitHub. You need to run the same pip command that I mentioned earlier to update all of them. The next versions of HyRiver packages will be a minor release (0.11.x to 0.12.x) as there are some breaking changes. So for now, you need install all of them from GitHub.

kovar-ursa commented 2 years ago

I did:

pip install git+https://github.com/cheginit/async_retriever.git git+https://github.com/cheginit/pygeoogc.git git+https://github.com/cheginit/pygeoutils.git git+https://github.com/cheginit/pynhd.git git+https://github.com/cheginit/pygeohydro.git

and the issue persists. Did I miss a package?

cheginit commented 2 years ago

I see. In that case it should work. I just gave it a try in a new env and it works. Create a new Python environment and give it a try.

cheginit commented 2 years ago

Here's what I did. Created a new env using mamba (or conda) with rasterio and fiona installed then used the same pip command.

mamba create -n hyriver rasterio fiona
conda activate hyriver
pip install git+https://github.com/cheginit/async_retriever.git git+https://github.com/cheginit/pygeoogc.git git+https://github.com/cheginit/pygeoutils.git git+https://github.com/cheginit/pynhd.git git+https://github.com/cheginit/pygeohydro.git
kovar-ursa commented 2 years ago

I did a pip uninstall on everything related to the stack and then reinstalled as above. The problem reappeared when I installed pygeos and that brought in async-timeout, async_generator, async_retriever, and pygeoogc.

Conda installed:

  async_generator    conda-forge/noarch::async_generator-1.10-py_0
  async_retriever    conda-forge/noarch::async_retriever-0.2.0-pyhd8ed1ab_0
  pygeoogc           conda-forge/noarch::pygeoogc-0.11.7-pyhd8ed1ab_0
  pygeos             conda-forge/osx-arm64::pygeos-0.12.0-py39hc7d1406_0
  pygeoutils         conda-forge/noarch::pygeoutils-0.11.7-pyhd8ed1ab_0

I installed from git again, picked up: async-retriever 0.2.6.dev43+g0d211e0, and still have the problem.

I'll try building a new environment now.

kovar-ursa commented 2 years ago

Some combination of a new environment, conda, and pip got me an environment with everything working but it took a few tries, even starting from scratch. pygeos seemed to be part of the problem. I'll close the ticket out, I think your code is fine.

cheginit commented 2 years ago

I'll release the new versions of the packages soon. If you still have issues with those new official versions re-open so I can take a look.