hyriver / hyriver.github.io

A Python software stack for retrieving hydroclimate data from web services.
https://docs.hyriver.io

Issue on page /examples/notebooks/hymod_calibration.html #16

Closed monicasantamaria closed 1 year ago

monicasantamaria commented 1 year ago

Hi! I'm trying to run the example notebook that uses HYMOD and the HyRiver packages for hydrological modelling. I changed the station ID to another station of interest and changed the dates to a longer period than the one in the example. I get the following error when I try to compute the PET with the hargreaves_samani method:

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Cell In [14], line 1
----> 1 clm = daymet.get_bygeom(basin.geometry[0], dates, variables="prcp", pet="hargreaves_samani")

File /usr/local/lib/python3.10/site-packages/pydaymet/pydaymet.py:467, in get_bygeom(geometry, dates, crs, variables, region, time_scale, pet, pet_params, snow, snow_params, ssl)
    453 urls, kwds = zip(
    454     *_gridded_urls(
    455         daymet.time_codes[time_scale],
   (...)
    460     )
    461 )
    463 try:
    464     clm: xr.Dataset = xr.open_mfdataset(
    465         (  # type: ignore
    466             io.BytesIO(r)
--> 467             for r in ar.retrieve_binary(urls, request_kwds=kwds, max_workers=MAX_CONN, ssl=ssl)
    468         ),
    469         engine="scipy",
    470         coords="minimal",
    471     )
    472 except ValueError as ex:
    473     msg = (
    474         "The service did NOT process your request successfully. "
    475         + "Check your inputs and try again."
    476     )

File /usr/local/lib/python3.10/site-packages/async_retriever/async_retriever.py:496, in retrieve_binary(urls, request_kwds, request_method, max_workers, cache_name, timeout, expire_after, ssl, disable)
    454 def retrieve_binary(
    455     urls: Sequence[StrOrURL],
    456     request_kwds: Optional[Sequence[Dict[str, Any]]] = None,
   (...)
    463     disable: bool = False,
    464 ) -> List[bytes]:
    465     r"""Send async requests and get the response as ``bytes``.
    466 
    467     Parameters
   (...)
    494         List of responses in the order of input URLs.
    495     """
--> 496     resp: List[bytes] = retrieve(  # type: ignore
    497         urls,
    498         "binary",
    499         request_kwds,
    500         request_method,
    501         max_workers,
    502         cache_name,
    503         timeout,
    504         expire_after,
    505         ssl,
    506         disable,
    507     )
    508     return resp

File /usr/local/lib/python3.10/site-packages/async_retriever/async_retriever.py:246, in retrieve(urls, read_method, request_kwds, request_method, max_workers, cache_name, timeout, expire_after, ssl, disable)
    242 chunked_reqs = tlz.partition_all(max_workers, inp.url_kwds)
    244 results = (loop.run_until_complete(session(url_kwds=c)) for c in chunked_reqs)
--> 246 resp = [r for _, r in sorted(tlz.concat(results))]
    247 if new_loop:
    248     loop.close()

File /usr/local/lib/python3.10/site-packages/async_retriever/async_retriever.py:244, in <genexpr>(.0)
    230 session = tlz.partial(
    231     async_session,
    232     read=inp.read_method,
   (...)
    239     disable=disable,
    240 )
    242 chunked_reqs = tlz.partition_all(max_workers, inp.url_kwds)
--> 244 results = (loop.run_until_complete(session(url_kwds=c)) for c in chunked_reqs)
    246 resp = [r for _, r in sorted(tlz.concat(results))]
    247 if new_loop:

File /usr/local/lib/python3.10/site-packages/nest_asyncio.py:90, in _patch_loop.<locals>.run_until_complete(self, future)
     87 if not f.done():
     88     raise RuntimeError(
     89         'Event loop stopped before Future completed.')
---> 90 return f.result()

File /usr/local/lib/python3.10/asyncio/futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception.with_traceback(self._exception_tb)
    202 return self._result

File /usr/local/lib/python3.10/asyncio/tasks.py:232, in Task.__step(***failed resolving arguments***)
    228 try:
    229     if exc is None:
    230         # We use the `send` method directly, because coroutines
    231         # don't have `__iter__` and `__next__` methods.
--> 232         result = coro.send(None)
    233     else:
    234         result = coro.throw(exc)

File /usr/local/lib/python3.10/site-packages/async_retriever/async_retriever.py:92, in async_session(url_kwds, read, r_kwds, request_method, cache_name, timeout, expire_after, ssl, disable)
     87 request_func = getattr(session, request_method.lower())
     88 tasks = (
     89     utils.retriever(uid, url, kwds, request_func, read, r_kwds)
     90     for uid, url, kwds in url_kwds
     91 )
---> 92 return await asyncio.gather(*tasks)

File /usr/local/lib/python3.10/asyncio/tasks.py:304, in Task.__wakeup(self, future)
    302 def __wakeup(self, future):
    303     try:
--> 304         future.result()
    305     except BaseException as exc:
    306         # This may also be a cancellation.
    307         self.__step(exc)

File /usr/local/lib/python3.10/asyncio/tasks.py:232, in Task.__step(***failed resolving arguments***)
    228 try:
    229     if exc is None:
    230         # We use the `send` method directly, because coroutines
    231         # don't have `__iter__` and `__next__` methods.
--> 232         result = coro.send(None)
    233     else:
    234         result = coro.throw(exc)

File /usr/local/lib/python3.10/site-packages/async_retriever/utils.py:62, in retriever(uid, url, s_kwds, session, read_type, r_kwds)
     30 async def retriever(
     31     uid: int,
     32     url: StrOrURL,
   (...)
     36     r_kwds: Dict[str, None],
     37 ) -> Tuple[int, Union[str, Awaitable[Union[str, bytes, Dict[str, Any]]]]]:
     38     """Create an async request and return the response as binary.
     39 
     40     Parameters
   (...)
     60         The retrieved response as binary.
     61     """
---> 62     async with session(url, **s_kwds) as response:
     63         try:
     64             return uid, await getattr(response, read_type)(**r_kwds)

File /usr/local/lib/python3.10/site-packages/aiohttp/client.py:1141, in _BaseRequestContextManager.__aenter__(self)
   1140 async def __aenter__(self) -> _RetType:
-> 1141     self._resp = await self._coro
   1142     return self._resp

File /usr/local/lib/python3.10/site-packages/forge/_revision.py:322, in Revision.__call__.<locals>.inner(*args, **kwargs)
    318 @functools.wraps(callable)
    319 async def inner(*args, **kwargs):
    320     # pylint: disable=E1102, not-callable
    321     mapped = inner.__mapper__(*args, **kwargs)
--> 322     return await callable(*mapped.args, **mapped.kwargs)

File /usr/local/lib/python3.10/site-packages/aiohttp_client_cache/session.py:62, in CacheMixin._request(self, method, str_or_url, expire_after, **kwargs)
     60 actions.update_from_response(new_response)
     61 if await self.cache.is_cacheable(new_response, actions):
---> 62     await self.cache.save_response(new_response, actions.key, actions.expires)
     63 return set_response_defaults(new_response)

File /usr/local/lib/python3.10/site-packages/aiohttp_client_cache/backends/base.py:169, in CacheBackend.save_response(self, response, cache_key, expires)
    167 cache_key = cache_key or self.create_key(response.method, response.url)
    168 cached_response = await CachedResponse.from_client_response(response, expires)
--> 169 await self.responses.write(cache_key, cached_response)
    171 # Alias any redirect requests to the same cache key
    172 for r in response.history:

File /usr/local/lib/python3.10/site-packages/aiohttp_client_cache/backends/sqlite.py:189, in SQLitePickleCache.write(self, key, item)
    188 async def write(self, key, item):
--> 189     await super().write(key, sqlite3.Binary(self.serialize(item)))

File /usr/local/lib/python3.10/site-packages/aiohttp_client_cache/backends/sqlite.py:170, in SQLiteCache.write(self, key, item)
    168 async def write(self, key: str, item: Union[ResponseOrKey, sqlite3.Binary]):
    169     async with self.get_connection(autocommit=True) as db:
--> 170         await db.execute(
    171             f'INSERT OR REPLACE INTO `{self.table_name}` (key,value) VALUES (?,?)',
    172             (key, item),
    173         )

File /usr/local/lib/python3.10/site-packages/aiosqlite/core.py:184, in Connection.execute(self, sql, parameters)
    182 if parameters is None:
    183     parameters = []
--> 184 cursor = await self._execute(self._conn.execute, sql, parameters)
    185 return Cursor(self, cursor)

File /usr/local/lib/python3.10/site-packages/aiosqlite/core.py:129, in Connection._execute(self, fn, *args, **kwargs)
    125 future = asyncio.get_event_loop().create_future()
    127 self._tx.put_nowait((future, function))
--> 129 return await future

File /usr/local/lib/python3.10/asyncio/futures.py:285, in Future.__await__(self)
    283 if not self.done():
    284     self._asyncio_future_blocking = True
--> 285     yield self  # This tells Task to wait for completion.
    286 if not self.done():
    287     raise RuntimeError("await wasn't used with future")

File /usr/local/lib/python3.10/asyncio/tasks.py:304, in Task.__wakeup(self, future)
    302 def __wakeup(self, future):
    303     try:
--> 304         future.result()
    305     except BaseException as exc:
    306         # This may also be a cancellation.
    307         self.__step(exc)

File /usr/local/lib/python3.10/asyncio/futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception.with_traceback(self._exception_tb)
    202 return self._result

File /usr/local/lib/python3.10/site-packages/aiosqlite/core.py:102, in Connection.run(self)
    100 try:
    101     LOG.debug("executing %s", function)
--> 102     result = function()
    103     LOG.debug("operation %s completed", function)
    105     def set_result(fut, result):

OperationalError: database is locked

My dates are defined as ("1993-01-01", "2015-12-31"), but I also tested a shorter period to check whether the database gets locked because of the amount of data I'm retrieving, and I got the same error. Do you have any suggestions on how to solve this?

cheginit commented 1 year ago

This issue is related to a corrupt cache file. You can simply remove the cache folder in your current working directory and rerun.
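If it helps, here is a minimal sketch for clearing the cache programmatically. It assumes the default location, a cache folder in the current working directory; adjust the path if you have pointed the cache somewhere else.

import shutil
from pathlib import Path

# Assumed default HyRiver cache location: ./cache in the working directory.
cache_dir = Path("cache")
if cache_dir.exists():
    shutil.rmtree(cache_dir)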

monicasantamaria commented 1 year ago

I deleted the cache folder and reran, but I get the same error; the folder is actually recreated while I'm running the code that produces the error:

clm = daymet.get_bygeom(basin.geometry[0], dates, variables="prcp", pet="hargreaves_samani")

I'm using Google Colab, and I had to follow some tricks to install Python 3.10 for the HyRiver packages to work. Do you think this error may be related to that?

cheginit commented 1 year ago

That's strange. You're right that the folder gets created as soon as you run the code. I've never tried running it on Google Colab. Since you're on Colab, you can just disable caching like so:

import os

os.environ["HYRIVER_CACHE_DISABLE"] = "true"

Just one note, you might run out of memory on Colab, since they have limited memory per instance and this example retrieves 10 years of gridded climate with all its variables.

I just tried running this example in Binder and the Daymet cell runs just fine, but during calibration the kernel dies since the memory usage exceeds the 2 GB limit on Binder instances.
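As an alternative to disabling the cache entirely, and only if your version of HyRiver supports the HYRIVER_CACHE_NAME environment variable, you could try relocating the SQLite cache file in case the default ./cache path is what triggers the lock. A sketch, untested on Colab:

import os
from pathlib import Path

# Untested alternative: keep caching but move the SQLite cache file.
# The target path is just an example for a Colab instance.
cache_file = Path("/content/hyriver_cache/aiohttp_cache.sqlite")
cache_file.parent.mkdir(parents=True, exist_ok=True)
os.environ["HYRIVER_CACHE_NAME"] = str(cache_file)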

monicasantamaria commented 1 year ago

Disabling caching actually worked, thank you!

I'm aware of the limited memory on Colab, but luckily I didn't run out. However, I have another question. When retrieving the soil thickness, I get a dependency error from the soil_gnatsgo function. I have already installed both the pystac-client and planetary-computer packages, and since I couldn't find in this or any other example which modules I should import, I'm importing everything:

from pystac_client import *

from planetary_computer import *

Yet, I get the following error:

ERROR:rasterio._filepath:File-like object not found in virtual filesystem: b'dc26ebe9-188e-4dff-8f3d-ef8d1561ee2f/dc26ebe9-188e-4dff-8f3d-ef8d1561ee2f.aux'
ERROR:rasterio._filepath:File-like object not found in virtual filesystem: b'dc26ebe9-188e-4dff-8f3d-ef8d1561ee2f/dc26ebe9-188e-4dff-8f3d-ef8d1561ee2f.AUX'
ERROR:rasterio._filepath:File-like object not found in virtual filesystem: b'dc26ebe9-188e-4dff-8f3d-ef8d1561ee2f/dc26ebe9-188e-4dff-8f3d-ef8d1561ee2f.aux'
ERROR:rasterio._filepath:File-like object not found in virtual filesystem: b'dc26ebe9-188e-4dff-8f3d-ef8d1561ee2f/dc26ebe9-188e-4dff-8f3d-ef8d1561ee2f.AUX'
---------------------------------------------------------------------------
DependencyError                           Traceback (most recent call last)
Cell In [38], line 9
      6 porosity = porosity.where(porosity > porosity.rio.nodata)
      7 porosity = porosity.rio.write_nodata(np.nan)
----> 9 thickness = gh.soil_gnatsgo("tk0_999a", geometry, crs).tk0_999a
     10 thickness = thickness.where(thickness < 2e6, drop=False) * 10
     11 thickness = thickness.rio.write_nodata(np.nan)

File /usr/local/lib/python3.10/site-packages/pygeohydro/pygeohydro.py:1129, in soil_gnatsgo(layers, geometry, crs)
   1105 """Get US soil data from the gNATSGO dataset.
   1106 
   1107 Notes
   (...)
   1126     Requested soil properties.
   1127 """
   1128 if NO_STAC:
-> 1129     raise DependencyError("get_soildata", ["pystac-client", "planetary-computer"])
   1131 catalog = pystac_client.Client.open(
   1132     "https://planetarycomputer.microsoft.com/api/stac/v1",
   1133     modifier=planetary_computer.sign_inplace,
   1134 )
   1135 bounds = geoutils.geo2polygon(geometry, crs, 4326).bounds

DependencyError: The following dependencies are missing for running get_soildata:
pystac-client, planetary-computer

Can you help me identify what I'm doing wrong here? Thanks!

cheginit commented 1 year ago

Great! It seems that Colab has some issues with caching.

Regarding the dependency error, my guess is that you installed them but didn't reload pygeohydro. Try re-importing pygeohydro or restarting the kernel so it can detect that these two dependencies are now installed.
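For example, a quick sketch of forcing that import-time check to re-run without restarting the kernel (the check lives in the pygeohydro.pygeohydro submodule, as the traceback shows; depending on the version, restarting the kernel is still the more reliable option):

import importlib

import pygeohydro

# Re-run pygeohydro's import-time check for pystac-client and
# planetary-computer after installing them: reload the submodule
# that performs the check first, then the package itself.
importlib.reload(pygeohydro.pygeohydro)
importlib.reload(pygeohydro)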

monicasantamaria commented 1 year ago

I imported them once again before running the soil_gnatsgo function and it worked. So I finally managed to run the whole notebook in Colab. Thank you!

Before closing this discussion, may I ask whether you have plans to include a package in HyRiver for retrieving downscaled CMIP climate and hydrology projection data? https://gdo-dcp.ucllnl.org/downscaled_cmip_projections/

cheginit commented 1 year ago

I don't think I can do better than what's already out there. There are several downscaled CMIP6 datasets on the Microsoft Planetary Computer, such as GDDP, that can be retrieved easily with STAC, and Pangeo also has several example notebooks on getting these types of data from their servers.
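For reference, a minimal sketch of searching one of those collections on the Planetary Computer STAC API with pystac-client; the collection ID "nasa-nex-gddp-cmip6" and the date range are just examples, so browse the catalog for the exact collection and filters you need:

import planetary_computer
import pystac_client

# Open the Planetary Computer STAC catalog and search an example
# downscaled CMIP6 collection for a single year.
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
search = catalog.search(
    collections=["nasa-nex-gddp-cmip6"],
    datetime="2015-01-01/2015-12-31",
)
items = list(search.items())
print(f"Found {len(items)} items")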

monicasantamaria commented 1 year ago

Oh, I didn't know about those. I will check them out. Thank you :)