mojodna / marblecutter-virtual

Virtual catalogs for marblecutter
Other
48 stars 30 forks source link

cached filename used after network errors #12

Open Zaitsev opened 5 years ago

Zaitsev commented 5 years ago

If network error occurred during querying remote geoTiff (no s3 creds needed), marble-virtual uses cache instead of querying file again

server start: (0.1 timeout for raise Curl timeout errors) GDAL_HTTP_TIMEOUT=0.1 CPL_CURL_VERBOSE=YES python ./server.py

try to get twice curl http://localhost:8000/tiles/12/1183/1468?url=https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff&rgb=4,5,6 > /dev/null

on the first query I see Curl requests, and good exception

2019-06-19 14:15:02,778.778 ERROR virtual.web 5412 : CURL error: Connection time-out| web.py:64

on any further queries no Curl requests made, instead marble-virtual tries to open cached file

2019-06-19 14:15:06,050.050 ERROR virtual.web 8684 : '/vsicurl/https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff' does not exist in the file system, and is not recognized as a supported dataset name.| web.py:64

I've traced it to magblecutter/init.py

def get_source(path):
    """Cached source opening."""
    with rasterio.Env():
        return rasterio.open(path)

Is there any method to reset caches? I've tried gdal.VSIErrorReset() gdal.VSICurlClearCache() with no success, seems rasterio cached something somewhere

log :

 * Serving Flask app "marblecutter-virtual" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
2019-06-19 14:14:57,294.294  INFO     werkzeug   528        :  * Running on http://0.0.0.0:8000/ (Press CTRL+C to quit)| _internal.py:122
2019-06-19 14:14:57,295.295  INFO     werkzeug   529        :  * Restarting with stat| _internal.py:122
2019-06-19 14:14:57,897.897  WARNING  werkzeug   531        :  * Debugger is active!| _internal.py:122
2019-06-19 14:14:57,898.898  INFO     werkzeug   532        :  * Debugger PIN: 888-447-794| _internal.py:122
2019-06-19 14:15:02,539.539  DEBUG    virtual.web 5173       : ------- WEBPY https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff -----------------------------------| web.py:46
2019-06-19 14:15:02,544.544  DEBUG    rasterio._base 5178       : Sharing flag: 32| __init__.py:216
* Couldn't find host f.tiles.utility.opt-net.eu in the .netrc file; using defaults
* Connection time-out
* Closing connection 0
* Couldn't find host f.tiles.utility.opt-net.eu in the .netrc file; using defaults
* Hostname f.tiles.utility.opt-net.eu was found in DNS cache
*   Trying 185.178.87.64...
* TCP_NODELAY set
* Connected to f.tiles.utility.opt-net.eu (185.178.87.64) port 443 (#1)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* Operation timed out after 100 milliseconds with 0 out of 0 bytes received
* stopped the pause stream!
* Closing connection 1
None
2019-06-19 14:15:02,778.778  ERROR    virtual.web 5412       : ============2019-06-19 14:15:02.778293| web.py:60
2019-06-19 14:15:02,778.778  ERROR    virtual.web 5412       : CURL error: Connection time-out| web.py:64
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_HttpResponseError: CURL error: Connection time-out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/c/VLZ/oDesk/taras/src/IBM/kepler/kepler-tiles/marblecutter-virtual/virtual/web.py", line 55, in make_catalog
    resample=args.get("resample"),
  File "/c/VLZ/oDesk/taras/src/IBM/kepler/kepler-tiles/marblecutter-virtual/virtual/catalogs.py", line 42, in __init__
    with get_source(self._uri) as src:
  File "/home/vlz/miniconda3/envs/marble/lib/python3.6/site-packages/marblecutter/__init__.py", line 149, in get_source
    return rasterio.open(path)
  File "/home/vlz/miniconda3/envs/marble/lib/python3.6/site-packages/rasterio/env.py", line 430, in wrapper
    return f(*args, **kwds)
  File "/home/vlz/miniconda3/envs/marble/lib/python3.6/site-packages/rasterio/__init__.py", line 216, in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 218, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: CURL error: Connection time-out
2019-06-19 14:15:02,786.786  INFO     werkzeug   5420       : 127.0.0.1 - - [19/Jun/2019 14:15:02] "GET /tiles/12/1183/1468?url=https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff&rgb=4,5,6& HTTP/1.1" 404 -| _internal.py:122
2019-06-19 14:15:06,047.047  DEBUG    virtual.web 8681       : ------- WEBPY https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff -----------------------------------| web.py:46
2019-06-19 14:15:06,048.048  DEBUG    rasterio._base 8682       : Sharing flag: 32| __init__.py:216
None
2019-06-19 14:15:06,049.049  ERROR    virtual.web 8683       : ============2019-06-19 14:15:06.049726| web.py:60
2019-06-19 14:15:06,050.050  ERROR    virtual.web 8684       : '/vsicurl/https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff' does not exist in the file system, and is not recognized as a supported dataset name.| web.py:64
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsicurl/https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff' does not exist in the file system, and is not recognized as a supported dataset name.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/c/VLZ/oDesk/taras/src/IBM/kepler/kepler-tiles/marblecutter-virtual/virtual/web.py", line 55, in make_catalog
    resample=args.get("resample"),
  File "/c/VLZ/oDesk/taras/src/IBM/kepler/kepler-tiles/marblecutter-virtual/virtual/catalogs.py", line 42, in __init__
    with get_source(self._uri) as src:
  File "/home/vlz/miniconda3/envs/marble/lib/python3.6/site-packages/marblecutter/__init__.py", line 149, in get_source
    return rasterio.open(path)
  File "/home/vlz/miniconda3/envs/marble/lib/python3.6/site-packages/rasterio/env.py", line 430, in wrapper
    return f(*args, **kwds)
  File "/home/vlz/miniconda3/envs/marble/lib/python3.6/site-packages/rasterio/__init__.py", line 216, in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 218, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: '/vsicurl/https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff' does not exist in the file system, and is not recognized as a supported dataset name.
2019-06-19 14:15:06,059.059  INFO     werkzeug   8693       : 127.0.0.1 - - [19/Jun/2019 14:15:06] "GET /tiles/12/1183/1468?url=https://f.tiles.utility.opt-net.eu/tiles/canada-26May2019-15May2019-1/canada-26May2019-15May2019-1-LZW.tiff&rgb=4,5,6& HTTP/1.1" 404 -| _internal.py:122
mojodna commented 5 years ago

This may be related to https://rasterio.groups.io/g/main/topic/31734579, but I think your intuition is correct on this being incorrectly cached within marblecutter (it should be invalidated if any exceptions are encountered while reading, but I'm not sure how yet).

Zaitsev commented 5 years ago

following links I finished with https://github.com/OSGeo/gdal/issues/1244#issuecomment-487164897 solution Good news: setting CPL_VSIL_CURL_NON_CACHED="/vsicurl/" fixes issue Bad news: geoTiff parsed on each tile request so each next tile request time it the same as the first tile request, with cache it at least 2 times faster Searched rasterio codes for cache to find a way to reset cache with no success.... I've tried to change query-string and add &time= but it doesn't help