smnorris / bcdata

Python and command line tools for quick access to DataBC geo-data available via WFS/WCS.
MIT License
29 stars 7 forks source link

bc2pg fails accessing bc-environmental-monitoring-locations due to UnboundLocalError #146

Closed NathanEne closed 9 months ago

NathanEne commented 9 months ago

When running the following command: bcdata bc2pg bc-environmental-monitoring-locations using --db_url,--schema, --table options the following error is returned:

Traceback (most recent call last):
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/bcdata/bc2pg.py", line 43, in _download
    return data
UnboundLocalError: local variable 'data' referenced before assignment

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/venvs/maintenance/bin/bcdata", line 8, in <module>
    sys.exit(cli())
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 1126, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 1051, in main
    rv = self.invoke(ctx)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 1393, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 752, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/bcdata/cli.py", line 378, in bc2pg
    out_table = bcdata.bc2pg(
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/bcdata/bc2pg.py", line 144, in bc2pg
    df = _download(url)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7fe5f9c4d660 state=finished raised UnboundLocalError>]

this issue is encountered on version 0.8.1.post1:

$ bcdata --version
0.8.1.post1
smnorris commented 9 months ago

I expect this error is due to requests timing out. Testing now and the service is very sluggish and I get the same error.

https://github.com/smnorris/bcdata/issues/143 notes how error messages need to be more informative when requests time out.

smnorris commented 9 months ago

I'm not sure what is going on, requests for other layers with >100k records currently work fine.

But this one (only ~40k records) is very slow, and consistently times out / returns a 502 error on the second request. Is there anything unique about this table?

$ bcdata cat WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW -vv > /dev/null
2023-10-11 15:11:35,036:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,110:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,192:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,263:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,341:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,399:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,489:DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): openmaps.gov.bc.ca:443
2023-10-11 15:11:37,855:DEBUG:urllib3.connectionpool: https://openmaps.gov.bc.ca:443 "GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&resultType=hits&outputFormat=json HTTP/1.1" 200 None
2023-10-11 15:11:37,861:DEBUG:bcdata.wfs: https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&resultType=hits&outputFormat=json
2023-10-11 15:11:37,862:DEBUG:bcdata.wfs: {'Date': 'Wed, 11 Oct 2023 22:11:36 GMT', 'Server': 'Apache', 'X-Frame-Options': 'allow-from (null)', 'X-Control-flow-delay-ms': '0', 'Content-Disposition': 'inline; filename=geoserver-GetFeature.text', 'Content-Type': 'text/xml', 'Set-Cookie': 'GS_FLOW_CONTROL=GS_CFLOW_6a64218f:18b20ca91ea:-8a4', 'Vary': 'Accept-Encoding', 'Access-Control-Allow-Origin': '(null)', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'POST, GET, OPTIONS, HEAD', 'Access-Control-Allow-Headers': 'X-Requested-With, Referer, Origin, Content-Type, SOAPAction, Authorization, Accept', 'Access-Control-Max-Age': '1000', 'X-Geo-Cache': 'miss', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Transfer-Encoding': 'chunked'}
2023-10-11 15:11:37,866:INFO:bcdata.wfs: {'start_time': 0.911322208, 'attempt_number': 1, 'idle_for': 0}
2023-10-11 15:11:37,866:INFO:bcdata.wfs: Total features requested: 41452
2023-10-11 15:11:37,874:DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): openmaps.gov.bc.ca:443
2023-10-11 15:11:42,797:DEBUG:urllib3.connectionpool: https://openmaps.gov.bc.ca:443 "GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=0&count=10000 HTTP/1.1" 200 None
2023-10-11 15:14:07,092:INFO:bcdata.wfs: https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=0&count=10000
2023-10-11 15:14:07,093:DEBUG:bcdata.wfs: {'Date': 'Wed, 11 Oct 2023 22:11:42 GMT', 'Server': 'Apache', 'X-Frame-Options': 'allow-from (null)', 'X-Control-flow-delay-ms': '0', 'Content-Disposition': 'inline; filename=geoserver-GetFeature.application', 'Content-Type': 'application/json;charset=UTF-8', 'Set-Cookie': 'GS_FLOW_CONTROL=GS_CFLOW_4d2859f9:18b20caa96d:-47af', 'Access-Control-Allow-Origin': '(null)', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'POST, GET, OPTIONS, HEAD', 'Access-Control-Allow-Headers': 'X-Requested-With, Referer, Origin, Content-Type, SOAPAction, Authorization, Accept', 'Access-Control-Max-Age': '1000', 'X-Geo-Cache': 'miss', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Transfer-Encoding': 'chunked'}
2023-10-11 15:14:07,337:INFO:bcdata.wfs: {'start_time': 3.300790125, 'attempt_number': 1, 'idle_for': 0}
2023-10-11 15:14:07,340:DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): openmaps.gov.bc.ca:443
2023-10-11 15:15:07,887:DEBUG:urllib3.connectionpool: https://openmaps.gov.bc.ca:443 "GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000 HTTP/1.1" 502 341
2023-10-11 15:15:07,900:INFO:bcdata.wfs: https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000
2023-10-11 15:15:07,900:DEBUG:bcdata.wfs: {'Date': 'Wed, 11 Oct 2023 22:14:07 GMT', 'Server': 'Apache', 'Content-Length': '341', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=iso-8859-1'}
2023-10-11 15:15:07,900:DEBUG:bcdata.wfs: WFS/network error
2023-10-11 15:15:08,307:DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): openmaps.gov.bc.ca:443
2023-10-11 15:16:08,898:DEBUG:urllib3.connectionpool: https://openmaps.gov.bc.ca:443 "GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000 HTTP/1.1" 502 341
2023-10-11 15:16:08,905:INFO:bcdata.wfs: https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000
2023-10-11 15:16:08,905:DEBUG:bcdata.wfs: {'Date': 'Wed, 11 Oct 2023 22:15:08 GMT', 'Server': 'Apache', 'Content-Length': '341', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=iso-8859-1'}
2023-10-11 15:16:08,905:DEBUG:bcdata.wfs: WFS/network error
Traceback (most recent call last):
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/snorris/Projects/repo/bcdata/bcdata/wfs.py", line 171, in _request_features
    return features
UnboundLocalError: local variable 'features' referenced before assignment

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/bin/bcdata", line 8, in <module>
    sys.exit(cli())
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/snorris/Projects/repo/bcdata/bcdata/cli.py", line 278, in cat
    for feat in bcdata.get_features(
  File "/Users/snorris/Projects/repo/bcdata/bcdata/wfs.py", line 440, in get_features
    for feature in self._request_features(url):
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x132273dc0 state=finished raised UnboundLocalError>]
smnorris commented 9 months ago

Downloading manually as gpkg via the download service is near instant, all 41452 records returned, all are points. It is a simple dataset - I don't know what the issue is.

$ ogrinfo EMS_MONITORING_LOCN_TYPES_SVW.gpkg WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW -so
INFO: Open of `EMS_MONITORING_LOCN_TYPES_SVW.gpkg'
      using driver `GPKG' successful.

Layer name: WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW
Geometry: Point
Feature Count: 41452
smnorris commented 9 months ago

same result with curl:

$ curl -v "https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000"
*   Trying 142.34.140.41:443...
* Connected to openmaps.gov.bc.ca (142.34.140.41) port 443 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: C=CA; ST=British Columbia; L=Victoria; O=Government of the Province of British Columbia; CN=*.data.gov.bc.ca
*  start date: Sep 28 22:33:08 2022 GMT
*  expire date: Oct 16 22:33:08 2023 GMT
*  subjectAltName: host "openmaps.gov.bc.ca" matched cert's "openmaps.gov.bc.ca"
*  issuer: C=US; O=Entrust, Inc.; OU=See www.entrust.net/legal-terms; OU=(c) 2012 Entrust, Inc. - for authorized use only; CN=Entrust Certification Authority - L1K
*  SSL certificate verify ok.
* using HTTP/1.1
> GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000 HTTP/1.1
> Host: openmaps.gov.bc.ca
> User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 502 Proxy Error
< Date: Wed, 11 Oct 2023 22:33:25 GMT
< Server: Apache
< Content-Length: 341
< Connection: close
< Content-Type: text/html; charset=iso-8859-1
<
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
* Closing connection 0
smnorris commented 9 months ago

Sorting by MONITORING_LOCATION_ID resolves the issue.

Because there is no interface to determine primary key, bcdata makes some guesses and will default to OBJECT_ID. Something weird seems to be going on at the server when trying to sort that dataset by OBJECT_ID.

This works: bcdata bc2pg --sortby monitoring_location_id bc-environmental-monitoring-locations

I'd recommend taking a look at the schema before trying downloads: bcdata info <dataset> --indent 2

There will generally (but definitely not always) be a unique key noted. When using bc2pg, you can also set that column as the primary key in the output table with the --primary_key option.

NathanEne commented 9 months ago

Thank you that worked excellently. I appreciate the quick response!