panodata / dwdweather2

Python client to access weather data from Deutscher Wetterdienst (DWD), the federal meteorological service in Germany.
https://community.panodata.org/t/dwdweather2-a-python-client-to-access-weather-data-from-deutscher-wetterdienst-dwd/98
MIT License

Running on Windows fails silently due to certificate verification problem #21

Closed. bakunin75 closed this issue 4 years ago.

bakunin75 commented 4 years ago

Trying to get dwdweather 0.11.1 running on Python 3.7.4.

Problem: Issuing the command

$ dwdweather weather 02667 20190717T11 --resolution hourly --categories air_temperature

leads to

2020-01-07 16:33:48,803 [dwdweather.client ] INFO : Requesting https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent
2020-01-07 16:33:48,928 [dwdweather.client ] WARNING: Station "2667" has no data for category "air_temperature"

The issue also occurs when importing DwdWeather in Python and using the minimal example from the README.

I tracked the problem down to client.py. For some reason, the find_resource_file call in get_measurements goes wrong inside the try block, but no error is raised: https://github.com/panodata/dwdweather2/blob/37426b85f89babd7e5b6e7c3f5d0c69ad0ea12e4/dwdweather/client.py#L129

amotl commented 4 years ago

Dear @bakunin75,

thanks for writing in.

We have been running dwdweather2 on Python 3.7.4 and it has worked well so far; see https://github.com/panodata/dwdweather2/issues/7#issuecomment-549464844. Invoking the command you outlined above gives us:

$ dwdweather weather 02667 20190717T11 --resolution hourly --categories air_temperature
2020-01-07 23:00:43,306 [dwdweather.client   ] INFO   : Acquiring dataset for resolution "hourly" from "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly"
2020-01-07 23:00:43,309 [dwdweather.core     ] INFO   : Using cache database /Users/amo/.dwd-weather/dwdweather2.db
2020-01-07 23:00:43,310 [dwdweather.commands ] INFO   : Querying data for station "2667" and categories "['air_temperature']" at "2019-07-17 11:00:00"
2020-01-07 23:00:43,315 [dwdweather.core     ] INFO   : Downloading measurements for station 2667 and timeranges ['recent']
2020-01-07 23:00:43,315 [dwdweather.core     ] INFO   : Station information: null
2020-01-07 23:00:43,315 [dwdweather.core     ] INFO   : Downloading "air temperature" data (TU)
2020-01-07 23:00:43,315 [dwdweather.client   ] INFO   : Requesting https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent
2020-01-07 23:00:43,541 [dwdweather.client   ] INFO   : Fetching resource https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/stundenwerte_TU_02667_akt.zip
2020-01-07 23:00:43,590 [dwdweather.client   ] INFO   : Reading from Zip: produkt_tu_stunde_20180706_20200106_02667.txt
2020-01-07 23:00:43,595 [dwdweather.core     ] INFO   : Importing measurements for station "2667" and category "{'key': 'TU', 'name': 'air_temperature'}"
2020-01-07 23:00:43,595 [dwdweather.core     ] INFO   : Importing "air temperature" data from "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/stundenwerte_TU_02667_akt.zip/produkt_tu_stunde_20180706_20200106_02667.txt"
100%|██████████████████████████████████| 13202/13202 [00:04<00:00, 2701.92it/s]
{
    "airtemp_humidity": 68.0,
    "airtemp_quality_level": 3,
    "airtemp_temperature": 17.7,
    "cloudiness_quality_level": null,
    "cloudiness_source": null,
    "cloudiness_total_cover": null,
    "datetime": 2019071711,
    "precipitation_fallen": null,
    "precipitation_form": null,
    "precipitation_height": null,
    "precipitation_quality_level": null,
    "pressure_normalized": null,
    "pressure_quality_level": null,
    "pressure_station": null,
    "soiltemp_quality_level": null,
    "soiltemp_temperature_002": null,
    "soiltemp_temperature_005": null,
    "soiltemp_temperature_010": null,
    "soiltemp_temperature_020": null,
    "soiltemp_temperature_050": null,
    "soiltemp_temperature_100": null,
    "solar_atmosphere": null,
    "solar_duration": null,
    "solar_end_of_interval": null,
    "solar_global": null,
    "solar_quality_level": null,
    "solar_sky": null,
    "solar_zenith": null,
    "station_id": 2667,
    "sun_duration": null,
    "sun_quality_level": null,
    "visibility_quality_level": null,
    "visibility_source": null,
    "visibility_value": null,
    "wind_direction": null,
    "wind_quality_level": null,
    "wind_speed": null
}

You might want to add the --reset-cache option or drop the cache database manually in order to check whether the cache has anything to do with it.
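For the manual route, here is a small sketch that deletes the cache database file. It assumes the default location ~/.dwd-weather/dwdweather2.db, inferred from the "Using cache database" log line above; drop_cache_database is a hypothetical helper, not part of dwdweather2.

```python
from pathlib import Path
from typing import Optional

# Assumed default location, inferred from the log line
# "Using cache database /Users/amo/.dwd-weather/dwdweather2.db".
DEFAULT_CACHE_DB = Path.home() / ".dwd-weather" / "dwdweather2.db"


def drop_cache_database(path: Optional[Path] = None) -> bool:
    """Delete the cache database file (hypothetical helper).

    Returns True if a file was removed, False if there was nothing to delete.
    """
    target = Path(path) if path is not None else DEFAULT_CACHE_DB
    if target.is_file():
        target.unlink()
        return True
    return False
```

Calling drop_cache_database() without arguments targets the assumed default path; pass an explicit path if your cache lives elsewhere.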

With kind regards, Andreas.

amotl commented 4 years ago

> I tracked the problem to the client.py. For some reason the find_resource_file function in get_measurements gets stuck in the try block, but there is no Error raised.

Could this observation also indicate network connectivity problems?

bakunin75 commented 4 years ago

Thanks for the replies. I'm curious, which OS are you running on?

Upon further investigation, I tracked the problem down to the following call:

response = self.http.get(uri + u'/')

I created a minimal example (Windows):

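import os

from bs4 import BeautifulSoup
from requests_cache import CachedSession

APP_NAME = "dwdweather2"
APP_VERSION = "0.11.1"

# Cache location under %APPDATA% (Windows only; APPDATA is unset on Linux/macOS).
cache_name = os.path.join(os.getenv("APPDATA"), "dwdcache", "dwd_cache")
# Make sure the cache directory exists before SQLite tries to create the file.
os.makedirs(os.path.dirname(cache_name), exist_ok=True)

http = CachedSession(
    backend="sqlite",
    cache_name=cache_name,
    expire_after=300,
    user_agent=APP_NAME + "/" + APP_VERSION,
)

baseurl = "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent"
extension = "zip"

# Fetch the directory listing; verify=True (the default) checks the server certificate.
response = http.get(baseurl + "/", verify=True)
response.raise_for_status()
content = response.content

# Collect links to all .zip files in the listing (skip <a> tags without href).
soup = BeautifulSoup(content, "html.parser")
ret_list = [
    baseurl + "/" + node.get("href")
    for node in soup.find_all("a")
    if (node.get("href") or "").endswith(extension)
]
print(ret_list)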

This raises OpenSSL.SSL.Error: [('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')]. When setting verify=False (which you wouldn't want), the request goes through and prints the expected list of zip files.

amotl commented 4 years ago

> I'm curious, which OS are you running on?

I am running macOS 10.13.6.

> certificate verify failed

Strange. Maybe some CA certificates are not properly installed on your machine, or it really is about connectivity woes on your side. Would you be able to check using a different internet uplink if you get the chance?
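To see which CA certificates the interpreter actually uses, a quick diagnostic like the following may help. It is a sketch relying only on the standard ssl module, plus certifi if it is installed (requests verifies against certifi's bundle by default); the function name tls_diagnostics is hypothetical.

```python
import ssl
import sys

try:
    import certifi  # CA bundle that requests uses by default
except ImportError:  # certifi is optional for this diagnostic
    certifi = None


def tls_diagnostics() -> dict:
    """Collect information about the TLS setup of the running interpreter."""
    paths = ssl.get_default_verify_paths()
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        "openssl_cafile": paths.cafile,  # may be None
        "openssl_capath": paths.capath,  # may be None
        "certifi_bundle": certifi.where() if certifi is not None else None,
    }


for key, value in tls_diagnostics().items():
    print(f"{key:15} {value}")
```

One thing worth keeping in mind: on Windows, the standard library's ssl.create_default_context() also loads the Windows certificate store, whereas requests checks against certifi's bundle. A root certificate that the browser trusts (for example one installed by a corporate proxy) may therefore still be rejected by requests.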

> When setting verify=False (which you wouldn't want) the requests runs through.

If nothing else helps you or other Windows users, I might actually consider this, provided nobody objects to it.

bakunin75 commented 4 years ago

> Strange thing. Maybe some CA certificates are not properly installed on your machine or it is really about connectivity woes on your side. Will you be able to check using a different internet uplink if you get the chance to?

I will try that, but I probably won't get the chance until the weekend. It's possible that a company firewall, proxy, or something similar is causing this problem.

If not, this issue may be linked to https://github.com/pyca/pyopenssl/issues/823 (see the first reply), but I can't get that to work in my minimal example. I wonder whether anyone has ever run this module successfully on Windows.
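If a company proxy does intercept TLS, a safer alternative to verify=False is to point requests at a CA bundle that also contains the proxy's root certificate. The sketch below assumes you have exported that certificate to a PEM file; the file names and the helpers build_ca_bundle and make_session are hypothetical. CachedSession subclasses requests.Session, so setting the verify attribute should work the same way there.

```python
from pathlib import Path

import certifi
import requests


def build_ca_bundle(extra_pem: Path, output: Path) -> Path:
    """Write certifi's CA bundle followed by an extra PEM certificate to `output`."""
    bundle = Path(certifi.where()).read_bytes()
    extra = Path(extra_pem).read_bytes()
    # Keep the public CAs and append the (hypothetical) corporate root CA.
    Path(output).write_bytes(bundle.rstrip(b"\n") + b"\n" + extra)
    return Path(output)


def make_session(ca_bundle: Path) -> requests.Session:
    """Return a session that verifies server certificates against `ca_bundle`."""
    session = requests.Session()
    session.verify = str(ca_bundle)  # a CA bundle path instead of True/False
    return session


# Usage (hypothetical file names):
# bundle = build_ca_bundle(Path("corporate-ca.pem"), Path("ca-bundle.pem"))
# http = make_session(bundle)
# http.get("https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/")
```

Unlike verify=False, this still rejects certificates that chain to neither a public CA nor the added root. requests also honours the REQUESTS_CA_BUNDLE environment variable, which avoids touching code at all.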

bakunin75 commented 4 years ago

By the way, the dwdbulk package mentioned in https://github.com/panodata/dwdweather2/issues/22#issue-546569328 suffers from the same SSL issue on Windows (which it does not support anyway).

I've tested both modules on a Linux VM, and both work fine. So I probably won't investigate further on the Windows front and will just stick to Linux.

amotl commented 4 years ago

Dear @bakunin75,

thanks for your answer. While I was thinking about closing this issue, I now believe we should keep it open for a while: the program really should not silently swallow a certificate verification failure.
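To illustrate the point, here is a hypothetical contrast between a lookup that swallows every exception, which would turn a certificate failure into a plain "no data" result much like the warning in the original report, and one that logs and re-raises. The function names and the fetch_index callable (returning file names from a directory listing) are made up for this sketch; they are not dwdweather2's actual code.

```python
import logging
from typing import Callable, List, Optional

log = logging.getLogger(__name__)

FetchIndex = Callable[[str], List[str]]


def find_resource_file_swallowing(fetch_index: FetchIndex, url: str, station_id: str) -> Optional[str]:
    try:
        names = fetch_index(url)
        return next(n for n in names if station_id in n)
    except Exception:
        # Anti-pattern: any failure, including TLS errors, looks like "no data".
        return None


def find_resource_file_surfacing(fetch_index: FetchIndex, url: str, station_id: str) -> Optional[str]:
    try:
        names = fetch_index(url)
    except Exception:
        # Log with traceback, then re-raise so callers see the real cause.
        log.exception("Failed to fetch directory listing %s", url)
        raise
    # Only a genuinely missing file yields None.
    return next((n for n in names if station_id in n), None)
```

With the second variant, a certificate verification error propagates to the caller instead of ending up as "Station has no data".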

For everyone else running into the same thing: as requests_cache's CachedSession does not accept the verify argument in its constructor, you might want to set it at runtime within DwdCdcClient.setup_cache, like

self.http.verify = False

in order to work around that problem.
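Applied to a plain requests.Session (CachedSession is a subclass, so the attribute should behave the same), the workaround might look like this sketch. The helper name is hypothetical. It disables certificate checking entirely, so treat it as a stopgap only. Also note that with trust_env enabled, a REQUESTS_CA_BUNDLE or CURL_CA_BUNDLE environment variable may take precedence over session.verify.

```python
import requests
import urllib3


def disable_certificate_verification(session: requests.Session) -> requests.Session:
    """Turn off TLS certificate checks for all requests made through `session`.

    Insecure: only use this as a temporary workaround.
    """
    session.verify = False
    # Without this, urllib3 emits an InsecureRequestWarning for every request.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return session
```

In DwdCdcClient.setup_cache, this would correspond to applying the helper to self.http right after the CachedSession has been created.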

With kind regards, Andreas.

amotl commented 4 years ago

> So probably won't investigate further on the windows front and just stick to Linux.

All right, thanks!