panodata / dwdweather2

Python client to access weather data from Deutscher Wetterdienst (DWD), the federal meteorological service in Germany.
https://community.panodata.org/t/dwdweather2-a-python-client-to-access-weather-data-from-deutscher-wetterdienst-dwd/98
MIT License

DwdWeather: queried weather data isn't stored correctly in local db #18

Closed larsupb closed 4 years ago

larsupb commented 4 years ago

I run recurring queries against several stations, each time requesting the same data. For some stations, however, the data does not seem to be stored correctly: instead of being read from the local SQLite database, it is queried from the FTP server again.

dw = DwdWeather(resolution="hourly", categories='air_temperature')
dw.categories = [{'key': 'TU', 'name': 'air_temperature'}]

Stations I'm querying, e.g.:

4275 Rotenburg (Wümme)
3667 Nürnberg-Netzstall

amotl commented 4 years ago

Dear @larsupb,

thanks for writing in. Let me rephrase your observations; please correct me if I am wrong. You are saying that:

For some stations, air_temperature data is queried from the FTP server over and over again. It looks like it does not get stored into the sqlite database correctly.

Can you confirm that this is happening just for data from some of the stations and that it works for most of the others?

If so, we might want to have a look for potential issues with charset encoding within the codebase, as both of the stations you referenced have German umlauts within their names.

I would be more than grateful if you could investigate further along these lines: does it happen for all stations whose names contain special characters, and does it also happen for stations without umlauts?
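One way to approach this check is to split the stations in question by whether their names contain non-ASCII characters, and then compare cache behavior between the two groups. The following is only a minimal sketch: the helper name `split_by_ascii` is made up for illustration, the two named stations are the ones from the report, and the third entry is a hypothetical placeholder, not a real DWD station.

```python
def split_by_ascii(stations):
    """Partition {station_id: name} into (ascii_only, non_ascii) dicts."""
    ascii_only, non_ascii = {}, {}
    for station_id, name in stations.items():
        # str.isascii() is available from Python 3.7 onwards.
        target = ascii_only if name.isascii() else non_ascii
        target[station_id] = name
    return ascii_only, non_ascii


stations = {
    4275: "Rotenburg (Wümme)",
    3667: "Nürnberg-Netzstall",
    99999: "Example Station",  # hypothetical placeholder for contrast
}

ascii_only, non_ascii = split_by_ascii(stations)
print("ASCII-only names:", ascii_only)
print("Non-ASCII names: ", non_ascii)
```

If re-fetching only ever shows up for the second group, that would support the encoding hypothesis; if it shows up in both, the cause is more likely elsewhere.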

Thanks already and with kind regards, Andreas.

amotl commented 4 years ago

Also, could it be that you are running into cache expiry? Do you also observe this on subsequent invocations that follow each other within a short timeframe?

If you have longer pauses between invocations, data might get re-fetched as old data might have become stale.
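To illustrate what "stale" means here, the sketch below shows a generic age-based cache on top of SQLite. It is not dwdweather2's actual caching logic: the table layout, the `MAX_AGE_SECONDS` window, and the `get` helper are all assumptions chosen for illustration.

```python
import sqlite3
import time

MAX_AGE_SECONDS = 3600  # hypothetical freshness window, not taken from dwdweather2


def make_cache():
    """Create an in-memory cache table with a fetch timestamp per entry."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT, fetched_at REAL)"
    )
    return db


def get(db, key, fetch, now=None):
    """Return (value, source), where source is 'cache' or 'remote'."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT value, fetched_at FROM cache WHERE key = ?", (key,)
    ).fetchone()
    # Serve from the cache only while the entry is young enough.
    if row is not None and now - row[1] <= MAX_AGE_SECONDS:
        return row[0], "cache"
    # Missing or stale: fetch again and overwrite the stored entry.
    value = fetch(key)
    db.execute(
        "INSERT OR REPLACE INTO cache (key, value, fetched_at) VALUES (?, ?, ?)",
        (key, value, now),
    )
    return value, "remote"


db = make_cache()
fetch = lambda key: "payload for " + key  # stand-in for a real download

first = get(db, "3284/air_temperature", fetch, now=0)
second = get(db, "3284/air_temperature", fetch, now=60)
third = get(db, "3284/air_temperature", fetch, now=7200)
print(first[1], second[1], third[1])  # remote cache remote
```

With a scheme like this, two invocations one minute apart hit the cache, while an invocation two hours later triggers a new download even though nothing is wrong with the stored data.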

For the sake of completeness, can you also provide the full code, i.e. the datetime you are requesting data for?

larsupb commented 4 years ago

import pandas as pd
from dwdweather import DwdWeather
from datetime import timedelta

def get_daily_temperatures(station_id, date_from, date_to):
    # One timestamp per day in the range; each query asks for the 12:00 reading.
    dates = pd.Series(data=pd.date_range(date_from, date_to, freq='D'))
    return dates.apply(lambda q: dw.query(station_id, timestamp=q + timedelta(hours=12)))

# Client for hourly air temperature data, used by the function above.
dw = DwdWeather(resolution="hourly", categories='air_temperature')
dw.categories = [{'key': 'TU', 'name': 'air_temperature'}]

s = get_daily_temperatures(station_id=3284, date_from='2011-01-03 00:00:00', date_to='2019-11-27')
print(s)

This is a small snippet I've prepared to demonstrate the bug. I'm using dwdweather 0.10.0 because I'm having some compatibility issues with other modules I'm using.
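A simple way to tell whether a repeated query is served from the local database or fetched again is to time the same call twice. The sketch below is only an illustration: `time_calls` is a made-up helper, and `fake_query` is a stand-in that is slow on first use and fast afterwards, so the example runs without network access or dwdweather installed.

```python
import time


def time_calls(fn, repeats=2):
    """Call fn() `repeats` times and return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in for a real query: slow on the first call, fast on later ones.
_seen = set()


def fake_query(key="3284@2011-01-03T12:00"):
    if key not in _seen:
        time.sleep(0.05)  # simulates the download
        _seen.add(key)


durations = time_calls(fake_query)
print(["%.3f s" % d for d in durations])
```

In a real session, the stand-in would be replaced by a callable wrapping the query from the snippet above, for example `lambda: dw.query(3284, timestamp=some_timestamp)`. If the second call takes about as long as the first, the data is most likely being fetched again.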

amotl commented 4 years ago

Thanks.

I'm using dwdweather 0.10.0 cause I'm having some compatibility issues with other modules im using.

You should really be using the current version. Let's relax the dependencies of dwdweather2 then. May I humbly ask which specific modules were involved in the incompatibility?

Actually, I can't see any differences in the dependencies between 0.10.0 and master here, see https://github.com/panodata/dwdweather2/compare/0.10.0...master and the blame view of setup.py.

amotl commented 4 years ago

Dear @larsupb,

we can confirm this invocation works for us:

dwdweather weather 3284 2011-01-03T00:00 --resolution=hourly --categories air_temperature

It yields

2020-01-07 23:16:04,318 [dwdweather.client   ] INFO   : Acquiring dataset for resolution "hourly" from "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly"
2020-01-07 23:16:04,323 [dwdweather.core     ] INFO   : Using cache database /Users/amo/.dwd-weather/dwdweather2.db
2020-01-07 23:16:04,324 [dwdweather.commands ] INFO   : Querying data for station "3284" and categories "['air_temperature']" at "2011-01-03 00:00:00"
{
    "airtemp_humidity": 87.0,
    "airtemp_quality_level": 3,
    "airtemp_temperature": -0.4,
    "cloudiness_quality_level": null,
    "cloudiness_source": null,
    "cloudiness_total_cover": null,
    "datetime": 2011010300,
    ...
}
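The `datetime` field in this output is a plain integer. Judging from the value returned for the requested timestamp, it appears to encode year, month, day and hour as `YYYYMMDDHH`. That layout is an inference from this one sample, not something confirmed by the library documentation. Under that assumption, a minimal sketch for turning it into a Python `datetime` (the helper name is made up for illustration):

```python
from datetime import datetime


def parse_dwd_hour(value):
    """Parse an integer assumed to be in YYYYMMDDHH form into a datetime."""
    return datetime.strptime(str(value), "%Y%m%d%H")


parsed = parse_dwd_hour(2011010300)
print(parsed.isoformat())  # 2011-01-03T00:00:00
```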

Invoking

dwdweather weather 3284 2019-11-27T00:00 --resolution=hourly --categories air_temperature

works likewise.

Both commands seem to use the cache properly on 0.11.1. I really recommend upgrading to this version. May I ask you again which incompatibility issues you have been observing?

With kind regards, Andreas.

amotl commented 4 years ago

Dear @larsupb,

Wetterdienst will be the designated successor library for dwdweather2. Since you already use pandas in your snippet, the fact that Wetterdienst is fully based on pandas might well spark your interest.

With kind regards, Andreas.

cc @gutzbenj


wetterdienst readings --resolution=10_minutes --period=now --parameter=air_temperature --date=2020-07-05T18:00 --persist --station=4275,3667