panodata / dwdweather2

Python client to access weather data from Deutscher Wetterdienst (DWD), the federal meteorological service in Germany.
https://community.panodata.org/t/dwdweather2-a-python-client-to-access-weather-data-from-deutscher-wetterdienst-dwd/98
MIT License

Implementation of solar retrieval seems to be incorrect #17

Closed: Nikolai10 closed this issue 5 years ago

Nikolai10 commented 5 years ago

Hello @amotl,

First of all, thank you for your great work. After exploring dwdweather2, I noticed that the solar_* values are always None.

However, for station ID 5856 (hourly resolution), for example, there clearly is data available (see https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/solar/).

For example:

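from datetime import datetime
from dwdweather import DwdWeather

dw = DwdWeather(resolution="hourly")
query_hour = datetime(2019, 7, 17, 11)
result = dw.query(station_id=5856, timestamp=query_hour)
for item in result.items():
    print(item)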

should return solar values other than None, as the corresponding line in the raw data file shows:
5856;2019071711:12; 1; -999; 57.0; 321.0; 60; 28.00;2019071712:00;eor
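For illustration, a minimal sketch of how that raw line could be split into fields. It assumes the column order of DWD's hourly solar product (STATIONS_ID, MESS_DATUM, QN_592, ATMO_LBERG, FD_LBERG, FG_LBERG, SD_LBERG, ZENIT, MESS_DATUM_WOZ) and DWD's convention of -999 for a missing value; the helper parse_solar_line is hypothetical and not part of dwdweather2.

```python
# Assumed column order of DWD's hourly solar product files; check the header
# line of the actual produkt_st_stunde_*.txt file before relying on it.
COLUMNS = [
    "STATIONS_ID", "MESS_DATUM", "QN_592", "ATMO_LBERG", "FD_LBERG",
    "FG_LBERG", "SD_LBERG", "ZENIT", "MESS_DATUM_WOZ",
]

MISSING = "-999"  # DWD marker for a missing value


def parse_solar_line(line):
    """Split one semicolon-separated data line into a dict (hypothetical helper)."""
    # Strip padding around each field and drop the trailing "eor" (end of record).
    fields = [f.strip() for f in line.strip().split(";")]
    if fields and fields[-1] == "eor":
        fields = fields[:-1]
    record = dict(zip(COLUMNS, fields))
    # Map the missing-value marker to None.
    return {k: (None if v == MISSING else v) for k, v in record.items()}


line = "5856;2019071711:12; 1; -999; 57.0; 321.0; 60; 28.00;2019071712:00;eor"
record = parse_solar_line(line)
print(record["MESS_DATUM"], record["FG_LBERG"], record["ATMO_LBERG"])
# prints: 2019071711:12 321.0 None
```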

Am I missing something here?

Kind regards

amotl commented 5 years ago

Dear @Nikolai10,

thanks for writing in. We are able to confirm your observations.

Invoking

dwdweather weather 5856 20190717T11 --resolution hourly

yields these log entries regarding the acquisition of solar data:

2019-11-04 15:23:09,194 [dwdweather.core     ] INFO   : Downloading "solar" data (ST)
2019-11-04 15:23:09,194 [dwdweather.client   ] INFO   : Requesting https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/solar
2019-11-04 15:23:09,223 [dwdweather.client   ] INFO   : Fetching resource https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/solar/stundenwerte_ST_05856_row.zip
2019-11-04 15:23:09,582 [dwdweather.client   ] INFO   : Reading from Zip: produkt_st_stunde_19970102_20190930_05856.txt
2019-11-04 15:23:09,625 [dwdweather.core     ] INFO   : Importing measurements for station "5856" and category "{'key': 'ST', 'name': 'solar'}"
2019-11-04 15:23:09,625 [dwdweather.core     ] INFO   : Importing "solar" data from "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/solar/stundenwerte_ST_05856_row.zip/produkt_st_stunde_19970102_20190930_05856.txt"
 80%|█████████████████████████▌      | 159191/199323 [00:28<00:07, 5567.59it/s]

and finally delivers NULL values for solar as well:

    "solar_atmosphere": null,
    "solar_duration": null,
    "solar_end_of_interval": null,
    "solar_global": null,
    "solar_quality_level": null,
    "solar_sky": null,
    "solar_zenith": null,

We will have to look into that.

With kind regards, Andreas.

amotl commented 5 years ago

Investigation

After a short investigation, we see that a typical data line for air_temperature looks like this:

5856;2018072003;    3;  16.2;  69.0

Likewise, this is a line for wind:

5856;2018081003;   10;   3.0; 240

However, a line for solar looks like this:

5856;1997021322:20;    1;   -999;    0.0;    0.0;   0;   139.96;1997021323:00

or

5856;1997061306:06;    1;   -999;   54.0;   63.0;  12;    67.99;1997061307:00

So, we recognize suffixes on the timestamp like :20 or :06 here, probably designating the minute at which the measurement was taken.

Conclusion

This most probably confuses the import routine import_measures_textfile [1], which comes from the "old" dwdweather codebase and is convoluted enough that I haven't dared to refactor it yet.
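To illustrate the suspected failure mode, here is a minimal sketch that uses Python's datetime.strptime as a stand-in for a strict parser expecting the plain YYYYMMDDHH stamps seen in the air_temperature and wind lines. It is not the actual logic of import_measures_textfile; the helper try_parse_hourly is hypothetical.

```python
from datetime import datetime

# Plain hourly stamp format (YYYYMMDDHH), as in the air_temperature and wind lines.
HOURLY_FORMAT = "%Y%m%d%H"


def try_parse_hourly(stamp):
    """Return a datetime, or None when the stamp is not plain YYYYMMDDHH."""
    try:
        return datetime.strptime(stamp, HOURLY_FORMAT)
    except ValueError:
        # strptime rejects the trailing ":MM" part ("unconverted data remains").
        return None


for stamp in ["2018072003", "1997021322:20", "1997061306:06"]:
    print(stamp, "->", try_parse_hourly(stamp))
```

The first stamp parses to 2018-07-20 03:00, while both solar stamps come back as None because of their minute suffix.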

Proposal

I will try to mitigate that by stripping away these suffixes altogether, unless you or @wetterfrosch object. After all, we are working on the "hourly" level here, right? So, I believe this should do no harm.
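A minimal sketch of the proposed mitigation, assuming the stamps are always ten digits (YYYYMMDDHH) optionally followed by ":MM". The pattern and the helper to_hourly are hypothetical and not the code that went into dwdweather2.

```python
import re
from datetime import datetime

# Ten digits (YYYYMMDDHH), optionally followed by a ":MM" minute suffix.
STAMP_PATTERN = re.compile(r"(\d{10})(?::\d{2})?")


def to_hourly(stamp):
    """Drop an optional ":MM" suffix and parse the remaining YYYYMMDDHH part."""
    match = STAMP_PATTERN.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unexpected timestamp: {stamp!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H")


for stamp in ["2018072003", "1997021322:20", "2019071711:12"]:
    print(stamp, "->", to_hourly(stamp).isoformat())
```

Discarding the minute keys each solar measurement by the hour it falls into, so in this sketch a query for 2019-07-17 11:00 would match the line stamped 2019071711:12. Truncating rather than rounding is itself an assumption here.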

With kind regards, Andreas.

[1] https://github.com/panodata/dwdweather2/blob/0.10.0/dwdweather/core.py#L402-L464

amotl commented 5 years ago

Dear @Nikolai10,

we just released dwdweather2 0.11.1, which might mitigate that problem.

Invoking

dwdweather weather 5856 20190717T11 --resolution hourly --categories solar

now yields data for the "solar" category at the given time:

    "solar_atmosphere": 60.0,
    "solar_duration": null,
    "solar_end_of_interval": 201907171200,
    "solar_global": 321.0,
    "solar_quality_level": 1,
    "solar_sky": 57.0,
    "solar_zenith": 28.0,
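The solar_end_of_interval value 201907171200 looks like the raw end-of-interval stamp 2019071712:00 with the colon removed, i.e. YYYYMMDDHHMM. Assuming that format, a small sketch for turning it back into a datetime; decode_end_of_interval is a hypothetical helper, not a dwdweather2 function.

```python
from datetime import datetime


def decode_end_of_interval(value):
    """Parse an integer such as 201907171200 (assumed YYYYMMDDHHMM) into a datetime."""
    return datetime.strptime(str(value), "%Y%m%d%H%M")


print(decode_end_of_interval(201907171200).isoformat())
# prints: 2019-07-17T12:00:00
```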

Please add the --reset-cache parameter in order to drop the SQL database beforehand, and be aware that this will completely delete all data in it.

With kind regards, Andreas.

Nikolai10 commented 5 years ago

@amotl: This has indeed resolved the issue. Thanks for your help!