meteostat / meteostat-python

Access and analyze historical weather and climate data with Python.
https://dev.meteostat.net/python/
MIT License
437 stars, 60 forks

NaN discrepancy between older versions (<=1.5.11) and newer versions (>=1.6.0) #145

Closed: dcervenkov closed this issue 1 year ago

dcervenkov commented 1 year ago

I get different datasets from the same request in older and newer versions of meteostat. I narrowed it down to the jump from 1.5.11 to 1.6.0.

To make the comparison as apples-to-apples as possible, I'm using pandas==2.0.3, which works with both meteostat==1.5.11 and meteostat==1.6.0.

Steps to reproduce

meteostat 1.5.11

python3 -m venv venv_meteostat15
./venv_meteostat15/bin/pip install pandas==2.0.3 meteostat==1.5.11
./venv_meteostat15/bin/python -c "from meteostat import Hourly; import datetime; print(Hourly(loc='11520', start=datetime.datetime(2021, 1, 1, 0, 0, 0), end=datetime.datetime(2022, 11, 3, 23, 0, 0), timezone='Europe/Prague').normalize().fetch().info())"

Result

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 16128 entries, 2021-01-01 00:00:00+01:00 to 2022-11-03 23:00:00+01:00
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   temp    16128 non-null  float64
 1   dwpt    16128 non-null  float64
 2   rhum    16128 non-null  float64
 3   prcp    12628 non-null  float64
 4   snow    0 non-null      float64
 5   wdir    16128 non-null  float64
 6   wspd    16128 non-null  float64
 7   wpgt    15865 non-null  float64
 8   pres    16128 non-null  float64
 9   tsun    0 non-null      float64
 10  coco    15888 non-null  float64
dtypes: float64(11)
memory usage: 1.5 MB
None

meteostat 1.6.0

python3 -m venv venv_meteostat16
./venv_meteostat16/bin/pip install pandas==2.0.3 meteostat==1.6.0
./venv_meteostat16/bin/python -c "from meteostat import Hourly; import datetime; print(Hourly(loc='11520', start=datetime.datetime(2021, 1, 1, 0, 0, 0), end=datetime.datetime(2022, 11, 3, 23, 0, 0), timezone='Europe/Prague').normalize().fetch().info())"

Result

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 16128 entries, 2021-01-01 00:00:00+01:00 to 2022-11-03 23:00:00+01:00
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   temp    16128 non-null  float64
 1   dwpt    16128 non-null  float64
 2   rhum    16128 non-null  float64
 3   prcp    3867 non-null   float64
 4   snow    0 non-null      float64
 5   wdir    16128 non-null  float64
 6   wspd    16128 non-null  float64
 7   wpgt    15865 non-null  float64
 8   pres    16128 non-null  float64
 9   tsun    0 non-null      float64
 10  coco    15888 non-null  float64
dtypes: float64(11)
memory usage: 1.5 MB
None

Notice the prcp column has 12628 non-null values in 1.5.11 but only 3867 non-null values in 1.6.0!
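
For anyone repeating this comparison, the two `info()` listings can also be diffed mechanically instead of by eye. The following is a minimal sketch that uses only the standard library: the per-column counts are copied from the two outputs above, and the helper names `non_null_counts` and `changed_columns` are hypothetical, not part of meteostat or pandas.

```python
import re

# Column rows copied from the DataFrame.info() output of each version above.
INFO_1_5_11 = """\
 0   temp    16128 non-null  float64
 1   dwpt    16128 non-null  float64
 2   rhum    16128 non-null  float64
 3   prcp    12628 non-null  float64
 4   snow    0 non-null      float64
 5   wdir    16128 non-null  float64
 6   wspd    16128 non-null  float64
 7   wpgt    15865 non-null  float64
 8   pres    16128 non-null  float64
 9   tsun    0 non-null      float64
 10  coco    15888 non-null  float64
"""

INFO_1_6_0 = INFO_1_5_11.replace("prcp    12628 non-null ", "prcp    3867 non-null  ")

# One row of info() output: index, column name, non-null count, dtype.
ROW = re.compile(r"^\s*\d+\s+(\w+)\s+(\d+) non-null\s+\S+\s*$")


def non_null_counts(info_text):
    """Map each column name to its non-null count (hypothetical helper)."""
    counts = {}
    for line in info_text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def changed_columns(old, new):
    """Return {column: (old_count, new_count)} for columns whose counts differ."""
    return {
        col: (old[col], new[col])
        for col in old
        if col in new and old[col] != new[col]
    }


if __name__ == "__main__":
    old = non_null_counts(INFO_1_5_11)
    new = non_null_counts(INFO_1_6_0)
    for col, (before, after) in changed_columns(old, new).items():
        print(f"{col}: {before} -> {after} ({before - after} fewer non-null values)")
```

With the counts from this report, `prcp` is the only column that differs between the two versions.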

clampr commented 1 year ago

Thank you for reaching out @dcervenkov,

I looked into this and can confirm that you should use >= 1.6.0 to get the correct data. Version 1.5.11 uses an outdated endpoint, which is still available so that we don't break existing installations. However, the additional data you're receiving from this endpoint has since been removed due to a bug in DWD MOSMIX data.
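
One way to act on this advice is to check the installed version before fetching data. The sketch below is only an illustration: it assumes plain dotted version strings such as `1.5.11`, and the names `MIN_VERSION`, `parse_version`, and `meets_recommended_minimum` are hypothetical helpers, not part of meteostat. `importlib.metadata.version` is the standard-library call for reading an installed package's version.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version recommended in this thread for correct data.
MIN_VERSION = (1, 6, 0)


def parse_version(text):
    """Turn '1.5.11' into (1, 5, 11). Only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def meets_recommended_minimum(installed):
    """Compare as integer tuples, so that 1.5.11 sorts after 1.5.7 and before 1.6.0."""
    return parse_version(installed) >= MIN_VERSION


if __name__ == "__main__":
    try:
        installed = version("meteostat")
    except PackageNotFoundError:
        print("meteostat is not installed in this environment")
    else:
        if meets_recommended_minimum(installed):
            print(f"meteostat {installed} is at or above 1.6.0")
        else:
            print(f"meteostat {installed} is older than 1.6.0; consider upgrading")
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank `1.5.11` below `1.5.7`.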

dcervenkov commented 1 year ago

Thanks for looking into this and explaining the situation, @clampr!