meteostat / meteostat-python

Access and analyze historical weather and climate data with Python.
https://dev.meteostat.net/python/
MIT License

`Hourly` attempts to fetch data of next year #132

Closed by bram-tv 1 year ago

bram-tv commented 1 year ago

When using Hourly to fetch data for today, it warns that it cannot load data for the year 2024.

Example

(For a random weather station)

from datetime import datetime
from meteostat import Hourly

start = datetime(2023, 9, 20, 0, 0, 0)
end = datetime(2023, 9, 20, 1, 0, 0)

Hourly('01001', start=start, end=end).fetch()

Running the code:

$ rm -rf ~/.meteostat/cache/hourly/
$ python3 fetch.py
Warning: Cannot load hourly/2024/01001.csv.gz from https://bulk.meteostat.net/v2/

It's attempting to load data for 2024, which obviously isn't available yet.

Root cause

Relevant code: https://github.com/meteostat/meteostat-python/blob/051cd235eff2fd9f2c85e3a887e2e27b32b2144d/meteostat/interface/hourly.py#L131

It's using range(end.year - start.year + 2), which is what causes the issue.

Looking into why the + 2 was added: it was changed from + 1 to + 2 in commit ceb9277faf6aab39a584cc99d37a7ca9cb661a50 for issue #106.

The commit message and the issue don't immediately reveal why, but digging a bit deeper shows that the + 2 is needed when a leap year is involved.

Reproducing it with dates from #106 and the original code:

>>> from datetime import datetime, timedelta
>>> start = datetime(2018, 1, 1)
>>> end = datetime(2021, 6, 6)
>>> [(start + timedelta(days=365 * i)).year for i in range(end.year - start.year + 1)]
[2018, 2019, 2020, 2020]

The list contains 2020 twice, and the last item is 2020 where it should be 2021.
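
To see where the duplicate comes from, it helps to print the full dates instead of just the years. This is only a sketch of the same stepping logic as in the snippet above, not the library code itself: 2020 is a leap year with 366 days, so the fourth 365-day step lands on the last day of 2020 instead of the first day of 2021.

```python
from datetime import datetime, timedelta

start = datetime(2018, 1, 1)
end = datetime(2021, 6, 6)

# Same 365-day stepping as the original code, but keeping the full date
steps = [
    (start + timedelta(days=365 * i)).date().isoformat()
    for i in range(end.year - start.year + 1)
]

print(steps)
# ['2018-01-01', '2019-01-01', '2020-01-01', '2020-12-31']
```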

With the changed code:

>>> from datetime import datetime, timedelta
>>> start = datetime(2018, 1, 1)
>>> end = datetime(2021, 6, 6)
>>> [(start + timedelta(days=365 * i)).year for i in range(end.year - start.year + 2)]
[2018, 2019, 2020, 2020, 2021]

The last item is now 2021, but 2020 is still duplicated.

Running the changed code for today:

>>> from datetime import datetime, timedelta
>>> start = datetime(2023, 9, 20)
>>> end = datetime(2023, 9, 20)
>>> [(start + timedelta(days=365 * i)).year for i in range(end.year - start.year + 2)]
[2023, 2024]

The last item is 2024, which isn't what was asked for.

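TLDR

Stepping through years with timedelta(days=365 * i) drifts whenever a leap year is involved, and the + 2 workaround makes Hourly request the year after end.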

Possible fix

Use start.year + i (and + 1 instead of + 2), i.e.:

>>> from datetime import datetime
>>> start = datetime(2018, 1, 1)
>>> end = datetime(2021, 6, 6)
>>> [start.year + i for i in range(end.year - start.year + 1)]
[2018, 2019, 2020, 2021]
>>> start = datetime(2023, 9, 20)
>>> end = datetime(2023, 9, 20)
>>> [start.year + i for i in range(end.year - start.year + 1)]
[2023]
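
Wrapped in a small helper, the idea could look roughly like the sketch below. The name years_in_range is made up for illustration and is not part of meteostat; the point is just that plain integer arithmetic on the years avoids the leap-year drift entirely.

```python
from datetime import datetime


def years_in_range(start: datetime, end: datetime) -> list:
    """Return every calendar year touched by [start, end], inclusive.

    Hypothetical helper for illustration only, not meteostat's API.
    """
    # Equivalent to [start.year + i for i in range(end.year - start.year + 1)]
    return list(range(start.year, end.year + 1))


# The range from #106: no duplicate 2020, and 2021 is included
print(years_in_range(datetime(2018, 1, 1), datetime(2021, 6, 6)))

# The "today" case from this issue: only 2023 is requested
print(years_in_range(datetime(2023, 9, 20), datetime(2023, 9, 20)))
```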

An alternative fix could be something like start.replace(year=start.year + 1).year, but that will fail if start is 29 February, because replace raises a ValueError when the resulting date does not exist.
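
A quick way to check the 29 February case, as a minimal sketch using only the standard library (the start date is just an example value):

```python
from datetime import datetime

start = datetime(2020, 2, 29)

try:
    # 2021 has no 29 February, so this date cannot be constructed
    start.replace(year=start.year + 1)
    replace_failed = False
except ValueError as exc:
    replace_failed = True
    print(f"replace failed: {exc}")

# Plain integer arithmetic on the year has no such problem
next_year = start.year + 1
print(next_year)
```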

clampr commented 1 year ago

Thank you! Your fix was shipped in version 1.6.6.