open-meteo / sdk

Open-Meteo schema files
MIT License

The old order of the hourly variables is retained when the same request is submitted with a different variable order within the cache time #110

Closed: fwitte closed this issue 2 weeks ago

fwitte commented 2 weeks ago

Hi,

if I pass a list of variables to the API and then repeat the request within the cache time with the same variables in a different order, the response keeps the order of the first request. E.g. if I pass

['wind_speed_10m', 'direct_normal_irradiance', 'temperature_2m'], process the response into a pandas.DataFrame, and then call the function again with the list reordered, the columns end up with the wrong data, because the values still come back in the original order.

import pandas as pd
import openmeteo_requests

import requests_cache
from retry_requests import retry

# expire_after = -1: cached responses never expire
cache_session = requests_cache.CachedSession('.cache', expire_after = -1)
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
openmeteo = openmeteo_requests.Client(session = retry_session)

def get_measurements(measurement_names):
    params = {
        "latitude": 53.15,
        "longitude": 8.2244,
        "start_date": "2023-12-30",
        "end_date": "2023-12-31",
        "hourly": measurement_names,
        "wind_speed_unit": "ms"
    }
    responses = openmeteo.weather_api("https://archive-api.open-meteo.com/v1/archive", params=params)
    response = responses[0]
    hourly = response.Hourly()

    hourly_data = {"date": pd.date_range(
        start = pd.to_datetime(hourly.Time(), unit = "s", utc = True),
        end = pd.to_datetime(hourly.TimeEnd(), unit = "s", utc = True),
        freq = pd.Timedelta(seconds = hourly.Interval()),
        inclusive = "left"
    )}
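    # Assumes the response lists the variables in the requested order;
    # a response cached for a differently ordered request breaks this.
    for i, name in enumerate(measurement_names):
        hourly_data[name] = hourly.Variables(i).ValuesAsNumpy()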

    return pd.DataFrame(hourly_data)

get_measurements(['wind_speed_10m', 'direct_normal_irradiance', 'temperature_2m'])
get_measurements(['direct_normal_irradiance', 'temperature_2m', 'wind_speed_10m'])

Is there a way to adjust the order? Thank you very much!

Best

Francesco

patrick-zippenfenig commented 2 weeks ago

It looks like the requests_cache library deliberately normalises the order of headers to improve cache hit ratios. Maybe this also applies to URL parameters, but I am not sure about this.
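The suspected effect can be sketched with the standard library alone. The function below is hypothetical and is not requests_cache's actual implementation: it assumes the cache key is built from the URL plus the query parameters sorted as (name, value) pairs, which is enough to make two requests that differ only in variable order share one cache entry.

```python
import hashlib
from urllib.parse import urlencode


def order_insensitive_key(url: str, params: dict) -> str:
    """Hypothetical cache key that sorts query parameters before hashing.

    Not requests_cache's real code; it only illustrates how sorting
    (name, value) pairs makes the key ignore the order of repeated values.
    """
    pairs = []
    for name, value in params.items():
        # A list value becomes repeated parameters, e.g. hourly=a&hourly=b
        values = value if isinstance(value, list) else [value]
        pairs.extend((name, str(v)) for v in values)
    query = urlencode(sorted(pairs))
    return hashlib.sha256(f"{url}?{query}".encode("utf-8")).hexdigest()


url = "https://archive-api.open-meteo.com/v1/archive"
first = {"latitude": 53.15, "hourly": ["wind_speed_10m", "direct_normal_irradiance", "temperature_2m"]}
second = {"latitude": 53.15, "hourly": ["direct_normal_irradiance", "temperature_2m", "wind_speed_10m"]}

# Same key for both orders: the second request would be answered from the
# cache entry of the first, with the variables in the first request's order.
print(order_insensitive_key(url, first) == order_insensitive_key(url, second))  # True
```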

I am afraid there is no easy solution for it. As a workaround, you can add a parameter such as "cache_buster": md5(measurement_names), so that each variable order gets its own cache entry.
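A minimal sketch of that workaround. `cache_buster` is just the extra parameter name used in this thread, and the helper below is hypothetical: it hashes the variable names in order and returns `hexdigest()`, a plain string, so the same order always yields the same value and a different order yields a different one.

```python
import hashlib


def cache_buster(names: list[str]) -> str:
    """Hypothetical helper: an order-sensitive, deterministic digest of `names`."""
    # The newline separator preserves the order and avoids ambiguous joins
    # such as ["ab", "c"] vs ["a", "bc"]; hexdigest() returns a str.
    return hashlib.md5("\n".join(names).encode("utf-8")).hexdigest()


a = cache_buster(["wind_speed_10m", "direct_normal_irradiance", "temperature_2m"])
b = cache_buster(["direct_normal_irradiance", "temperature_2m", "wind_speed_10m"])
print(a != b)  # True
print(a == cache_buster(["wind_speed_10m", "direct_normal_irradiance", "temperature_2m"]))  # True

# Inside get_measurements(), the value would go next to the other parameters:
# params["cache_buster"] = cache_buster(measurement_names)
```

Since the digest changes with the order, each ordering should get its own cache entry, while repeated requests with the same order are still served from the cache.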

fwitte commented 2 weeks ago

Works perfectly, thank you very much:

from hashlib import md5
import pickle

params = {
    # ....
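    # hexdigest() gives a stable string; the bare hash object would be
    # stringified with its memory address, changing the URL on every call
    "cache_buster": md5(pickle.dumps(measurement_names)).hexdigest()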
}

Maybe this is worth adding at the end of the variables section in the README? I can open a PR, if you'd like :)

Have a nice weekend

patrick-zippenfenig commented 2 weeks ago

It is a bit of an edge case. I would not add it to the general documentation.