Open jacobbieker opened 8 months ago
Hi @jacobbieker, can I take this one?
Yep! That would be great
Hey @jacobbieker, I've added code to fetch hourly data from different NWPs via the OpenMeteo API. Any quick thoughts on how we can compare these NWPs effectively? Also, could you check out the code?
import openmeteo_requests  # OpenMeteo API client
import requests_cache
import pandas as pd
from retry_requests import retry


class WeatherDataFetcher:
    BASE_URL = "https://api.open-meteo.com/v1/"  # Base URL for the OpenMeteo API

    def __init__(self):
        # Set up the Open-Meteo API client with caching and retry on error
        cache_session = requests_cache.CachedSession('.cache', expire_after=3600)
        retry_session = retry(cache_session, retries=5, backoff_factor=0.2)
        self.openmeteo = openmeteo_requests.Client(session=retry_session)

    def fetch_weather_data(self, NWP, params):
        # Fetch weather data from the OpenMeteo API for the specified model (NWP) and parameters
        url = f"{self.BASE_URL}{NWP}"  # Construct the API URL from the base URL and model name
        try:
            responses = self.openmeteo.weather_api(url, params=params)  # Get weather data
            return responses[0]  # Return the first response (assuming a single location)
        except openmeteo_requests.OpenMeteoRequestsError as e:
            # Handle OpenMeteoRequestsError exceptions
            if 'No data is available for this location' in str(e):
                print(f"Error: No data available for the location for model '{NWP}'.")
            else:
                print(f"Error: {e}")
            return None

    def process_hourly_data(self, response):
        # Process hourly data from the OpenMeteo API response
        hourly = response.Hourly()
        # Extract variables in the same order they were requested
        hourly_variables = {
            "temperature_2m": hourly.Variables(0).ValuesAsNumpy(),
            "relative_humidity_2m": hourly.Variables(1).ValuesAsNumpy(),
            "precipitation": hourly.Variables(2).ValuesAsNumpy(),
            "cloud_cover": hourly.Variables(3).ValuesAsNumpy(),
        }
        # Build the hourly time index
        time_range = pd.date_range(
            start=pd.to_datetime(hourly.Time(), unit="s", utc=True),
            end=pd.to_datetime(hourly.TimeEnd(), unit="s", utc=True),
            freq=pd.Timedelta(seconds=hourly.Interval()),
            inclusive="left",
        )
        # Assemble a DataFrame with one column per variable
        hourly_data = {"date": time_range}
        for variable_name, variable_values in hourly_variables.items():
            hourly_data[variable_name] = variable_values
        return pd.DataFrame(data=hourly_data)

    def print_location_info(self, response):
        # Print location information from the OpenMeteo API response
        print(f"Coordinates {response.Latitude()}°N {response.Longitude()}°E")
        print(f"Elevation {response.Elevation()} m asl")
        print(f"Timezone {response.Timezone()} {response.TimezoneAbbreviation()}")
        print(f"Timezone difference to GMT+0 {response.UtcOffsetSeconds()} s")


def main():
    # Demonstrate usage of the WeatherDataFetcher class
    fetcher = WeatherDataFetcher()
    NWP = "gfs"  # Choose the NWP model
    params = {
        "latitude": 40.77,  # Latitude of the location
        "longitude": -73.91,  # Longitude of the location
        "hourly": ["temperature_2m", "relative_humidity_2m", "precipitation", "cloud_cover"],
        "start_date": "2023-12-21",  # Start date for the data
        "end_date": "2024-03-15",  # End date for the data
    }
    # Fetch weather data for the specified model and parameters
    response = fetcher.fetch_weather_data(NWP, params)
    if response is None:
        return  # Nothing to process if the request failed
    # Print location information
    fetcher.print_location_info(response)
    # Process and print hourly data
    gfs_dataframe = fetcher.process_hourly_data(response)
    print(gfs_dataframe)


if __name__ == "__main__":
    main()  # Call main() if the script is executed directly
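On comparing the NWPs: one simple starting point could be to fetch the same variables from two model endpoints, align them on their shared timestamps, and look at pairwise differences. A minimal sketch reusing the imports and WeatherDataFetcher above; the compare_nwps helper and the model names passed to it are only illustrative, and variable availability can differ between models:

def compare_nwps(fetcher, models, params, variable="temperature_2m"):
    # Fetch the same variable from each model and collect one Series per model
    frames = {}
    for model in models:
        response = fetcher.fetch_weather_data(model, params)
        if response is None:
            continue
        df = fetcher.process_hourly_data(response).set_index("date")
        frames[model] = df[variable].rename(model)
    if len(frames) < 2:
        return None
    # Inner-join on the shared timestamps so the models line up
    combined = pd.concat(frames.values(), axis=1, join="inner")
    # Mean absolute difference between the first model and each of the others
    reference = combined.iloc[:, 0]
    stats = {
        col: (combined[col] - reference).abs().mean()
        for col in combined.columns[1:]
    }
    return combined, stats

# Example (hypothetical model choices): compare_nwps(fetcher, ["gfs", "ecmwf"], params)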
Hi,
Overall it looks good, although it's a bit hard to review here. Could you open a pull request to add the code instead? I can give better feedback that way.
OpenMeteo has started a public dataset (see https://github.com/open-meteo/open-data) that archives forecasts from multiple different weather data providers. The archive only goes back to December 2023, so it probably isn't very helpful for training yet, but it would be worth adding support as the archive grows, and we might want to finetune models or try different initializations for ensembles, like in #85.
Context
Being able to use multiple different NWPs for comparison, finetuning, or initialization could help a lot in seeing how these models stack up against other, more physics-based simulations. The archive also includes models with much higher resolution than ERA5 or, for limited areas, even HRES/ENS. This also ties into supporting models with adaptive meshes or a nested high-resolution area inside the global model (#78, #3).
Possible Implementation
Similar to #86, except the data is not in Zarr but in a format more suited to site-level forecasts. We would want to get global or local grids of data from it, so some reshaping of the data would be required. A rough sketch of that reshaping step is below.
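As a rough sketch of the reshaping, assuming the data has already been pulled down as per-point hourly DataFrames (for example from the fetcher in the comment above) over a small regular lat/lon grid; the frames_to_grid helper and the use of xarray here are assumptions about what a design could look like, not part of any existing code:

import pandas as pd
import xarray as xr

def frames_to_grid(frames):
    # frames: dict mapping (lat, lon) -> hourly DataFrame with a "date" column
    records = []
    for (lat, lon), df in frames.items():
        records.append(df.assign(latitude=lat, longitude=lon))
    combined = pd.concat(records)
    # Index by (time, latitude, longitude) and let xarray build the grid,
    # giving a Dataset with dims date/latitude/longitude and one data
    # variable per weather field
    combined = combined.set_index(["date", "latitude", "longitude"])
    return xr.Dataset.from_dataframe(combined)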