open-meteo / open-data

Open-Meteo on AWS Open Data

Import historical raw GRIB files #3

Open markkvdb opened 5 months ago

markkvdb commented 5 months ago

What is the easiest way to run a custom import script for a large number of historical GRIB files from MeteoFrance? These GRIB files are stored in an S3 bucket and I would like to sync them to an Open-Meteo API server manually. Is this already supported? If not, can you point me in the direction of how I could add this functionality to the main open-meteo codebase?

I'd like to contribute to open-meteo and this might be a good first issue?

markkvdb commented 5 months ago

Do you have some kind of chat (forum) for devs to ask each other questions? On my journey to make the open-meteo server work for the use cases I encounter, I'd like to share this experience with devs and other people setting up their servers!

patrick-zippenfenig commented 5 months ago

There is no generic ingestion for GRIB files. For every weather service, a specialised downloader is developed. Although most national weather services (NWS) use GRIB, there are still so many differences between them that a unified ingestion process is not possible.

In addition to that, all downloaders are optimised for performance and efficiency to ingest updates as fast as possible. Most downloaders now use parallel multipart downloads and concurrent processing. Certain domains like CMA GRAPES download and process 220 GB of GRIB files per run, every 6 hours. With 8 cores in parallel, this is feasible within 1 hour. This level of optimisation does not make it easy for developers to integrate new data sources.
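To illustrate what "parallel multipart download" means in general, here is a minimal Python sketch. It is not Open-Meteo's actual downloader code. The names `split_into_parts` and `download_in_parts` are hypothetical, and `fetch_range` is an assumed callable that returns the bytes of one inclusive byte range (for example via an HTTP Range request).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_into_parts(total_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (first_byte, last_byte) ranges."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    return [
        (start, min(start + part_size, total_size) - 1)
        for start in range(0, total_size, part_size)
    ]


def download_in_parts(
    fetch_range: Callable[[int, int], bytes],
    total_size: int,
    part_size: int,
    workers: int = 8,  # arbitrary default, echoing the 8 cores mentioned above
) -> bytes:
    """Fetch all parts concurrently and join them in their original order."""
    parts = split_into_parts(total_size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the parts line up again.
        chunks = pool.map(lambda part: fetch_range(*part), parts)
        return b"".join(chunks)
```

Passing the fetch function in as a parameter keeps the splitting logic independent of the transport, so it can be tried against an in-memory buffer before pointing it at a real server.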

Open-Meteo is not designed to be a universal framework or database for GRIB files. The primary focus is still the API endpoint, but I can understand that more and more users might want to ingest other data sources. Right now, writing a downloader would be a very rough start.

Which kind of MeteoFrance GRIB files do you want to integrate? Is the S3 bucket publicly accessible? GRIB files from MeteoFrance labelled with SP1, HP1, etc. are quite a mess to ingest. Support for them was dropped a month ago with the integration of the new MeteoFrance API.

> Do you have some kind of chat (forum) for devs to ask each other questions? On my journey to make the open-meteo server work for the use cases I encounter, I'd like to share this experience with devs and other people setting up their servers!

I was considering setting up a Discord channel. A drawback is that Discord chats are not easily indexed by search engines. Using GitHub Tickets and Discussions is better in this regard. What is your take on that?

markkvdb commented 5 months ago

> I was considering setting up a Discord channel. A drawback is that Discord chats are not easily indexed by search engines. Using GitHub Tickets and Discussions is better in this regard. What is your take on that?

I agree that issues and questions that are relevant to a wider audience should be shared on GitHub to make them publicly accessible.

I do think there's also a place for more fast-paced conversations and small "coffee machine" chat that might clutter GitHub!

markkvdb commented 4 months ago

Small update: I will come back to you about sharing the MeteoFrance data no sooner than next week. But I haven't forgotten!

patrick-zippenfenig commented 4 months ago

No worries. I am pretty busy with https://github.com/open-meteo/open-meteo/issues/206 right now.

kikocorreoso commented 1 week ago

Individual fields in a GRIB2 file can be fetched with HTTP range requests. See here to check how Herbie does this.

Here you can read about MeteoFrance models on AWS S3.
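As a rough illustration of the index-based approach, the Python sketch below turns a wgrib2-style inventory (lines of the form `number:start_byte:date:variable:level:forecast:`) into a byte range for one field. It is a simplified sketch, not Herbie's code. The function name `byte_range_for` is hypothetical, and whether a given bucket publishes such an inventory next to each GRIB2 file is an assumption to verify. In `SAMPLE_INVENTORY`, only the second field name and its start offset match the example in this thread; the other two lines are invented placeholders.

```python
from typing import List, Optional, Tuple

# Hypothetical inventory text. Lines 1 and 3 are invented for illustration.
SAMPLE_INVENTORY = """\
1:0:d=2024062706:TMP:2 m above ground:anl:
2:254045:d=2024062706:TMP:35 m above ground:anl:
3:508080:d=2024062706:TMP:50 m above ground:anl:
"""


def byte_range_for(inventory: str, match: str) -> Tuple[int, Optional[int]]:
    """Return (first_byte, last_byte) of the first field whose line contains `match`.

    last_byte is None when the field is the last one in the file, in which
    case an open-ended range ("bytes=N-") has to be requested.
    """
    entries: List[Tuple[int, str]] = []
    for line in inventory.splitlines():
        parts = line.split(":", 2)  # number, start byte, rest of the line
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[1]), parts[2]))

    for i, (start, description) in enumerate(entries):
        if match in description:
            # A field ends one byte before the next field starts.
            end = entries[i + 1][0] - 1 if i + 1 < len(entries) else None
            return start, end
    raise KeyError(f"no inventory line contains {match!r}")
```

With the sample text, `byte_range_for(SAMPLE_INVENTORY, "TMP:35 m above ground:anl")` returns `(254045, 508079)`, which is the kind of range a subsequent range request would ask for.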

An example of downloading only one field, TMP:35 m above ground:anl, from a MeteoFrance GRIB2 file would be:

curl -o outFile.grib2 --range 254045-508079 https://mf-nwp-models.s3.amazonaws.com/arpege-europe/v1/2024-06-27/06/HP1/00H12H.grib2

The resulting outFile.grib2 file can be opened with Panoply without issues.
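For use from a script, the same request can be sketched in Python with only the standard library. The URL and byte offsets are copied from the curl command above. The helper names (`range_header`, `fetch_range`, `looks_like_grib_message`) are hypothetical, and the sanity check relies only on the fact that a GRIB message starts with the bytes `GRIB` and ends with `7777`.

```python
import urllib.request

# URL and byte offsets are taken from the curl example in this thread.
URL = (
    "https://mf-nwp-models.s3.amazonaws.com/"
    "arpege-europe/v1/2024-06-27/06/HP1/00H12H.grib2"
)


def range_header(first_byte: int, last_byte: int) -> dict:
    """Build the HTTP Range header for an inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("expected 0 <= first_byte <= last_byte")
    return {"Range": f"bytes={first_byte}-{last_byte}"}


def fetch_range(url: str, first_byte: int, last_byte: int, timeout: float = 30.0) -> bytes:
    """Download only the requested bytes; fail if the server ignores the range."""
    request = urllib.request.Request(url, headers=range_header(first_byte, last_byte))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:  # 206 Partial Content
            raise RuntimeError(f"expected HTTP 206, got {response.status}")
        return response.read()


def looks_like_grib_message(data: bytes) -> bool:
    """Cheap sanity check: starts with 'GRIB' and ends with '7777'."""
    return len(data) >= 8 and data[:4] == b"GRIB" and data[-4:] == b"7777"
```

Calling `fetch_range(URL, 254045, 508079)` and writing the result to disk should produce the same file as the curl command, provided the object is still available at that URL.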

I hope it helps.

patrick-zippenfenig commented 1 week ago

Hi, the MeteoFrance distribution on AWS does not offer the highest resolution, nor all time steps and weather variables. MeteoFrance now has an open-data distribution for ARPEGE 0.25, ARPEGE 0.1, AROME 0.025 and AROME 0.01. They offer a similar S3 interface. There are still some missing files, but I am trying to report all issues to MeteoFrance and hope they get fixed in the coming weeks.

The AROME PI models, with hourly updates and 15-minutely data, are only available through the MeteoFrance API.

All those distributions are already implemented in Open-Meteo and use HTTP Range requests where possible (most open-data servers do not offer an index file).
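When no index file exists, one conceivable workaround is to walk the file message by message: every GRIB2 message begins with a 16-byte indicator section whose last 8 bytes hold the total message length. The Python sketch below shows that idea only. It is not taken from Open-Meteo's code, the function names are hypothetical, `read_range` is an assumed callable returning an inclusive byte range, and the sketch assumes messages are stored back to back with no padding between them.

```python
import struct
from typing import Callable, Iterator, Tuple

INDICATOR_LENGTH = 16  # size of GRIB2 section 0 in bytes


def parse_indicator_section(header: bytes) -> Tuple[int, int]:
    """Return (discipline, total_message_length) from a GRIB2 section 0."""
    if len(header) < INDICATOR_LENGTH:
        raise ValueError("need at least 16 bytes")
    # "GRIB", 2 reserved bytes, discipline, edition, 8-byte big-endian length
    magic, discipline, edition, total_length = struct.unpack(
        ">4s2xBBQ", header[:INDICATOR_LENGTH]
    )
    if magic != b"GRIB" or edition != 2:
        raise ValueError("not the start of a GRIB2 message")
    return discipline, total_length


def iter_message_ranges(
    read_range: Callable[[int, int], bytes], file_size: int
) -> Iterator[Tuple[int, int]]:
    """Yield the inclusive (first_byte, last_byte) range of each message."""
    offset = 0
    while offset + INDICATOR_LENGTH <= file_size:
        header = read_range(offset, offset + INDICATOR_LENGTH - 1)
        _, length = parse_indicator_section(header)
        if length < INDICATOR_LENGTH:
            raise ValueError("implausible message length")  # avoids looping forever
        yield offset, offset + length - 1
        offset += length
```

This only locates message boundaries. Identifying which variable a message holds would still require reading further sections, and the walk costs one small request per message, so a server-side index remains the cheaper option where one is offered.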