google / weather-tools

Tools to make weather data accessible and useful.
https://weather-tools.readthedocs.io/
Apache License 2.0

Faster ingestion into BQ by converting the chunk into pd.Dataframe #414

Open DarshanSP19 opened 8 months ago

DarshanSP19 commented 8 months ago

In weather-mv, we divide the Dataset into small chunks, which adds appropriate parallelism to the pipeline. As a next step, if we convert those small chunks into pandas DataFrames, it would reduce the cost of generating the flat rows, since extracting rows from a DataFrame is very fast.

df = ds.to_dataframe().reset_index()

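Here, ds is a small chunk of a dataset. to_dataframe() returns a DataFrame indexed by the chunk's dimension coordinates, and reset_index() turns those index levels into ordinary columns, flattening the chunk into a normalized DataFrame with one row per coordinate combination.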
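
To make the idea concrete, here is a minimal sketch of turning one chunk into flat rows. It is not the weather-mv implementation: the helper name chunk_to_rows, the toy temperature variable, and the coordinate values are all made up for illustration, and the row order may depend on the xarray version.

```python
import numpy as np
import pandas as pd
import xarray as xr


def chunk_to_rows(ds: xr.Dataset) -> list:
    """Hypothetical helper: flatten one small Dataset chunk into row dicts."""
    # to_dataframe() indexes the frame by the dimension coordinates;
    # reset_index() moves those index levels into ordinary columns.
    df = ds.to_dataframe().reset_index()
    # One dict per row, keyed by column name.
    return df.to_dict(orient="records")


# Toy chunk (assumed shape and names): 2 times x 2 latitudes x 2 longitudes.
ds = xr.Dataset(
    {
        "temperature": (
            ("time", "latitude", "longitude"),
            np.arange(8, dtype="float64").reshape(2, 2, 2),
        )
    },
    coords={
        "time": pd.date_range("2024-01-01", periods=2, freq="D"),
        "latitude": [10.0, 20.0],
        "longitude": [30.0, 40.0],
    },
)

rows = chunk_to_rows(ds)
print(len(rows))   # one row per (time, latitude, longitude) combination
print(rows[0])
```

Each row carries the coordinate values alongside the data variables, which is the flat shape a BigQuery row needs. Whether this is actually faster than the current row generation would have to be measured on real chunks.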