vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

statistics on 2d grids: control the `bin_centers` #2418

Open vadmbertr opened 3 months ago

vadmbertr commented 3 months ago

Hi,

I have a question about a use case of mine (it may be a common one, I don't know).

Let's say df (a pandas.DataFrame) holds scalar values of a variable observed at different times, latitudes and longitudes. I can compute the mean of the variable over time, binned by longitude and latitude on an N x M grid, as:

import vaex

df_vx = vaex.from_pandas(df)  # df: the pandas.DataFrame described above
gridded_mean = df_vx.mean("variable", binby=["longitude", "latitude"], shape=(N, M))

The shape lets me control the resolution of the underlying grid, given the spatial extent of the observations. However, for a regular grid, I would like to directly control the resulting bin_centers so that they match those of another grid. So far, I achieve this by adding fake rows to df with the appropriate latitude and longitude coordinates. Is there a better or more direct way to do it?

Thanks. Vadim
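
One possible approach, offered as a sketch rather than a confirmed answer from the maintainers: vaex's binned statistics accept a `limits` argument alongside `shape`, and for a regular grid the bin centers follow from the two together. Passing limits that extend half a grid step beyond the first and last target centers should therefore reproduce the target grid without fake rows. The helper names `limits_from_centers` and `centers_from_limits`, and the 1-degree global target grid, are hypothetical and only for illustration. The vaex call is left as a comment so the block runs without vaex installed.

```python
import numpy as np


def limits_from_centers(centers):
    """Return [lower, upper] bin limits for regularly spaced, ascending centers."""
    centers = np.asarray(centers, dtype=float)
    if centers.size < 2:
        raise ValueError("need at least two centers to infer the grid step")
    steps = np.diff(centers)
    if steps[0] <= 0 or not np.allclose(steps, steps[0]):
        raise ValueError("centers must be ascending and regularly spaced")
    half_step = steps[0] / 2.0
    # The outer edges sit half a step outside the first and last centers.
    return [float(centers[0] - half_step), float(centers[-1] + half_step)]


def centers_from_limits(limits, n_bins):
    """Centers of n_bins equal-width bins spanning the given limits."""
    edges = np.linspace(limits[0], limits[1], n_bins + 1)
    return 0.5 * (edges[:-1] + edges[1:])


# Hypothetical target grid: 1-degree cells covering the globe.
target_lon = np.arange(-179.5, 180.0, 1.0)
target_lat = np.arange(-89.5, 90.0, 1.0)

lon_limits = limits_from_centers(target_lon)
lat_limits = limits_from_centers(target_lat)
shape = (target_lon.size, target_lat.size)

# Check that the limits and shape reproduce the target centers.
assert np.allclose(centers_from_limits(lon_limits, shape[0]), target_lon)
assert np.allclose(centers_from_limits(lat_limits, shape[1]), target_lat)

# With vaex installed and df_vx defined as in the question, the call would be:
# gridded_mean = df_vx.mean(
#     "variable",
#     binby=["longitude", "latitude"],
#     limits=[lon_limits, lat_limits],
#     shape=shape,
# )
```

Observations that fall outside the chosen limits would presumably not contribute to any cell, so the limits should cover the full extent of the data that is meant to be gridded.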