Open highway900 opened 7 months ago
What's required for this is being able to call coordinates_to_cells directly on a polars expression (Expr). We are already providing polars expressions in https://github.com/nmandery/h3ronpy/blob/ca891fa5dfa8e1ea4dd7006d15d31bd294a45a2a/python/h3ronpy/polars/__init__.py#L57C7-L57C7 , but not for this functionality. The problem here is that this function requires at minimum two series as input, and I do not see how this can be achieved using https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.api.register_expr_namespace.html#polars.api.register_expr_namespace . Polars expressions seem to operate on only a single series. Please correct me if that is not the case; I am not up to date with the most recent versions of polars.
What could be done is implementing an extension of LazyFrame (https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.api.register_lazyframe_namespace.html), but I am not sure how useful this would be. It would only allow calling the function directly on LazyFrames, not from within expressions.
Thanks for looking at this. I was mostly looking at using a LazyFrame and not explicitly using expressions. I will have a poke around with register_lazyframe_namespace. I think you answered my query, though: this currently isn't possible, so it's not just my lack of experience with polars being the problem :)
Hi ^^. To take multiple arguments, it could be implemented using the polars plugin system, as in https://marcogorelli.github.io/polars-plugins-tutorial/lost_in_space/. Something along the lines of:
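use h3o::{LatLng, Resolution};
use polars::prelude::*;
use pyo3_polars::derive::polars_expr;
use serde::Deserialize;

// Keyword arguments passed from Python (struct assumed; not shown in the original snippet).
#[derive(Deserialize)]
struct H3Kwargs {
    resolution: u8,
}

#[polars_expr(output_type = UInt64)]
fn coordinates_to_cells(inputs: &[Series], kwargs: H3Kwargs) -> PolarsResult<Series> {
    let lats = inputs[0].f64()?;
    let lons = inputs[1].f64()?;
    let resolution = Resolution::try_from(kwargs.resolution)
        .map_err(|e| polars_err!(ComputeError: "invalid resolution: {}", e))?;
    // Emit null for missing or invalid coordinates so the output keeps the input length.
    let cells: UInt64Chunked = lats
        .iter()
        .zip(lons.iter())
        .map(|(lat, lon)| LatLng::new(lat?, lon?).ok().map(|ll| u64::from(ll.to_cell(resolution))))
        .collect();
    Ok(cells.into_series())
}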
and then register the function in pythonland with:
from pathlib import Path

import polars as pl
from polars.plugins import register_plugin_function
from polars.type_aliases import IntoExpr


def coordinates_to_cells(lat: IntoExpr, lon: IntoExpr, *, resolution: int) -> pl.Expr:
    # Both input expressions are passed to the Rust function as inputs[0] and inputs[1].
    return register_plugin_function(
        plugin_path=Path(__file__).parent,
        args=[lat, lon],
        kwargs={"resolution": resolution},
        function_name="coordinates_to_cells",
        is_elementwise=True,
    )
This would allow us to operate on the LazyFrame example like so:
In [7]: df.collect()
Out[7]:
shape: (3, 2)
┌───────────┬─────────┐
│ x         ┆ y       │
│ ---       ┆ ---     │
│ f64       ┆ f64     │
╞═══════════╪═════════╡
│ -74.006   ┆ 40.7128 │
│ -118.2437 ┆ 34.0522 │
│ -87.6298  ┆ 41.8781 │
└───────────┴─────────┘
In [8]: df.select(cells=coordinates_to_cells("x", "y", resolution=8)).collect()
Out[8]:
shape: (3, 1)
┌────────────────────┐
│ cells              │
│ ---                │
│ u64                │
╞════════════════════╡
│ 616717907826573311 │
│ 616483633261182975 │
│ 616736054719807487 │
└────────────────────┘
However, this needs a custom plugin in Rust land, which needs to be built as a polars plugin :/
I guess it can be done using the existing h3ronpy function coordinates_to_cells and doing some black magic with pl.Expr to extract the series from the multi-column expression, like
df.select(cell = pl.col("x", "y").h3.coordinates_to_cells(resolution=8))
But the documentation falls short, and I did not find anything similar in the wild.
I am a new polars user and I am curious: how do I use the coordinates_to_cells function in a lazy context? If I do what I think needs to be done, I get an error:
TypeError: 'Expr' object is not iterable
I can achieve my goal the eager way, but I am hoping I can do this with the lazy API.