Closed FBruzzesi closed 6 months ago
π€ are you sure about adding pyarrow as required? I think we should be able to solve this one
but use pandas groupby's to iterate over different batch of data.
Narwhals' DataFrame.GroupBy
also lets you iterate over groups - I'll take a closer look at what's required, but this seems feasible
@MarcoGorelli oh I see what you mean! Regarding hierarchical I am 100% sure it is doable, I can do it tomorrow myself (Edit: ok maybe let's say 99 as I need some missing features)
For Grouped, internals are a bit messy, but I will give it a try.
I am really pushing the boundaries for GroupedPredictor
, I think within another swipe I can finish this.
Current status is:
GroupedTransformer
passes all the tests, except one in which the group index is provided as negative (honestly, who would do that π)HierarchicalPredictor
is failing only one test as well (tests/test_meta/test_hierarchical_predictor.py::test_expected_output
), but only for the polars case, which could be due to the fact that the estimator cannot deal with polars?GroupedPredictor
needs some more patches to close the deal.Fixed all the issues I had in the conversion and marking this as ready to review. I am quite happy with the results ππΌππΌ
The only breaking change from previous version/implementation is the unfeasibility of providing negative indices in groups
.
cc: @koaning @MarcoGorelli
Wow, amazing!
The only breaking change from previous version/implementation is the unfeasibility of providing negative indices in groups.
This is only when starting from non-dataframe input (e.g. numpy) right? If so, then would a patch like
diff --git a/sklego/meta/_grouped_utils.py b/sklego/meta/_grouped_utils.py
index 97180ab..e744f5a 100644
--- a/sklego/meta/_grouped_utils.py
+++ b/sklego/meta/_grouped_utils.py
@@ -20,7 +20,6 @@ def parse_X_y(X, y, groups, check_X=True, **kwargs) -> nw.DataFrame:
_data_format_checks(X)
# Convert X to Narwhals frame
- X = nw.from_native(X, strict=False, eager_only=True)
if not isinstance(X, nw.DataFrame):
X = nw.from_native(pd.DataFrame(X))
diff --git a/sklego/meta/grouped_transformer.py b/sklego/meta/grouped_transformer.py
index 9f6b0d9..9c2eac3 100644
--- a/sklego/meta/grouped_transformer.py
+++ b/sklego/meta/grouped_transformer.py
@@ -109,6 +109,9 @@ class GroupedTransformer(BaseEstimator, TransformerMixin):
self.fallback_ = None
self.groups_ = as_list(self.groups) if self.groups is not None else None
+ X = nw.from_native(X, strict=False, eager_only=True)
+ if not isinstance(X, nw.DataFrame) and self.groups_ is not None:
+ self.groups_ = [X.shape[1]+group if isinstance(group, int) and group<0 else group for group in self.groups_]
frame = parse_X_y(X, y, self.groups_, check_X=self.check_X, **self._check_kwargs)
if self.groups is None:
@@ -181,6 +184,7 @@ class GroupedTransformer(BaseEstimator, TransformerMixin):
"""
check_is_fitted(self, ["fallback_", "transformers_"])
+ X = nw.from_native(X, strict=False, eager_only=True)
frame = parse_X_y(X, y=None, groups=self.groups_, check_X=self.check_X, **self._check_kwargs).drop(
"__sklego_target__"
)
diff --git a/tests/test_meta/test_grouped_transformer.py b/tests/test_meta/test_grouped_transformer.py
index b9aae2c..18b28d5 100644
--- a/tests/test_meta/test_grouped_transformer.py
+++ b/tests/test_meta/test_grouped_transformer.py
@@ -291,7 +291,7 @@ def test_array_with_multiple_string_cols(penguins):
X = penguins
# BROKEN: Failing due to negative indexing... kind of an edge case
- meta = GroupedTransformer(StandardScaler(), groups=[0, X.shape[1] - 1])
+ meta = GroupedTransformer(StandardScaler(), groups=[0, -1])
transformed = meta.fit_transform(X)
allow you to preserve current behaviour?
That's a nice trick! Thanks Marco! Pushing it now
This should be ready for review. Once we merge into narwhals-development
then I would say that we covered large majority of #658 and it would be time for a (quite large) release. @koaning WDYT?
Description
Addresses Grouped* and Hierarchical, following up to the comment I added in the thread. I will paste it for easier visibility: