koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License
1.28k stars 118 forks

[FEATURE] Parallelized Grouped Predictor #711

Open AhmedThahir opened 3 weeks ago

AhmedThahir commented 3 weeks ago

Context The GroupedPredictor, GroupedRegressor, and GroupedClassifier are all single-threaded.

Problem If we have many groups, each with many rows, training time will be high because the per-group estimators are fitted sequentially.

Request GroupedPredictor, GroupedRegressor, and GroupedClassifier could be parallelized using Joblib by accepting an `n_jobs` kwarg.
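A minimal sketch of the request, assuming nothing about scikit-lego's internals: fit one estimator per group concurrently. The stdlib `ThreadPoolExecutor` stands in for `joblib.Parallel`, and `MeanRegressor` is a toy stand-in for a scikit-learn estimator (both names are hypothetical, not part of the library).

```python
from concurrent.futures import ThreadPoolExecutor


class MeanRegressor:
    """Toy stand-in for a scikit-learn estimator: predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def fit_per_group(groups, n_jobs=2):
    """Fit one estimator per group concurrently, akin to joblib.Parallel."""

    def _fit(item):
        key, (X, y) = item
        return key, MeanRegressor().fit(X, y)

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return dict(pool.map(_fit, groups.items()))


data = {
    "A": ([[0], [1]], [1.0, 3.0]),
    "B": ([[0], [1]], [10.0, 20.0]),
}
models = fit_per_group(data, n_jobs=2)
print(models["A"].predict([[0]]))  # [2.0]
print(models["B"].predict([[0]]))  # [15.0]
```

With real scikit-learn estimators you would likely want joblib's process-based backend instead of threads, since `fit` is usually CPU-bound.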

FBruzzesi commented 2 weeks ago

Hey @AhmedThahir, that's definitely a reasonable request. We already have such an option for OrdinalClassifier. My only concern is that the majority of estimators also allow fitting in parallel themselves, yet I don't see anything wrong with letting the user decide at which level the parallelism should happen.

AhmedThahir commented 2 weeks ago

If the user specifies `n_jobs=-1` for both the estimators and the meta-estimator, Joblib will handle the parallelism automatically, i.e. it will use as many workers as are available.

If the user specifies anything other than -1, 1, or None, it is implied that they are cognisant of the parallelism and of any issues that may arise from it.

Hence, there will be no downsides to such a feature.

FBruzzesi commented 2 weeks ago

Hey @AhmedThahir, it turns out I completely forgot that we already support such a feature in HierarchicalClassifier and HierarchicalRegressor, which in general have a better design than the Grouped estimators. We did not make the same change to the Grouped ones because we did not want to break their API too badly.

If you can, moving to one of those should address the problem for you. If you cannot, I will consider adding parallelization to the grouped estimators.

AhmedThahir commented 2 weeks ago

Hi @FBruzzesi

Just checked out HierarchicalClassifier and HierarchicalRegressor. If I understand correctly, they fit one estimator per level of the grouping, e.g.:

```python
{
    (1,): LogisticRegression(),  # global estimator
    (1, 'A'): LogisticRegression(),  # estimator for `g_1 = 'A'`
    (1, 'B'): LogisticRegression(),  # estimator for `g_1 = 'B'`
    (1, 'A', 'X'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('A', 'X')`
    (1, 'A', 'Y'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('A', 'Y')`
    (1, 'B', 'W'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('B', 'W')`
    (1, 'B', 'Z'): LogisticRegression(),  # estimator for `(g_1, g_2) = ('B', 'Z')`
}
```

However, this does not seem to be what I require. I only want to fit the lowest-level groups, without fitting an estimator for every level.

If you do not want to make any changes to the API currently, could you share the code so that I can use it in my projects as a custom class?

In the long run, I still believe that parallelization of the grouped estimators as part of the main API would be useful.

FBruzzesi commented 2 weeks ago

You can fit the lowest level only by specifying `fallback_method="raise"`, `shrinkage=None`. From the docs:

> [...] This means that an estimator is fitted for each level of the group columns.
>
> The only exception to that is when `shrinkage=None` and `fallback_method="raise"`, in which case only one estimator per group value is fitted.

Please let me know if that's what you are looking for.
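To make the quoted behaviour concrete, here is a stdlib-only conceptual sketch (not scikit-lego code; `LeafOnlyGroupedRegressor` and its toy mean model are hypothetical names): with `shrinkage=None` and `fallback_method="raise"`, only one estimator per full group key is fitted, and predicting an unseen group raises.

```python
class LeafOnlyGroupedRegressor:
    """Conceptual sketch of the shrinkage=None, fallback_method="raise" case:
    one toy mean-estimator per full group key, no intermediate or global fits."""

    def fit(self, rows):
        # rows: list of (group_key_tuple, target) pairs
        by_group = {}
        for key, y in rows:
            by_group.setdefault(key, []).append(y)
        # Only leaf-level keys are fitted -- nothing for partial prefixes.
        self.estimators_ = {k: sum(ys) / len(ys) for k, ys in by_group.items()}
        return self

    def predict(self, key):
        if key not in self.estimators_:
            # Mimics fallback_method="raise": unseen groups are an error.
            raise KeyError(f"unseen group {key!r}")
        return self.estimators_[key]


model = LeafOnlyGroupedRegressor().fit([
    (("A", "X"), 1.0), (("A", "X"), 3.0),
    (("B", "W"), 5.0),
])
print(sorted(model.estimators_))  # leaf keys only: [('A', 'X'), ('B', 'W')]
print(model.predict(("A", "X")))  # 2.0
```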

If you would still require to have a global fallback at prediction time, then it could be worth adding that to the Hierarchical API.
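As a sketch of what such a global fallback could look like (again hypothetical, not current scikit-lego API): keep one global estimate alongside the per-group ones and use it whenever a group was never seen during fit.

```python
class GroupedWithGlobalFallback:
    """Hypothetical sketch of a global fallback at prediction time:
    per-group means, plus a global mean used for unseen groups."""

    def fit(self, rows):
        # rows: list of (group_key, target) pairs
        per_group, all_y = {}, []
        for key, y in rows:
            per_group.setdefault(key, []).append(y)
            all_y.append(y)
        self.group_means_ = {k: sum(v) / len(v) for k, v in per_group.items()}
        self.global_mean_ = sum(all_y) / len(all_y)
        return self

    def predict(self, key):
        # Fall back to the global estimate when the group was never seen.
        return self.group_means_.get(key, self.global_mean_)


m = GroupedWithGlobalFallback().fit([("A", 1.0), ("A", 3.0), ("B", 8.0)])
print(m.predict("A"))  # 2.0
print(m.predict("Z"))  # 4.0 (global fallback: mean of all targets)
```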

In the meanwhile I will assess how feasible it is to parallelize the grouped estimators.

AhmedThahir commented 2 weeks ago

Thank you! Got it :)

  1. Is there a way to do this "global fallback at prediction time"?
  2. Doesn't this make GroupedPredictor redundant?

FBruzzesi commented 2 weeks ago
  1. I don't think it is possible to do so in Hierarchical at the moment, but it could be in scope for the shrinkage=None case.

  2. Quoting a comment directly:

  • GroupedPredictor refactoring was a first attempt to fix the issue reported here, as well as expanding the functionalities. Since this was getting a bit messy we decided to follow another approach, which led us to the next two points.
  • GroupedPredictor patch follows your suggestion to raise an error if shrinkage is used in combination with a classification task. This is a somewhat breaking change which is already in the main branch.
  • In HierarchicalPredictor and HierarchicalTransformer we set up the discussion for how to deal with that as well as adding other features to meta estimators that are somewhat similar to Grouped* ones.