feat(Label_Encoder): Support for Label Encoder in Multi-Target Tasks

tahalpara commented 1 year ago

Description: Tried to implement Label Encoder for categorical features in a multi-target classification/regression scenario. (This is an ongoing issue.)

Changes Made: Three files were edited:

1. `model.py.jinja` and `model_train.py.jinja`
2. `evaluation.py.jinja`

In the first case (`model.py.jinja` and `model_train.py.jinja`), I added logic so that whenever a categorical (object-dtype) column is found in the features, it is encoded with a Label Encoder, ensuring that all categorical features are encoded. The proposed solution works for the multi-target scenario but fails when XGBClassifier is used.
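A minimal sketch of the encoding step described above, assuming the features arrive as a pandas DataFrame and that "categorical" means object- or string-dtype columns. The helper name `label_encode_categoricals` and the sample columns are hypothetical; this is not the code in `model.py.jinja` or `model_train.py.jinja`. Keeping one fitted encoder per column lets the same mapping be reused on test data or inverted later.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype
from sklearn.preprocessing import LabelEncoder


def label_encode_categoricals(df: pd.DataFrame):
    """Label-encode every object- or string-dtype column of ``df``.

    Hypothetical helper for illustration only. Returns an encoded copy
    and a dict of fitted encoders keyed by column name.
    """
    encoded = df.copy()
    encoders = {}
    for column in encoded.columns:
        if is_object_dtype(encoded[column]) or is_string_dtype(encoded[column]):
            encoder = LabelEncoder()
            # Classes are sorted, so "female" -> 0 and "male" -> 1
            encoded[column] = encoder.fit_transform(encoded[column])
            encoders[column] = encoder
    return encoded, encoders


# Made-up Titanic-like sample; numeric columns are left untouched
frame = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "Q", "S"],
    "fare": [7.25, 71.28, 7.92, 8.05],
})
encoded, encoders = label_encode_categoricals(frame)
```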

I created the following test script:

```python
from sapientml import SapientML
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

target_columns = ["sex", "survived", "pclass"]

cls = SapientML(
    target_columns=target_columns,
    task_type=None,  # suggested automatically from the target columns
    # adaptation_metric="auc",
)

train_data = pd.read_csv("https://github.com/sapientml/sapientml/files/12481088/titanic.csv")
train_data, test_data = train_test_split(train_data)

# Keep the ground-truth targets aside and remove them from the test features
y_true = test_data[target_columns].reset_index(drop=True)
test_data = test_data.drop(columns=target_columns)

cls.fit(train_data, output_dir="./outputs")
y_pred = cls.predict(test_data)
```

Running this raises the following error: ``ValueError: Invalid classes inferred from unique values of `y`. Expected: [0 1 2], got [1 2 3]``
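The error message states the expectation directly: the class labels of `y` must be consecutive integers starting at 0. A minimal sketch, not the approach taken in this PR, of how scikit-learn's `LabelEncoder` can remap a target whose classes are 1, 2, 3 (as `pclass` is in the Titanic data) before fitting, and map predictions back afterwards. The sample values are made up, and the XGBClassifier call is left as a comment so the snippet runs without xgboost installed.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up pclass-like target whose classes are 1, 2, 3
y = np.array([3, 1, 2, 3, 1])

target_encoder = LabelEncoder()
y_encoded = target_encoder.fit_transform(y)  # classes become 0, 1, 2

# The classifier would be trained on y_encoded instead of y, for example:
# XGBClassifier().fit(X, y_encoded)

# Map 0-based predictions back to the original class labels
predictions = np.array([0, 2, 1])
original_labels = target_encoder.inverse_transform(predictions)
```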

This error is specific to XGBClassifier: as the message shows, it expects class labels to start at 0 ([0 1 2]), whereas the labels here (presumably the `pclass` target) are [1 2 3]. On further investigation, I found that Label Encoder does not work well with the latest version of XGBClassifier, hence the issue. In this case, the solution would be to use another encoding type, such as One-Hot Encoding, whenever the selected model is XGBClassifier.

Reference for the version issue: https://stackoverflow.com/questions/71996617/invalid-classes-inferred-from-unique-values-of-y-expected-0-1-2-3-4-5-got
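The proposed fallback might look like the following minimal sketch, which one-hot encodes categorical features with `pandas.get_dummies` when the selected model is XGBClassifier and uses integer category codes otherwise. The function name `encode_features` and the string comparison on the model name are hypothetical. Note that this only changes how the features are encoded; the error above is raised for the class labels of `y`.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def encode_features(df: pd.DataFrame, model_name: str) -> pd.DataFrame:
    """Hypothetical helper: pick the feature encoding based on the model."""
    categorical = [
        column for column in df.columns
        if is_object_dtype(df[column]) or is_string_dtype(df[column])
    ]
    if model_name == "XGBClassifier":
        # One 0/1 column per category, e.g. "sex" -> "sex_female", "sex_male"
        return pd.get_dummies(df, columns=categorical, dtype=int)
    # Otherwise use integer codes (sorted categories, like LabelEncoder)
    encoded = df.copy()
    for column in categorical:
        encoded[column] = encoded[column].astype("category").cat.codes
    return encoded


frame = pd.DataFrame({"sex": ["male", "female", "female"], "fare": [7.25, 71.28, 7.92]})
one_hot = encode_features(frame, "XGBClassifier")
codes = encode_features(frame, "RandomForestClassifier")
```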

In the second case (`evaluation.py.jinja`), when no evaluation metric is specified, SapientML uses the F1 score by default. To support multiple targets, the F1 evaluation had to be changed: I used a for loop to iterate over the individual target columns and compute the F1 score for each one. This resolved the evaluation metric error.
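A minimal sketch of the per-target loop described above, assuming `y_true` and `y_pred` are DataFrames with one column per target (as in the test script). Macro averaging is an assumption made here so that multiclass targets such as `pclass` work; the actual template in `evaluation.py.jinja` may aggregate differently. The helper name `per_target_f1` and the sample values are hypothetical.

```python
import pandas as pd
from sklearn.metrics import f1_score


def per_target_f1(y_true: pd.DataFrame, y_pred: pd.DataFrame, average: str = "macro") -> dict:
    """Compute the F1 score separately for each target column."""
    scores = {}
    for column in y_true.columns:
        scores[column] = f1_score(y_true[column], y_pred[column], average=average)
    return scores


# Made-up ground truth and predictions for two targets
y_true = pd.DataFrame({"survived": [0, 1, 1, 0], "pclass": [1, 2, 3, 3]})
y_pred = pd.DataFrame({"survived": [0, 1, 0, 0], "pclass": [1, 2, 3, 2]})

scores = per_target_f1(y_true, y_pred)
mean_f1 = sum(scores.values()) / len(scores)  # one possible overall score
```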

Further discussion is needed to decide on the course of action.

AkiraUra commented 12 months ago

@arima-tsukasa This PR relates to your work. Please follow it as closely as possible.

codecov[bot] commented 9 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

:exclamation: No coverage uploaded for pull request base (main@cf5bcba).


```diff
@@           Coverage Diff            @@
##             main      #29   +/-   ##
=======================================
  Coverage        ?   63.78%
=======================================
  Files           ?       36
  Lines           ?     2850
  Branches        ?        0
=======================================
  Hits            ?     1818
  Misses          ?     1032
  Partials        ?        0
```

