sberbank-ai-lab / LightAutoML

LAMA - automatic model creation framework
Apache License 2.0

[Question] How to determine to which class each probability column belongs? #57

Closed PGijsbers closed 3 years ago

PGijsbers commented 3 years ago

Is it possible to determine to which class each probability column belongs?

Consider the following example:

import numpy as np

x = np.random.random((150, 4))
y = np.asarray(list("abc") * 50).reshape(-1, 1)

import pandas as pd

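# Build the frame from the float features first: np.hstack([x, y]) would cast every column to strings.
data = pd.DataFrame(x, columns=["f1", "f2", "f3", "f4"])
data["target"] = y.ravel()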
print(data.head())

from lightautoml.tasks import Task
from lightautoml.automl.presets.tabular_presets import TabularUtilizedAutoML

task = Task("multiclass")
automl = TabularUtilizedAutoML(task=task, timeout=30)

automl.fit_predict(data, roles=dict(target="target"))

preds = automl.predict(data[["f1", "f2", "f3", "f4"]])

The resulting preds is a NumpyDataset whose data has shape (150, 3), with features named WeightedBlend_{0,1,2}. Nowhere in the metadata can I find whether the probabilities in the first column correspond to class 'a' (or any other class). Am I missing something here?

As far as I can tell, the column order depends on the class order in the original training data. But I can't find this stated explicitly anywhere, nor can I find a programmatic way to retrieve the order of labels as used by lightautoml. I would expect, e.g., a classes_ property, or feature names that reflect the class whose probability each column predicts.
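To illustrate what such a lookup could look like, here is a minimal sketch in plain Python. It assumes a label-to-column-index mapping is available from the fitted model in some form. The `class_mapping` dict, its values, and both helper functions are hypothetical and are not part of the lightautoml API shown above.

```python
# Hypothetical mapping from original class label to probability-column index.
# Assumption: the fitted model stores something equivalent; these values are
# made up for illustration only.
class_mapping = {"b": 1, "a": 0, "c": 2}


def ordered_class_labels(mapping):
    """Return labels ordered so that position i names probability column i."""
    labels = sorted(mapping, key=mapping.get)
    # Guard against gaps or duplicates in the column indices.
    if [mapping[label] for label in labels] != list(range(len(labels))):
        raise ValueError("column indices must be exactly 0..n_classes-1")
    return labels


def predicted_labels(probabilities, labels):
    """Pick, for each row, the label whose column holds the highest probability."""
    result = []
    for row in probabilities:
        row = list(row)
        if len(row) != len(labels):
            raise ValueError("each row needs one probability per label")
        best = max(range(len(row)), key=row.__getitem__)
        result.append(labels[best])
    return result


labels = ordered_class_labels(class_mapping)
probabilities = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # stand-in for the prediction array

print(labels)                                   # ['a', 'b', 'c']
print(predicted_labels(probabilities, labels))  # ['a', 'c']
```

The rows can be any iterable of per-class probabilities, so a two-dimensional NumPy array works in place of the nested list. The ordered labels could equally be used as column names when wrapping the predictions in a DataFrame.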

alexmryzhkov commented 3 years ago

Hi @PGijsbers,

That’s normal behaviour for us - take a look at my answer here.

Alex

PGijsbers commented 3 years ago

Thank you, sorry for the duplicate issue.