sberbank-ai-lab / LightAutoML

LAMA - automatic model creation framework
Apache License 2.0

Be able to put TabularNLPAutoML into an sklearn pipeline #43

Closed: darenr closed this issue 3 years ago

darenr commented 3 years ago
    steps = [
        ('automl', TabularNLPAutoML(...))
    ]
    pipeline = Pipeline(steps)

    pred = pipeline.fit_predict(
        df_train,
        roles={"target": TARGET_NAME, "text": TEXT_COLUMNS, "drop": DROP_FEATURES},
    )

produces:

TypeError: Last step of Pipeline should implement fit or be the string 'passthrough'.

I'm doing this because I have some text clean-up transformers that I'd like to pickle in one "model" object, so the same clean-up happens at inference time.

alexmryzhkov commented 3 years ago

Hi @darenr, we don't work with sklearn pipelines because LightAutoML has its own data preparation pipeline inside. We also don't have a fit method, only fit_predict, because it would be strange to calculate OOF (out-of-fold) predictions and not return them to the user.

You can work around this with a simple idea: do all the preparation before LightAutoML starts, create a new column with your cleaned text (as a plain string, not an array of words), and set it as a text column.

Alex
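
A minimal sketch of the "clean first, then pass a text column" idea described above. The function name `clean_text`, the specific clean-up rules (tag stripping, lowercasing, whitespace collapsing), and the column name `text_clean` are assumptions for illustration; none of them come from LightAutoML. The point is only that the function returns one string per row rather than a list of tokens.

```python
import re

# Hypothetical clean-up rules; replace them with whatever your data needs.
_TAG_RE = re.compile(r"<[^>]+>")   # crude HTML-like tag matcher
_WS_RE = re.compile(r"\s+")        # any run of whitespace


def clean_text(raw: str) -> str:
    """Return a cleaned string (not a list of words)."""
    text = _TAG_RE.sub(" ", raw)           # drop tags, keep word boundaries
    text = text.lower()                    # normalize case
    return _WS_RE.sub(" ", text).strip()   # collapse and trim whitespace
```

Assuming `df_train` is a pandas DataFrame with a raw column named `text`, the cleaned column could be created with `df_train["text_clean"] = df_train["text"].map(clean_text)` and then listed under the `"text"` role in the `roles` dict passed to `fit_predict`.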

darenr commented 3 years ago

Thanks for the response. I can do that, but then the pickle object at inference time won't include the input data pipeline. I wonder if I can pass a valid transformer stage to TabularNLPAutoML?

darenr commented 3 years ago

it's text cleaning that I want to do with a text model

alexmryzhkov commented 3 years ago

Yep, I understand what you are talking about, and that's why I suggest doing it beforehand, before the model prediction. In this case there is no need to put it inside the pickle object; it can live in code as well.
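
For readers who still want a single object that applies the same clean-up at training and inference time, one possible approach is a thin wrapper kept in your own code. This is only a sketch: `CleanThenModel` is a hypothetical class, not part of LightAutoML, and it assumes only that the wrapped model exposes `fit_predict` and `predict`, the two methods mentioned in this thread.

```python
from typing import Any, Callable


class CleanThenModel:
    """Hypothetical wrapper: run a preprocessing callable, then delegate.

    `model` is assumed to be any object with `fit_predict` and `predict`.
    """

    def __init__(self, preprocess: Callable[[Any], Any], model: Any) -> None:
        self.preprocess = preprocess
        self.model = model

    def fit_predict(self, data: Any, **kwargs: Any) -> Any:
        # Clean the training data, then forward extra arguments such as `roles`.
        return self.model.fit_predict(self.preprocess(data), **kwargs)

    def predict(self, data: Any, **kwargs: Any) -> Any:
        # Apply exactly the same clean-up before inference.
        return self.model.predict(self.preprocess(data), **kwargs)
```

If such a wrapper is pickled, note that `pickle` stores a module-level function by reference, so the module that defines the preprocessing function still has to be importable wherever the object is loaded. In that sense the clean-up remains "code", as suggested above.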

darenr commented 3 years ago

by the way - amazing library Alex