sberbank-ai-lab / LightAutoML

LAMA - automatic model creation framework
Apache License 2.0

Task never completes (multiclass) #50

Closed darenr closed 3 years ago

darenr commented 3 years ago

The script below ran for many hours (MacBook Pro, current Intel model, no GPU, 16 GB RAM) before I killed it. Some runs raise an error but keep going: "An attempt has been made to start a new process before the current process has finished its bootstrapping phase."

I set a timeout of an hour, which seems to be ignored. I've tried different algorithms but can't get this to produce a model; every time I give up after running it all night.

import pandas as pd
from lightautoml.automl.presets.text_presets import TabularNLPAutoML
from lightautoml.tasks import Task
from sklearn.model_selection import train_test_split

df = pd.read_json("https://github.com/nomadotto/News_Classifier/blob/master/News_Category_Dataset_v2.json?raw=true", lines=True)

print(df.head())

automl = TabularNLPAutoML(
    task=Task("multiclass"),
    timeout=3600,
    verbose=2,
    general_params={"use_algos": ["lgb", "cb"]},
    reader_params={"cv": 5, "random_state": 42},
    text_params={"lang": "en"},
    gbm_pipeline_params={"text_features": "tfidf"},
    tfidf_params={
        "svd": True,
        "tfidf_params": {
            "ngram_range": (1, 2),
            "sublinear_tf": True,
            "max_features": 1500,
        },
    },
)

print("splitting...")
df_train, df_test = train_test_split(
    df,
    test_size=0.2,
    shuffle=True,
    random_state=42,
)

print("fitting...")
oof_pred = automl.fit_predict(
    df_train,
    roles={
        "target": "category",
        "text": ["headline", "short_description"],
        "drop": ["authors", "link", "date"]
    }
)

print(oof_pred)
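The "bootstrapping phase" message quoted above is the standard Python multiprocessing error on macOS: worker processes are started with the "spawn" method, which re-imports the main module, so any top-level code that launches workers must sit behind a __main__ guard. A minimal, library-independent sketch of the pattern (the square function is purely illustrative; in the script above, the automl construction and fit_predict call would go inside main):

```python
import multiprocessing as mp

def square(x):
    return x * x

def main():
    # Any work that spawns processes belongs under the guard below.
    # Without it, each spawned worker re-executes this module's top-level
    # code and raises the "bootstrapping phase" error.
    with mp.Pool(2) as pool:
        results = pool.map(square, [0, 1, 2])
    print(results)

if __name__ == "__main__":
    main()
```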

DESimakov commented 3 years ago

Sorry for the late response.

The reason for the slow training is LightGBM: in multiclass mode it builds one tree per class per boosting round, so the total number of trees increases by a factor of 41 (the number of classes).
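To make the 41x factor concrete, a back-of-the-envelope tree count (the number of boosting rounds is illustrative, not taken from the run above):

```python
# Multiclass LightGBM grows one tree per class per boosting round,
# so the tree count scales linearly with the number of classes.
n_classes = 41      # categories in News_Category_Dataset_v2
n_rounds = 200      # illustrative number of boosting rounds
trees_binary = n_rounds * 1          # binary task: one tree per round
trees_multiclass = n_rounds * n_classes
print(trees_binary, trees_multiclass)  # 200 8200
```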

Possible ways to speed up training:
1) Decrease the number of features: for example, 'n_components': 10 (down from 100) for the SVD step.
2) Disable LightGBM (CatBoost is slightly faster). The linear model is the fastest of all.
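The suggested settings could be expressed as preset parameters roughly like this. This is a sketch in plain dicts mirroring TabularNLPAutoML keyword arguments; the "n_components" key placement and the "linear_l2" algorithm name are assumptions based on common LightAutoML usage and should be checked against the installed version:

```python
# Speed-oriented settings following the advice above (names not verified
# against a specific LightAutoML release).
general_params = {"use_algos": ["linear_l2"]}  # linear model only: fastest
tfidf_params = {
    "svd": True,
    "n_components": 10,  # reduced from the default of 100
    "tfidf_params": {
        "ngram_range": (1, 2),
        "sublinear_tf": True,
        "max_features": 1500,
    },
}
print(general_params["use_algos"], tfidf_params["n_components"])
```

These dicts would then be passed as the general_params and tfidf_params arguments of TabularNLPAutoML in the original script.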