Open pplonski opened 3 years ago
@PeterLuenenschloss an additional argument, multi_output=True, should be added to the AutoML constructor.
It will tell the AutoML object that it is going to be trained in a multi-output setting. The final results can be saved in nested directories. For example:
automl = AutoML(results_path="AutoML_multi", multi_output=True)
clf = MultiOutputClassifier(automl).fit(X, Y)
There will be paths:
AutoML_multi/AutoML_1
AutoML_multi/AutoML_2
AutoML_multi/AutoML_3
How the predictions are working in MultiOutputClassifier? Does it keep all objects in RAM?
There will be paths:
AutoML_multi/AutoML_1, AutoML_multi/AutoML_2, AutoML_multi/AutoML_3, and so on, up to the number of targets
Yes, that's how I also thought it should be!
there should be added additional argument in AutoML constructor multi_output=True
Maybe it is worth thinking about supporting not only simple multi-output, but also chained regression (or even defaulting to that) by wrapping with sklearn's RegressorChain. In that case, there would also need to be an additional keyword, order, that allows altering the default chain order. The AutoML results folder would also need to contain the model order mapping somehow, so that the trained AutoML models can be associated with the target indices.
How the predictions are working in MultiOutputClassifier? Does it keep all objects in RAM?
Yes, the wrapper trains one model per target dimension and combines the fitted model objects into a single model that predicts the array of those single-value predictions (just by ordering the results accordingly).
The model instances are kept in RAM, I guess; I can't see any explicit writing to disk. The problem with the Classifier wrapper is that, after the fit is done, it tries to collect the prediction classes from the fitted single-target models by accessing each model's classes_ attribute, which is not implemented by fitted AutoML models (but is implemented by other sklearn-style model objects, such as XGBoost's). This step is done in MultiOutputClassifier just to assign the list of collected classes to the classes_ attribute of the constructed multi-output model object at the end.
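One possible workaround for the missing attribute is a thin adapter that records the labels seen during fit. The sketch below is an assumption, not something AutoML provides: `WithClasses` and `MajorityClassifier` are hypothetical names, and the stand-in classifier exists only to make the example runnable without AutoML.

```python
class WithClasses:
    """Hypothetical adapter that adds a `classes_` attribute after fitting."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        # Sorted unique labels, similar to what sklearn classifiers expose.
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        return self.estimator.predict(X)


class MajorityClassifier:
    """Stand-in classifier without `classes_`, used only for this demo."""

    def fit(self, X, y):
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


X = [[0], [1], [2], [3]]
Y = [["a", "x"], ["b", "x"], ["a", "y"], ["a", "x"]]
# One adapted model per target column.
fitted = [
    WithClasses(MajorityClassifier()).fit(X, [row[i] for row in Y])
    for i in range(2)
]
# This is the collection step that fails when `classes_` is missing.
classes = [est.classes_ for est in fitted]
```

To be used inside MultiOutputClassifier itself, such an adapter would additionally have to support cloning (for example by subclassing sklearn's BaseEstimator), which is left out here.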
I cannot find 'multi_output' in the source code or the documentation. Could you explain how I can use multi-output regression for my tabular data?
@xinlnix it is not yet implemented.
A built-in implementation would be great, but for others who need this in the meantime, the following seems to work and returns multi-output predictions.
from sklearn.multioutput import MultiOutputRegressor
from supervised.automl import AutoML
automl = AutoML(mode="Explain")
clf = MultiOutputRegressor(automl).fit(x_train, y_train)
predictions = clf.predict(x_test)
With this method, AutoML tries to fit the same model again for me:
X_train.shape, X_test.shape, y_train.shape, y_test.shape
((2492, 500), (623, 500), (2492, 3), (623, 3))
automl = AutoML(mode="Explain", results_path=model_path)
reg = MultiOutputRegressor(automl).fit(X_train, y_train)
This model has already been fitted. You can use predict methods or select a new 'results_path' for a new 'fit()'.
This model has already been fitted. You can use predict methods or select a new 'results_path' for a new 'fit()'.
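The message above is presumably triggered because MultiOutputRegressor clones the estimator with identical constructor parameters, so every per-target copy points at the same results_path. A minimal sketch of one way around this is to build each per-target estimator from a factory that assigns a distinct path. `PerTargetRegressor`, `make_estimator`, and the stand-in `MeanRegressor` are hypothetical names used only for illustration, and the inputs are assumed to be lists of rows or NumPy arrays.

```python
from typing import Callable, List


class PerTargetRegressor:
    """Fit one independent estimator per target column.

    `make_estimator(i)` is a hypothetical factory returning a fresh,
    unfitted estimator for target index i, so each one can be given
    its own results directory.
    """

    def __init__(self, make_estimator: Callable[[int], object]):
        self.make_estimator = make_estimator
        self.estimators_: List[object] = []

    def fit(self, X, Y):
        n_targets = len(Y[0])
        self.estimators_ = []
        for i in range(n_targets):
            column = [row[i] for row in Y]        # i-th target only
            est = self.make_estimator(i)
            est.fit(X, column)
            self.estimators_.append(est)
        return self

    def predict(self, X):
        per_target = [est.predict(X) for est in self.estimators_]
        # Transpose so that each row holds all targets for one sample.
        return [list(row) for row in zip(*per_target)]


class MeanRegressor:
    """Stand-in estimator used only to make the sketch runnable."""

    def __init__(self, results_path: str):
        self.results_path = results_path

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


model = PerTargetRegressor(lambda i: MeanRegressor(f"AutoML_multi/AutoML_{i + 1}"))
model.fit([[0.0], [1.0]], [[1.0, 10.0], [3.0, 30.0]])
predictions = model.predict([[5.0]])
# With AutoML, the factory would presumably look like:
#   lambda i: AutoML(mode="Explain", results_path=f"AutoML_multi/AutoML_{i + 1}")
```

Each fitted estimator then lives under its own directory, matching the nested layout proposed earlier in this thread.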
Support / integration for multi-output regression would be great! In a project, I am currently wrapping the AutoML instance with sklearn.multioutput models to achieve multi-output fitting. This nearly works. There are only 2 problems:
1. results_path won't be empty after the first model is fit, so subsequent training gets aborted.
2. With results_path not set, the multi-output classification fails, since sklearn tries to access AutoML.classes_ when it does not exist. I don't know if that is solvable.