RegressionStudy does not save any model.p file, while ClassifierStudy does save it correctly

MassimilianoGrassiDataScience commented 1 year ago

If I run RegressionStudy, it runs to completion without errors, but no model.p file is saved in the study folder. If I instead run the search on the very same data with ClassifierStudy (after dichotomizing the outcome variable), model.p is saved in the same folder.

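# autoprognosis absolute
from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.studies.regression import RegressionStudy

# classification run: "outcome" dichotomized to a binary label
study = ClassifierStudy(
    study_name=study_name,
    dataset=DATASET_TRAIN,
    target="outcome",
    num_iter=1,
    num_study_iter=1,
    workspace=workspace)

study.run()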

# regression run: same data, continuous "outcome"
study = RegressionStudy(
    study_name=study_name,
    dataset=DATASET_TRAIN,
    target="outcome",
    num_iter=1,
    num_study_iter=1,
    workspace=workspace)

study.run()

I tried increasing them to num_iter=10 and num_study_iter=10, but the result is the same.

Any idea what the issue could be?

Thank you

bcebere commented 1 year ago

Hello @MassimilianoGrassiDataScience

Thank you for reporting this!

Can you please run the script with the DEBUG log level? You just need to add this at the start of your script:

# stdlib
import sys

# autoprognosis absolute
import autoprognosis.logger as log

# send DEBUG-level log messages to stderr
log.add(sink=sys.stderr, level="DEBUG")

The error handling here could be improved. The most likely reason there is no model.p is that the study cannot find an ensemble whose score exceeds a threshold. For ClassifierStudy, that threshold applies to the AUCROC, and for RegressionStudy, to the R2 score.

This threshold can be controlled with the score_threshold parameter of the studies. Its default value is 0.65. In the regression case, the study most likely cannot find an ensemble with R2 > 0.65, and the debug logs should confirm this.
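To make the gating behaviour concrete, here is a minimal, self-contained sketch. It does not call AutoPrognosis: the helpers `r2_score` and `would_save_model` are hypothetical and only mimic the rule described above (a model is saved only when the best score exceeds `score_threshold`, 0.65 by default). The data values are made up for illustration.

```python
from statistics import mean

# default value of score_threshold, as stated in the comment above
DEFAULT_SCORE_THRESHOLD = 0.65


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def would_save_model(score, score_threshold=DEFAULT_SCORE_THRESHOLD):
    """Hypothetical gate: model.p is written only if the score beats the threshold."""
    return score > score_threshold


# made-up regression targets and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [5.0, 5.0, 8.0, 7.0]

score = r2_score(y_true, y_pred)  # 1 - 9/20 = 0.55
print(f"R2 = {score:.2f}")
print("saved with default threshold:", would_save_model(score))
print("saved with threshold 0.5:", would_save_model(score, score_threshold=0.5))
```

A decent but imperfect regressor (R2 = 0.55 here) falls below the 0.65 default, so nothing would be saved even though the run finishes without errors. In AutoPrognosis the corresponding knob is the score_threshold argument of RegressionStudy; lowering it only changes whether a model is saved, not how good that model is.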

MassimilianoGrassiDataScience commented 1 year ago

Yep, that was the reason.

Thanks!

vanderschaarlab / autoprognosis, issue #41