vanderschaarlab / autoprognosis

A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.
https://www.autoprognosis.vanderschaar-lab.com/
Apache License 2.0

Use of any imputer leads to an error #55

Closed MassimilianoGrassiDataScience closed 1 year ago

MassimilianoGrassiDataScience commented 1 year ago

I am using autoprognosis 0.1.8, installed via pip. If I keep some missing values in my training dataset, training always stops after a few seconds with this error:

File ~/mambaforge/envs/AUTOPROGNOSIS2/lib/python3.9/site-packages/autoprognosis/studies/classifiers.py:103, in ClassifierStudy.__init__(self, dataset, target, num_iter, num_study_iter, timeout, metric, study_name, feature_scaling, classifiers, imputers, workspace, hooks, score_threshold, group_id, nan_placeholder)
    100 else:
    101     imputers = []
--> 103 self.X, _, self.Y, _, _, group_ids = dataframe_preprocess(
    104     dataset, target, imputation_method=imputation_method, group_id=group_id
    105 )
    107 self.internal_name = dataframe_hash(dataset)
    108 self.study_name = study_name if study_name is not None else self.internal_name

File ~/mambaforge/envs/AUTOPROGNOSIS2/lib/python3.9/site-packages/autoprognosis/studies/_preprocessing.py:294, in dataframe_preprocess(df, target, time_to_event, special_cols, sample, imputation_method, group_id)
    288 X = df.drop(drop_columns, axis=1)
    289 Y = df[target]
    291 (
    292     X,
    293     encoders,
--> 294 ) = dataframe_encode_and_impute(X, imputation_method)
    296 X = dataframe_drop_low_variance(X)
    298 if sample:

File ~/mambaforge/envs/AUTOPROGNOSIS2/lib/python3.9/site-packages/autoprognosis/studies/_preprocessing.py:216, in dataframe_encode_and_impute(orig_df, imputation_method)
    213 df, encoder_ctx = dataframe_encode(df)
    215 if df.isnull().values.any() and imputation_method:
--> 216     df, imputer = dataframe_imputation(df, imputation_method)
    217     encoder_ctx.set_imputer(imputer)
    219 return df, encoder_ctx

File ~/mambaforge/envs/AUTOPROGNOSIS2/lib/python3.9/site-packages/autoprognosis/studies/_preprocessing.py:159, in dataframe_imputation(df, method)
    157 log.debug(f"preprocess: dataset imputation using {method}")
    158 columns = df.columns
--> 159 imputer = Imputers().get(method)
    161 output = imputer.fit_transform(df)
    162 output.columns = columns

File ~/mambaforge/envs/AUTOPROGNOSIS2/lib/python3.9/site-packages/autoprognosis/plugins/core/base_plugin.py:264, in PluginLoader.get(self, name, *args, **kwargs)
    261     raise ValueError(f"Plugin {name} doesn't exist.")
    263 if name not in self._plugins:
--> 264     self._load_single_plugin(self._available_plugins[name])
    266 if name not in self._plugins:
    267     raise ValueError(f"Plugin {name} cannot be loaded.")

File ~/mambaforge/envs/AUTOPROGNOSIS2/lib/python3.9/site-packages/autoprognosis/plugins/core/base_plugin.py:234, in PluginLoader._load_single_plugin(self, plugin)
    231     log.critical(f"module {name} load failed")
    232     return
--> 234 log.debug(f"Loaded plugin {cls.type()} - {cls.name()}")
    235 self.add(cls.name(), cls)

File ~/mambaforge/envs/AUTOPROGNOSIS2/lib/python3.9/site-packages/autoprognosis/plugins/imputers/plugin_mean.py:45, in MeanPlugin.name()
     43 @staticmethod
     44 def name() -> str:
---> 45     return base_model.name()

NameError: name 'base_model' is not defined

This happens both when I specify the imputers I want to use and when I omit them. If I instead impute the missing values before using autoprognosis, training runs without errors.

Below is the code I use to launch the training:

ClassifierStudy(dataset=X_train, target='outcome', num_iter=30, num_study_iter=5, timeout=600, metric='aucroc', study_name='PROVA', workspace=workspace, score_threshold=0.65)

Any idea what the issue might be, or any suggestion for fixing it?

bcebere commented 1 year ago

Hello @MassimilianoGrassiDataScience

Thank you for your feedback.

The latest version of autoprognosis is 0.1.14; we have fixed several bugs in the meantime.

Could you please upgrade the library using

pip install autoprognosis --force

and try again? This updates the dependencies as well, including hyperimpute, which seems to trigger the issue.

Please let me know if the issue persists. Thank you!
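
To confirm that the upgrade actually took effect in the active environment, the installed versions of both packages can be checked. The following is a minimal sketch using only the Python standard library (Python 3.8+); `installed_version` is a hypothetical helper name, not part of autoprognosis, and the package names are the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the discussion above.
    for name in ("autoprognosis", "hyperimpute"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If autoprognosis still reports 0.1.8 after reinstalling, the command was probably run in a different environment from the one the notebook uses.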

MassimilianoGrassiDataScience commented 1 year ago

Solved! Thanks again