vanderschaarlab / autoprognosis

A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.
https://www.autoprognosis.vanderschaar-lab.com/
Apache License 2.0

Example on the title page of basic survival analysis #58

Closed: cbenoist314 closed this issue 2 months ago

cbenoist314 commented 1 year ago

Hello, thanks for providing this package. I am having difficulties with the Basic Survival Analysis example:

# third party
import numpy as np
from pycox import datasets

# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

X = df.drop(columns = ["duration"])
T = df["duration"]
Y = df["event"]

eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]

study_name = "example_risks"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
)

model = study.fit()

# Predict using the model
model.predict(X, eval_time_horizons)

It gives an error, so I replaced this part:

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
)

with this part:

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons.tolist(),
)

The computation then ran for several hours with many warnings, and finally ended with this error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 29, in <module>
  File "/home/benoip05/.local/python38/lib/python3.8/site-packages/autoprognosis/studies/risk_estimation.py", line 328, in fit
    model.fit(self.X, self.T, self.Y)
AttributeError: 'NoneType' object has no attribute 'fit'

Do you have an explanation or a solution?

Thank you in advance.

bcebere commented 1 year ago

Hello @cbenoist314

Thank you for your feedback.

We will work on improving the docs available at https://autoprognosis.readthedocs.io/en/latest/generated/autoprognosis.studies.risk_estimation.html

When the study returns None, it means the search could not find any model scoring above the score threshold.
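To illustrate what a None result means here, the following is a minimal sketch of threshold-based selection. It is not the autoprognosis implementation: the function select_best, the scores dictionary, and the numeric scores are all made up for illustration.

```python
from typing import Dict, Optional


def select_best(scores: Dict[str, float], score_threshold: float) -> Optional[str]:
    """Return the best-scoring candidate name, or None if nothing beats the threshold.

    Hypothetical helper for illustration only; not part of autoprognosis.
    """
    best_name = None
    best_score = score_threshold
    for name, score in scores.items():
        # A candidate only counts if it is strictly better than the current bar.
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Made-up scores, purely illustrative.
candidate_scores = {"cox_ph": 0.62, "weibull_aft": 0.58}

print(select_best(candidate_scores, score_threshold=0.65))  # None
print(select_best(candidate_scores, score_threshold=0.4))   # cox_ph
```

With the bar at 0.65 no candidate qualifies and the result is None, which is the situation that later surfaces as the AttributeError in the traceback. Lowering the bar to 0.4 lets the best candidate through.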

By default, a RiskEstimationStudy tests the following algorithms:

[
    "survival_xgboost",
    "loglogistic_aft",
    "deephit",
    "cox_ph",
    "weibull_aft",
    "lognormal_aft",
    "coxnet",
]

Some of them require GPUs. You can try reducing the search space on your end by using:

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
    risk_estimators=["cox_ph", "weibull_aft", "survival_xgboost"],
    score_threshold=0.4,
)

We will work on improving the docs to make these parameters clearer.

Please let me know if this version is faster and returns a model. Thanks!
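Until the docs cover this, it may help to fail with a readable message when no model is found. The sketch below is a hypothetical wrapper, not part of autoprognosis. It assumes only that the study object has a fit() method, and it treats both a None return value and the 'NoneType' AttributeError from the traceback above as "no model found".

```python
def fit_or_explain(study):
    """Run study.fit() and raise a descriptive error when no model is found.

    Hypothetical helper: `study` is any object with a fit() method,
    for example a RiskEstimationStudy.
    """
    try:
        model = study.fit()
    except AttributeError as exc:
        # In the traceback from this thread, the failure surfaces inside fit()
        # as "'NoneType' object has no attribute 'fit'". Re-raise anything else.
        if "NoneType" not in str(exc):
            raise
        model = None

    if model is None:
        raise RuntimeError(
            "The study did not find a model above the score threshold. "
            "Consider a smaller risk_estimators list or a lower score_threshold."
        )
    return model
```

It would be called as model = fit_or_explain(study) in place of model = study.fit(). Matching on the exception text is fragile, so this is only a stopgap for spotting the problem early.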