FEAT Add SurvTRACE

Vincent-Maladiere commented 11 months ago

This PR aims at refactoring and packaging the SurvTRACE model. This effort focuses on:

Generalizing the preprocessing and feature engineering of SurvTRACE so that we can run it on any dataset (instead of only the three datasets provided).
Decoupling the preprocessing from the model, which currently relies on a global STConfig dictionary. We want to remove this global config and expose hyper-parameters the scikit-learn way.
Simplifying the codebase and removing some of the boilerplate by using skorch. Also, skorch will enable pipelining and cross-validation operations.
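
To illustrate what "hyper-parameters the scikit-learn way" means in practice, here is a minimal sketch. `ToySurvivalEstimator`, its `hidden_size` and `lr` arguments, and the `columns_` attribute are hypothetical names chosen for illustration; this is not the SurvTRACE implementation and it does not use skorch. It only shows the convention: every hyper-parameter is a constructor argument stored unchanged, so that `get_params`, `set_params` and `clone` (which pipelines and cross-validation tools rely on) work without any global config.

```python
from sklearn.base import BaseEstimator, clone


class ToySurvivalEstimator(BaseEstimator):
    """Hypothetical stand-in for a model, used only to show the convention."""

    def __init__(
        self,
        numerical_features=None,
        categorical_features=None,
        hidden_size=16,  # hypothetical hyper-parameter
        lr=1e-3,  # hypothetical hyper-parameter
    ):
        # scikit-learn convention: store constructor arguments unchanged,
        # with no validation and no derived state in __init__.
        self.numerical_features = numerical_features
        self.categorical_features = categorical_features
        self.hidden_size = hidden_size
        self.lr = lr

    def fit(self, X, y=None):
        # State derived at fit time gets a trailing underscore.
        self.columns_ = list(self.numerical_features or []) + list(
            self.categorical_features or []
        )
        return self


model = ToySurvivalEstimator(
    numerical_features=["age"], categorical_features=["sex"]
)

# Hyper-parameters are discoverable without any global dictionary.
params = model.get_params()

# clone() builds an unfitted copy; set_params() overrides one value,
# which is what grid search does for each candidate.
variant = clone(model).set_params(hidden_size=32)
```

Because the estimator carries its own configuration, two instances with different settings can coexist in the same process, which a module-level config dictionary makes awkward.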

cc @ogrisel

Vincent-Maladiere commented 10 months ago

This is a "working WIP": upon running fit, I get training and validation errors similar to those of the original SurvTRACE version.

Next steps:

[ ] update the tests and complete them
[ ] get the final metrics and compare results with the original SurvTRACE version.

Usage (with the SEER dataset file named "seer_cancer_cardio_raw_data.txt"):

from hazardous.data._seer import (
    load_seer,
    CATEGORICAL_FEATURES,
    NUMERICAL_FEATURES,
)
from hazardous.survtrace._model import SurvTRACE

X, y = load_seer("hazardous/data/seer_cancer_cardio_raw_data.txt")
print(X.shape, y.shape)  # (476746, 28), (476746, 2)

model = SurvTRACE(
    numerical_features=NUMERICAL_FEATURES,
    categorical_features=CATEGORICAL_FEATURES,
)
model.fit(X, y)

cc @ogrisel, I think this is reviewable :)

Vincent-Maladiere commented 9 months ago

This PR is being split:

the SEER part is being developed at https://github.com/soda-inria/hazardous/pull/24
the SurvTRACE part is being developed in the branch icml-2024

ogrisel commented 8 months ago

Let's close this, then, and open a new PR dedicated to the model later.

soda-inria / hazardous

FEAT Add SurvTRACE #15