hammerlab / survivalstan

Library of Stan Models for Survival Analysis
Apache License 2.0

Time and event column types #65

Closed by adam-haber 6 years ago

adam-haber commented 7 years ago

Following the PEM example with my own data, I got unreasonable results using survivalstan.utils.plot_observed_survival.

After casting my time column to float (was int before) and the event column to boolean (was 0/1 before), everything worked.

Is this intentional?
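
For reference, the workaround described above can be sketched in pandas roughly as follows. The DataFrame `d` and its values are made up for illustration; only the column roles (an integer time column and a 0/1 event column) come from the report.

```python
import pandas as pd

# Hypothetical toy data: integer follow-up times and 0/1 event indicators.
d = pd.DataFrame({'time': [5, 12, 7], 'event': [1, 0, 1]})

# Cast the time column from int to float.
d['time'] = d['time'].astype(float)

# Cast the event column from 0/1 integers to booleans (0 -> False, 1 -> True).
d['event'] = d['event'].astype(bool)
```

After these casts, `d.dtypes` should report `float64` for the time column and `bool` for the event column.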

jburos commented 7 years ago

This is not intentional, but it is also not a scenario I am explicitly testing for. I will add a test for it. Thanks for the heads-up.

One question - did you transform your data using prep_data_long_surv or are your data already in start-stop / long / denormalized format? This will help me narrow down the possible locations of the problem.

adam-haber commented 7 years ago

The data wasn't in a long format; I just had an "event" column (some of it censored) and a "time" column.

jburos commented 7 years ago

Great, that helps a lot. This is still an issue I'll want to catch & fix, but to start with, you should transform your data to "long" format in order to fit the PEM model.

This would be a two-step process, like so:

import survivalstan

# Step 1: expand the one-row-per-subject data `d` into "long" format
dlong = survivalstan.prep_data_long_surv(df=d, event_col='event', time_col='t')

# Step 2: fit the PEM model on the long-format data
fit = survivalstan.fit_stan_survival_model(
    model_code=survivalstan.models.pem_survival_model,
    df=dlong,
    sample_col='index',
    timepoint_end_col='end_time',
    event_col='end_failure',
    formula='~ age_centered + sex',
)

You may well still run into the int/float and boolean/int problems you described above, but I'm mentioning the two-step process here since it came up.

Linking to the related issue / recommendation #64, since that would make the need for a two-step process somewhat obsolete.
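
To illustrate what "long" format means here, the following is a rough pure-Python sketch of a person-period expansion. It is not survivalstan's implementation of prep_data_long_surv: the function name `to_long_format`, the `id` key, the toy records, and the choice of observed event times as interval end points are all assumptions made for the example. Only the output column names `end_time` and `end_failure` are taken from the snippet above.

```python
def to_long_format(subjects):
    """Expand one-row-per-subject records into one row per subject per interval.

    `subjects` is a list of dicts with keys 'id', 't' (follow-up time) and
    'event' (truthy if the event was observed, falsy if censored).
    """
    # Assumed interval end points: every distinct time at which an event occurred.
    cut_points = sorted({s['t'] for s in subjects if s['event']})
    rows = []
    for s in subjects:
        # A subject contributes one row per cut point up to their own time...
        ends = [c for c in cut_points if c <= s['t']]
        # ...plus a final row at their own time if it is not already a cut point.
        if not ends or ends[-1] != s['t']:
            ends.append(s['t'])
        for end in ends:
            rows.append({
                'id': s['id'],
                'end_time': float(end),
                # The failure flag is set only on the subject's last row,
                # and only if the event was actually observed.
                'end_failure': bool(s['event'] and end == s['t']),
            })
    return rows


# Hypothetical toy data: subject 'b' is censored at t=3.
subjects = [
    {'id': 'a', 't': 2, 'event': 1},
    {'id': 'b', 't': 3, 'event': 0},
    {'id': 'c', 't': 5, 'event': 1},
]
long_rows = to_long_format(subjects)
```

Each subject ends up with several rows, one per interval survived, which is the start-stop shape the PEM model expects. The sketch also writes `end_time` as a float and `end_failure` as a boolean, matching the column types that worked in the original report.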