hammerlab / survivalstan

Library of Stan Models for Survival Analysis
Apache License 2.0

prep-data-long-surv doesn't treat multiple events properly #41

Closed jburos closed 7 years ago

jburos commented 7 years ago

prep_data_long_surv currently takes a simplistic approach: it assumes that each subject has one and only one record.

This assumption holds for most common use cases, but it breaks down for semi-competing or competing risks models. In these scenarios the input data often contain more than one event per subject.

Passing input data like the following, which has 4 events for subject_id == 7:

[screenshot, 2017-01-07 at 3:06:53 pm]

currently results in 4 records (duplicated by subject_id) for each failure time:

[screenshot, 2017-01-07 at 3:09:05 pm]

Instead, we would rather keep only one of these for each subject_id * end_time combination.
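
To make the problem concrete, here is a minimal pandas sketch of the duplication and of one possible fix. It is not the actual prep_data_long_surv code: the toy data, the column names other than subject_id and end_time, and the helper `naive_long` are all hypothetical and only mimic the "one record per subject" assumption described above.

```python
import pandas as pd

# Hypothetical input: subject 7 has 4 event records, subject 8 has 1.
df = pd.DataFrame({
    "subject_id": [7, 7, 7, 7, 8],
    "event_name": ["a", "b", "c", "d", "a"],
    "time": [2.0, 3.0, 5.0, 5.0, 4.0],
    "event_value": [True, True, False, True, True],
})


def naive_long(df):
    """Expand every input record over all observed failure times.

    Hypothetical stand-in for a long-format conversion that assumes
    one record per subject; it is NOT survivalstan's implementation.
    """
    failure_times = sorted(df.loc[df["event_value"], "time"].unique())
    ft = pd.DataFrame({"end_time": failure_times, "_key": 1})
    # Cross join via a constant key, then keep the failure times at
    # which each record is still at risk.
    long = df.assign(_key=1).merge(ft, on="_key").drop(columns="_key")
    return long[long["end_time"] <= long["time"]].copy()


long_naive = naive_long(df)

# Subject 7 appears once per input record at the earliest failure time.
n_dupes = ((long_naive["subject_id"] == 7) & (long_naive["end_time"] == 2.0)).sum()

# One possible fix: keep a single row per subject_id * end_time combination.
long_dedup = (
    long_naive[["subject_id", "end_time"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
```

This sketch only deduplicates the subject_id * end_time skeleton. How the event indicators for the different event types should be carried onto that skeleton is left open here.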

This is a prerequisite for #36.