prep_data_long_surv currently assumes that each subject has exactly one record.
This assumption holds for most common use cases, but it breaks down for semi-competing or competing risks models, where the input data often contains more than one event per subject.
Passing input data that has 4 events with subject_id == 7 currently results in 4 records (duplicates by subject_id) for each failure time.
Instead, the output should keep only one of these records for each subject_id * end_time combination.
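The desired behaviour can be sketched roughly as follows. This is only an illustration, not the actual prep_data_long_surv code: the helper name dedupe_long_records, the end_failure column, and the sample rows are all hypothetical, and keeping the first row per key is an assumption about which duplicate should win.

```python
def dedupe_long_records(rows, keys=("subject_id", "end_time")):
    """Keep the first row seen for each combination of `keys`.

    Hypothetical helper: `rows` is a list of dicts standing in for the
    long-form records; this is not part of the library itself.
    """
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Made-up long-form output: subject 7 is duplicated at each failure time.
long_rows = [
    {"subject_id": 7, "end_time": 1.0, "end_failure": False},
    {"subject_id": 7, "end_time": 1.0, "end_failure": False},
    {"subject_id": 7, "end_time": 2.5, "end_failure": True},
    {"subject_id": 7, "end_time": 2.5, "end_failure": True},
    {"subject_id": 8, "end_time": 1.0, "end_failure": True},
]

deduped = dedupe_long_records(long_rows)
for row in deduped:
    print(row)
```

If the long-form data lives in a pandas DataFrame, calling drop_duplicates with subset=["subject_id", "end_time"] would give the same one-row-per-combination result.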
This is a prerequisite for #36.