mmcdermott / EventStreamGPT

Dataset and modelling infrastructure for modelling "event streams": sequences of continuous time, multivariate events with complex internal dependencies.
https://eventstreamml.readthedocs.io/en/latest/
MIT License

end_time in task dataframes is not counted as an event (right open interval) #110

Open juancq opened 4 months ago

juancq commented 4 months ago

The end_time in a task dataframe is not counted as an event. I am not sure if this is a bug or if it's by design (if it's the latter, just a documentation update would do).

For example, in the task dataframe below, suppose subject 1520408 has two events before 2010-10-20 and one event on 2010-10-20. With end_time recorded as 2010-10-20, the subject is treated as having a sequence of only two events during the call to filter_to_min_seq_len in https://github.com/mmcdermott/EventStreamGPT/blob/2f433a695112fdccb7b28a50cb44b6f39fce4349/EventStream/data/pytorch_dataset.py#L322.

| subject_id (u32) | end_time (datetime[μs]) | label (u32) | start_time (datetime[μs]) |
|---|---|---|---|
| 1520408 | 2010-10-20 00:00:00 | 1 | null |
| 1569956 | 2010-02-14 00:00:00 | 1 | null |
| 1230099 | 2010-06-27 00:00:00 | 2 | null |

If end_time is meant to be excluded (which is the current behavior), it would be helpful to have a note in the documentation stating this.
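To make the difference concrete, here is a minimal sketch of the two interval conventions using only the Python standard library. It is not the EventStreamGPT implementation: the `count_events` helper and the specific event timestamps are hypothetical, chosen only to match the scenario described above (two events before 2010-10-20 and one on that date).

```python
from datetime import datetime

# Hypothetical event timestamps for subject 1520408 (illustration only):
# two events before 2010-10-20 and one event exactly on 2010-10-20.
events = [
    datetime(2010, 10, 1),
    datetime(2010, 10, 12),
    datetime(2010, 10, 20),
]
end_time = datetime(2010, 10, 20)


def count_events(event_times, end_time, include_end=False):
    """Count events up to end_time.

    include_end=False models a right-open interval (events strictly
    before end_time); include_end=True models a closed interval.
    This helper is hypothetical and not part of EventStreamGPT.
    """
    if include_end:
        return sum(t <= end_time for t in event_times)
    return sum(t < end_time for t in event_times)


right_open_count = count_events(events, end_time)
closed_count = count_events(events, end_time, include_end=True)
print(right_open_count, closed_count)
```

Under the right-open convention, the event that falls exactly on end_time is dropped, so a minimum sequence length of three would exclude this subject even though three events exist up to and including end_time.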

mmcdermott commented 3 months ago

@juancq I'm not sure, but is it possible that the change in https://github.com/mmcdermott/EventStreamGPT/pull/107 solves this issue? Or am I misunderstanding the scope of the issue?