mostafaalishahi / eICU_Benchmark


Clarification regarding use of BiLSTMs in LoS and decompensation tasks #1

Closed: prockenschaub closed this issue 3 years ago

prockenschaub commented 4 years ago

First, thank you and your colleagues very much for putting together this benchmark dataset! I am currently looking into using the resource that you created in my own research. I cloned the repo and was able to create all the cohorts. I also had a look through the model code, where I came upon the following issue:

Issue

For the LoS and decompensation tasks, you make predictions at each time step. This seems reasonable and likely reflects the clinical use case. However, you seem to use bidirectional LSTMs to do so and I couldn't find a mechanism in the code that censors future information (i.e. preventing future steps from influencing predictions of early time steps).

Example: a patient might have an observed time series of 10 time steps (t=1, ..., 10). For the prediction at time step t=2, the model should only take into account information from t=1 and t=2. This happens naturally when using a standard unidirectional LSTM. The bidirectional LSTM, on the other hand, sees time steps t=3, ..., 10 (i.e. the future of t=2) during its backward pass. This is likely to leak information: the backward pass would, for example, only need to count the time steps since the last observation to get a lower bound on the remaining length of stay at t=2.
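To make the concern concrete, here is a minimal sketch of the leak. It is not taken from the repository: it swaps the LSTM for a toy scalar tanh recurrent cell with made-up weights (`w_in`, `w_rec`), and the helper names `rnn_pass`, `unidirectional`, and `bidirectional` are hypothetical. It only checks whether the output at t=2 changes when the values at t=3, ..., 10 change.

```python
import math

def rnn_pass(xs, w_in=0.5, w_rec=0.8):
    """Run a toy tanh recurrent cell (a stand-in for an LSTM) over xs.

    Returns the hidden state after every step. Weights are arbitrary.
    """
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def unidirectional(xs):
    # The state at step t depends only on xs[0..t].
    return rnn_pass(xs)

def bidirectional(xs):
    # Forward states plus states from a pass over the reversed sequence,
    # re-aligned so that index t pairs both directions for step t.
    fwd = rnn_pass(xs)
    bwd = rnn_pass(xs[::-1])[::-1]
    return list(zip(fwd, bwd))

# Ten time steps, t = 1..10.
series = [0.1 * t for t in range(1, 11)]
# Same values at t=1 and t=2, but a different future (t=3..10).
altered = series[:2] + [5.0] * 8

idx = 1  # zero-based index of t=2
uni_same = unidirectional(series)[idx] == unidirectional(altered)[idx]
bi_same = bidirectional(series)[idx] == bidirectional(altered)[idx]

print("unidirectional output at t=2 unchanged:", uni_same)
print("bidirectional output at t=2 unchanged:", bi_same)
```

In this toy setting the unidirectional output at t=2 is identical for both series, while the bidirectional output differs, because its backward state at t=2 is computed from t=10 down to t=2.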

You might have taken that into account in your code and I simply missed it. If that's the case, could you please point towards the code that does that?

mostafaalishahi commented 4 years ago

@prockenschaub Thanks for your interest in our work. We noticed this issue and will push an update soon that addresses it. If you have other questions, do not hesitate to contact us.

prockenschaub commented 4 years ago

Perfect, thanks again for your valuable work and I am looking forward to the updates!

mostafaalishahi commented 3 years ago

Thanks again for your interest in this work. We have addressed this issue by filtering out the future data: we implemented a filtering function (filter_future_data) in the source code that we published here. Please also check the latest version of the paper on arXiv to find out more about the updated definitions of the LoS and decompensation tasks.
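For readers who want an intuition for what censoring future data can look like, below is one possible approach, sketched under stated assumptions. It is not the repository's `filter_future_data`, whose implementation is not shown in this thread. The sketch uses a toy scalar tanh cell instead of an LSTM, and the names `rnn_pass`, `bidirectional`, and `censored_bidirectional` are hypothetical. The idea: for the prediction at step t, encode only the prefix up to t, so the backward pass never sees later steps.

```python
import math

def rnn_pass(xs, w_in=0.5, w_rec=0.8):
    """Toy tanh recurrent cell (stand-in for an LSTM); weights are arbitrary."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def bidirectional(xs):
    # Forward states paired with re-aligned backward states.
    fwd = rnn_pass(xs)
    bwd = rnn_pass(xs[::-1])[::-1]
    return list(zip(fwd, bwd))

def censored_bidirectional(xs):
    """For each step t, encode only xs[:t+1] and keep the state at step t.

    The backward pass therefore starts at t and cannot see later steps.
    """
    return [bidirectional(xs[: t + 1])[-1] for t in range(len(xs))]

series = [0.1 * t for t in range(1, 11)]   # t = 1..10
altered = series[:2] + [5.0] * 8           # same t=1,2; different future

# Outputs at t=1 and t=2 no longer depend on what happens afterwards.
prefix_outputs_match = (
    censored_bidirectional(series)[:2] == censored_bidirectional(altered)[:2]
)
# Outputs from t=3 onward still reflect the changed inputs, as they should.
later_outputs_differ = (
    censored_bidirectional(series)[2] != censored_bidirectional(altered)[2]
)

print("t<=2 outputs unaffected by future:", prefix_outputs_match)
print("t=3 output reflects changed input:", later_outputs_differ)
```

This prefix-based variant re-encodes the sequence once per time step, so its cost grows quadratically with sequence length; a unidirectional model avoids both the leak and that overhead.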