LeeJunHyun closed this issue 3 years ago
Thank you for the kind words!
You're probably looking for `kkbox_v1.read_df()`, which is the dataset used in "Time-to-event prediction with neural networks and Cox regression".
The dataset in `kkbox.read_df()` is used in the paper "The Brier Score under Administrative Censoring: Problems and Solutions" and has some improvements compared to `kkbox_v1`, in addition to administrative censoring times.
From the `read_df` docs you can see that `kkbox_v1.read_df` accepts arguments for the training, validation and test sets.
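As a rough illustration of the reply above, here is a minimal sketch of reading the three `kkbox_v1` subsets. The helper `load_kkbox_v1_splits` is hypothetical (not part of pycox), and the `subset` keyword with the names `'train'`, `'val'` and `'test'` is an assumption based on the `read_df` docs mentioned above, so check it against your installed version. It also assumes the KKBox data has already been downloaded.

```python
def load_kkbox_v1_splits(dataset=None, subsets=("train", "val", "test")):
    """Return a dict mapping each subset name to what read_df gives for it.

    Hypothetical helper for illustration; not part of pycox.
    """
    if dataset is None:
        # Imported lazily: needs pycox installed and the KKBox data
        # already downloaded on this machine.
        from pycox.datasets import kkbox_v1
        dataset = kkbox_v1
    # Assumption: read_df accepts a `subset` keyword with these names.
    return {name: dataset.read_df(subset=name) for name in subsets}


# Possible usage (left commented out because it needs the downloaded data):
# splits = load_kkbox_v1_splits()
# print({name: len(df) for name, df in splits.items()})
```

Passing the dataset object in as an argument is only a convenience so the same helper can be tried with another loader that exposes a compatible `read_df`.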
@havakv Thank you so much! Your reply is a great help for me :)
Happy to help! I'll close this issue then, and you can reopen it if you don't consider it solved.
I really appreciate your commitment to this field.
I got the kkbox dataset by using `kkbox.read_df()`. The kkbox dataset consists of 2,814,735 instances, so how can I get the same split as described in the paper?
In the paper, there are 1,786,333 train samples, 661,748 test samples, and 198,665 valid samples. (2,646,746 instances in total)
Thanks.
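The counts quoted in the question can be checked with a few lines of arithmetic. This sketch only uses the numbers stated above; the variable names are illustrative, and it says nothing about why the two datasets differ in size.

```python
# Split sizes quoted from the paper in the question above.
paper_split = {"train": 1_786_333, "test": 661_748, "val": 198_665}

# Number of rows reported for kkbox.read_df() in the question above.
full_total = 2_814_735

paper_total = sum(paper_split.values())
difference = full_total - paper_total

print(paper_total)  # 2646746, matching the total quoted in the question
print(difference)   # 167989 more rows in kkbox.read_df() than in the paper's split
```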