julianspaeth / random-survival-forest

A Random Survival Forest implementation for Python inspired by Ishwaran et al. - Easily understandable, adaptable and extendable.

Increase in running time for fit function #10

Closed: aaby4373 closed this issue 2 years ago

aaby4373 commented 3 years ago

I am running this RSF module on a dataset with 12K rows. The fit function has been running for the past 2 days without producing any output. Can someone please confirm whether this is expected?

Also, how does this package handle an imbalanced dataset?

julianspaeth commented 3 years ago

Hi @aaby4373, the runtime is indeed an issue in this package. 12k rows is a lot of samples, so fitting can take quite a while. The main focus of this package was to provide an easy-to-understand implementation of the algorithm, not to optimize runtime. However, if someone has the time to improve the runtime, that would be great.
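One way to judge whether a multi-day fit on 12k rows is plausible is to time the fit on smaller random subsamples first and watch how the runtime grows. The following is a minimal sketch, not part of this package: `time_fit_on_subsamples` and `dummy_fit` are hypothetical names, and `fit` stands for any callable you supply that trains on a list of rows. The placeholder workload only exists so the sketch runs on its own.

```python
import random
import time


def time_fit_on_subsamples(fit, rows, sizes, seed=0):
    """Time a user-supplied fit callable on random subsamples of growing size.

    `fit` is a hypothetical stand-in: any callable that trains on a list of rows.
    Returns a list of (subsample_size, elapsed_seconds) pairs.
    """
    rng = random.Random(seed)
    timings = []
    for size in sizes:
        # Never ask for more rows than the dataset contains.
        subsample = rng.sample(rows, min(size, len(rows)))
        start = time.perf_counter()
        fit(subsample)
        timings.append((len(subsample), time.perf_counter() - start))
    return timings


if __name__ == "__main__":
    # Placeholder workload with quadratic cost, used only so the sketch runs
    # without the package; replace it with a call to the real fit routine.
    def dummy_fit(rows):
        return sum(1 for a in rows for b in rows if a < b)

    data = list(range(1000))
    for n, seconds in time_fit_on_subsamples(dummy_fit, data, [125, 250, 500, 1000]):
        print(f"{n:>6} rows: {seconds:.3f} s")
```

If doubling the subsample size roughly quadruples the time, the cost grows about quadratically, and extrapolating from the small runs gives a rough idea of what to expect for the full dataset.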

Imbalanced datasets are not currently addressed. There might be some over-/undersampling strategies for RSF that could be implemented.
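As an illustration of what an undersampling step could look like, here is a minimal sketch that is not part of this package. It assumes that "imbalanced" means one event group (observed events versus censored rows) is much larger than the other, and that each row is a dict with a boolean `event` field; both the function name `undersample_majority` and the row layout are hypothetical.

```python
import random


def undersample_majority(rows, event_key="event", seed=0):
    """Randomly drop rows from the larger event group until both groups match.

    Assumes each row is a dict whose `event_key` entry is truthy for an
    observed event and falsy for a censored observation (hypothetical layout).
    """
    rng = random.Random(seed)
    events = [r for r in rows if r[event_key]]
    censored = [r for r in rows if not r[event_key]]
    # Order the two groups by size: the smaller one is kept in full.
    minority, majority = sorted((events, censored), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)  # avoid leaving the rows grouped by event status
    return balanced
```

The balanced rows would then be passed to the fit step in place of the full dataset. Note that dropping censored rows changes the observed censoring proportion and may bias the estimated survival curves, so results from a resampled dataset should be checked against held-out, unresampled data. If one group is empty, the function returns an empty list.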