mdabros / SharpLearning

Machine learning for C# .Net
MIT License

Access OOB data and OOB error calculations of Random Forest #140

Open MichaelBenAssor opened 3 years ago

MichaelBenAssor commented 3 years ago

Hi, can you add access to the Out-of-Bag (OOB) data and/or OOB error calculations for Random Forests?

Love this project, thanks!

mdabros commented 3 years ago

Hi @MichaelBenAssor,

Originally I decided to leave this feature out of the RandomForest implementation to keep memory consumption low, and leave the option of getting an unbiased estimate of the model error to the CrossValidation classes. But it is quite a nice feature of the original RandomForest, so it is something that could be cool to have. So if I can find a good solution that is optional, and does not add memory usage or performance degradation when running without it, I can add it.

I have a few other features I want to complete before working on this though, and with this being a spare time effort, it might be a while before I get to it :)

Best regards, Mads

MichaelBenAssor commented 3 years ago

Hi @mdabros, you can keep the OOB record indices (the row indices into the training set that a given tree did not sample) for each tree in a structure like: dictionary<tree#,List>, that is, one list of OOB indices per tree. This should keep the memory consumption to a minimum.
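
The idea could be sketched roughly as follows. This is Python for brevity rather than C#, and it is not SharpLearning code: `bootstrap_with_oob`, `oob_error`, and the `fit_majority` stand-in learner are hypothetical names used only for illustration. It assumes a classification forest where each tree is trained on a bootstrap sample and the OOB prediction for a record is the majority vote of the trees that never saw it.

```python
import random
from collections import Counter


def bootstrap_with_oob(n_records, n_trees, seed=0):
    """Draw one bootstrap sample per tree and record its out-of-bag indices."""
    rng = random.Random(seed)
    in_bag = {}  # tree index -> sampled record indices (with replacement)
    oob = {}     # tree index -> record indices never drawn for that tree
    for t in range(n_trees):
        sample = [rng.randrange(n_records) for _ in range(n_records)]
        drawn = set(sample)
        in_bag[t] = sample
        oob[t] = [i for i in range(n_records) if i not in drawn]
    return in_bag, oob


def oob_error(predict_fns, oob, targets, features):
    """Misclassification rate where each record is judged only by the
    trees that did not train on it. Returns None if no record is OOB."""
    votes = {}  # record index -> Counter of predicted labels
    for t, indices in oob.items():
        for i in indices:
            label = predict_fns[t](features[i])
            votes.setdefault(i, Counter())[label] += 1
    if not votes:
        return None
    wrong = sum(
        1 for i, counts in votes.items()
        if counts.most_common(1)[0][0] != targets[i]
    )
    return wrong / len(votes)


def fit_majority(labels):
    """Stand-in for a real tree learner (assumption, for illustration only):
    always predicts the most common label of its bootstrap sample."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority


# Usage: train one stand-in "tree" per bootstrap sample, then score OOB.
features = [[x] for x in range(10)]
targets = ["low"] * 5 + ["high"] * 5
in_bag, oob = bootstrap_with_oob(len(targets), n_trees=25, seed=42)
trees = {t: fit_majority([targets[i] for i in idx]) for t, idx in in_bag.items()}
print("OOB error:", oob_error(trees, oob, targets, features))
```

Only the `oob` dictionary has to outlive training here; the in-bag samples are kept in this sketch just to fit the stand-in trees. With sampling with replacement, each tree's OOB list holds roughly 37% of the training records on average, so the extra memory grows with the number of trees times the number of records, which is one reason to keep the bookkeeping optional.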

Thank you and I hope to see this wonderful project continue to flourish!

Best, Michael