Open exalate-issue-sync[bot] opened 1 year ago
Nidhi Mehta commented: #93670 (https://support.h2o.ai/a/tickets/93670) - Re: h2o DRF question
JIRA Issue Migration Info
Jira Issue: PUBDEV-6270
Assignee: New H2O Bugs
Reporter: Lauren DiPerna
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Review the computational cost of returning a proximity matrix from H2O-3's Distributed Random Forest (DRF), and consider adding output similar to the proximity matrix returned by R's Random Forest implementation (https://cran.r-project.org/web/packages/randomForest/randomForest.pdf).
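For reference, Breiman defines the proximity of two rows as the fraction of trees in which they land in the same terminal node. Pairs are counted per tree and then divided by the number of trees. The sketch below is plain Python, not H2O-3 code. It assumes a hypothetical input `leaves`, where `leaves[i][t]` is the terminal-node id that row `i` reaches in tree `t`; per-tree leaf assignments like this would have to come from the trained DRF model. The function name `proximity_matrix` is illustrative only.

```python
from collections import defaultdict
from typing import List, Sequence


def proximity_matrix(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Dense proximity sketch: prox[i][j] = share of trees where rows i and j share a leaf.

    `leaves` is a hypothetical n_rows x n_trees table of terminal-node ids.
    """
    n = len(leaves)
    if n == 0:
        return []
    n_trees = len(leaves[0])
    if n_trees == 0:
        raise ValueError("need at least one tree")

    # Full n x n matrix: this O(n^2) storage is the limiting factor for large data.
    prox = [[0.0] * n for _ in range(n)]
    for t in range(n_trees):
        # Group rows by the terminal node they reach in tree t.
        buckets = defaultdict(list)
        for i in range(n):
            buckets[leaves[i][t]].append(i)
        # Every pair of rows sharing a terminal node gets one co-occurrence.
        for rows in buckets.values():
            for i in rows:
                for j in rows:
                    prox[i][j] += 1.0

    # Normalize co-occurrence counts by the number of trees.
    return [[count / n_trees for count in row] for row in prox]
```

For example, with two trees and `leaves = [[0, 1], [0, 2], [1, 2]]`, rows 0 and 1 share a leaf only in the first tree, so `proximity_matrix(leaves)[0][1]` is 0.5. The per-tree pair loop costs time proportional to the sum of squared leaf sizes, but the result always needs n² entries of memory, no matter how the pairs are counted.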
Note: The size of the matrix can be a limiting factor. For n training rows the matrix is n-by-n, so in some cases it may be impossible to compute the full matrix. One solution could be to keep only the N most similar rows for each row. Breiman also ran into computational issues with the proximity matrix in his Random Forest implementation: