kogalur / randomForestSRC

DOCUMENTATION:
https://www.randomforestsrc.org/
GNU General Public License v3.0

How to calculate the distance of some predicted data? #350

Open qijt123 opened 1 year ago

qijt123 commented 1 year ago

When predicting with a random forest, rfsrc calculates the distance between every pair of cases. How can I calculate the distance for only some of the cases instead of all of them?

ishwaran commented 1 year ago

Here's an example for mtcars, where I calculate the OOB distance and then print out the distance from case 1 to every other data point in the learning set. In general, the object $distance is an n x n matrix, with each row displaying the distance between that case and all other cases. Note that you can also obtain the distance when predicting on test data.

> o <- rfsrc(mpg ~ ., mtcars, distance = "oob")
> o$distance[1,]
 [1] 0.00000000 0.02591036 0.42236842 0.30961310 0.57395833 0.30057009 0.56880952 0.47051282 0.48756614 0.35496032 0.34367560 0.54121212 0.60432900
[14] 0.48867244 0.60521542 0.67966102 0.64233766 0.60441176 0.73913043 0.67910448 0.51171429 0.50641026 0.43768116 0.50863679 0.62779079 0.72739726
[27] 0.43627451 0.54244306 0.48805916 0.24121212 0.56512114 0.46666667
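
Since $distance is an ordinary n x n matrix, a block of it can be pulled out with standard R matrix indexing. The following is a minimal sketch of that idea, not something taken from the package documentation; the index vectors `rows` and `cols` and the name `d_sub` are illustrative assumptions. It still computes the full matrix first, so it saves no computation.

```r
library(randomForestSRC)

# Grow the forest and request the OOB distance, as in the example above.
# The forest is random, so the values differ from run to run.
o <- rfsrc(mpg ~ ., mtcars, distance = "oob")

# Hypothetical case indices: distances from cases 1, 6, 8 to cases 4, 5.
rows <- c(1, 6, 8)
cols <- c(4, 5)

# Extract the 3 x 2 block; drop = FALSE keeps the result a matrix
# even when only one row or one column is requested.
d_sub <- o$distance[rows, cols, drop = FALSE]

# Label the block with the car names for readability.
dimnames(d_sub) <- list(rownames(mtcars)[rows], rownames(mtcars)[cols])
print(d_sub)
```

Entry `d_sub[i, j]` is the distance between case `rows[i]` and case `cols[j]`.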

qijt123 commented 1 year ago

Thank you for your quick reply. However, calculating the pairwise distances between all cases in the test data is very time-consuming because my data set is large. How can I calculate the distances only between specified subsets of the test set, for example between cases 1, 6, 8 and cases 4, 5? Thank you in advance. Any suggestion will be highly appreciated.
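
One possible workaround, offered as a hedged sketch rather than a confirmed feature of the package: pass only the test cases of interest to `predict()`, so that the returned distance matrix covers just those cases instead of the whole test set. This assumes that `predict()` accepts `distance = "all"` and returns a square `$distance` matrix over the rows of `newdata`, and that the distance between two cases does not depend on which other cases are in `newdata`; both assumptions are worth checking on a small sample first. The train/test split and the names `keep` and `d_sub` are illustrative.

```r
library(randomForestSRC)

# Illustrative split of mtcars (an assumption, not from the thread).
train <- mtcars[1:20, ]
test  <- mtcars[21:32, ]

o <- rfsrc(mpg ~ ., train)

# Test cases of interest: distances from cases 1, 6, 8 to cases 4, 5.
rows <- c(1, 6, 8)
cols <- c(4, 5)
keep <- union(rows, cols)

# Predict only on the needed test cases, so the distance matrix is
# length(keep) x length(keep) rather than ntest x ntest.
p <- predict(o, test[keep, ], distance = "all")

# Map the original test indices to their positions within 'keep'
# and extract the rows-by-cols block.
d_sub <- p$distance[match(rows, keep), match(cols, keep), drop = FALSE]
dimnames(d_sub) <- list(rownames(test)[rows], rownames(test)[cols])
print(d_sub)
```

The cost then scales with the number of selected cases, although the within-group distances (for example, case 1 to case 6) are still computed and then discarded.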