mhahsler / dbscan

Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package
GNU General Public License v3.0

Improve LOF function #42

Closed eduardokapp closed 3 years ago

eduardokapp commented 3 years ago

Improved the local reachability density (lrd) calculation to speed it up when k is large. It is a very simple change: I removed the apply()-based max clutter and instead use the base max.col function plus a small indexing trick to pick out the maximum of each row of cbind(d$dist[d$id[i, ], k], d$dist[i, ]).

mhahsler commented 3 years ago

Hi, thank you for bringing this up. I decided to change the code to:

  # calculate local reachability density
  # reachability-distance_k(A,B)=max{k-distance(B), d(A,B)}
  # lrd_k(A) = 1 / (sum_{B \in N_k(A)} reachability-distance_k(A, B) / |N_k(A)|)
  lrd <- numeric(n)
  for(i in 1:n) lrd[i] <- 1/(sum(
    pmax.int(d$dist[d$id[i,], k], d$dist[i,])) / k
  )

This is slightly faster than your version and I hope it is easier to maintain. The change is on GitHub and will be part of the next release.

Thanks, Michael
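
For readers following along, here is a minimal base R sketch of the three ways of taking the row-wise maximum discussed in this thread: apply(), max.col() with matrix indexing, and pmax.int(). It is an illustration under stated assumptions, not the package's code. The toy data and the hand-built list `d` are hypothetical stand-ins for the kNN result used above (`d$id` holds neighbor indices, `d$dist` the matching sorted distances), and the variant loops are reconstructions from the descriptions in the comments. As in the snippet above, the sum is divided by k, which assumes every neighborhood has exactly k members.

```r
# Toy data; base R only, so the sketch does not depend on the dbscan package.
set.seed(42)
x <- matrix(rnorm(40), ncol = 2)   # 20 points in 2D
n <- nrow(x)
k <- 3

# Hypothetical stand-in for the kNN result `d` used in the thread:
# d$id[i, ]   = indices of the k nearest neighbors of point i
# d$dist[i, ] = the matching distances, in increasing order
dm <- as.matrix(dist(x))
diag(dm) <- Inf                    # a point is not its own neighbor
id <- unname(t(apply(dm, 1, function(r) order(r)[seq_len(k)])))
d <- list(
  id   = id,
  dist = unname(t(sapply(seq_len(n), function(i) dm[i, id[i, ]])))
)

# Variant 1: row-wise max via apply().
lrd_apply <- numeric(n)
for (i in seq_len(n)) {
  m <- cbind(d$dist[d$id[i, ], k], d$dist[i, ])
  lrd_apply[i] <- 1 / (sum(apply(m, 1, max)) / k)
}

# Variant 2: max.col() gives the column of each row's maximum;
# matrix indexing with (row, column) pairs then extracts the values.
lrd_maxcol <- numeric(n)
for (i in seq_len(n)) {
  m <- cbind(d$dist[d$id[i, ], k], d$dist[i, ])
  reach <- m[cbind(seq_len(k), max.col(m, ties.method = "first"))]
  lrd_maxcol[i] <- 1 / (sum(reach) / k)
}

# Variant 3: pmax.int() takes the element-wise maximum of two vectors directly.
lrd_pmax <- numeric(n)
for (i in seq_len(n)) {
  lrd_pmax[i] <- 1 / (sum(pmax.int(d$dist[d$id[i, ], k], d$dist[i, ])) / k)
}

# apply() and pmax.int() should agree; max.col() is compared with a looser
# tolerance because it may treat nearly equal entries as ties.
stopifnot(
  isTRUE(all.equal(lrd_apply, lrd_pmax)),
  isTRUE(all.equal(lrd_apply, lrd_maxcol, tolerance = 1e-4))
)
```

pmax.int() avoids building the intermediate cbind() matrix altogether, which is one plausible reason it is both shorter and quicker here. ties.method = "first" is set explicitly because the default for max.col() breaks ties at random.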

eduardokapp commented 3 years ago

I didn't know about pmax.int! Very cool. Thank you. Oh and if you wouldn't mind, I changed my username here, so if you could change the commit message from 'tomverlaine' to 'eduardokapp' that'd be great. Thank you again for being receptive.

mhahsler commented 3 years ago

Done.