koheiw / proxyC

R package for large-scale similarity/distance computation
GNU General Public License v3.0

Consider accepting dense matrices #24

Closed koheiw closed 2 years ago

koheiw commented 2 years ago

It would just mean wrapping x and y in Matrix() internally.
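As a rough sketch of what that wrapping might look like, assuming a small coercion helper is called on x and y before any computation: `as_sparse()` is a hypothetical name used for illustration only, not an existing proxyC function, and the real implementation may differ.

```r
library(Matrix)

# Hypothetical helper (not part of proxyC): return sparse input unchanged
# and convert a base R dense matrix with Matrix(), as proposed above.
as_sparse <- function(x) {
  if (methods::is(x, "sparseMatrix")) {
    return(x)
  }
  if (is.matrix(x)) {
    return(Matrix(x, sparse = TRUE))
  }
  stop("x must be a base matrix or a sparse Matrix object")
}

# Small dense example: 2 rows, 3 columns
dm <- matrix(c(1, 0, 0, 2, 0, 3), nrow = 2)
sm <- as_sparse(dm)

# With the helper in place, dense input could be passed straight through
if (requireNamespace("proxyC", quietly = TRUE)) {
  print(proxyC::simil(as_sparse(dm), margin = 2, method = "cosine"))
}
```

This sketch rejects anything that is neither a base matrix nor a sparse Matrix object, so dense Matrix classes would need an extra branch if they should be supported too.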

library(Matrix)
library(microbenchmark)
sm1k <- rsparsematrix(1000, 1000, 0.01) # 1,000 columns
sm10k <- rsparsematrix(1000, 10000, 0.01) # 10,000 columns

# Convert to dense format
dm1k <- as.matrix(sm1k) 
dm10k <- as.matrix(sm10k)

microbenchmark(
    "dense 1k" = proxyC::simil(Matrix(dm1k, sparse = TRUE), margin = 2, method = "cosine"),
    "sparse 1k" = proxyC::simil(sm1k, margin = 2, method = "cosine"),
    "dense 10k" = proxyC::simil(Matrix(dm10k, sparse = TRUE), margin = 2, method = "cosine"),
    "sparse 10k" = proxyC::simil(sm10k, margin = 2, method = "cosine"),
    times = 10
)

The cost of conversion is small: roughly 18 ms at the median for 1,000 columns, and lost in run-to-run noise for 10,000 columns.

Unit: milliseconds
       expr        min         lq       mean     median         uq        max neval
   dense 1k   53.77502   55.90578   62.91605   63.70098   68.08165   76.49488    10
  sparse 1k   39.13226   40.05156   46.82428   45.46336   47.33405   76.94272    10
  dense 10k 5089.44310 5126.92442 5522.14230 5663.23267 5716.33660 6030.22745    10
 sparse 10k 5148.79123 5245.75810 5783.34810 5858.60878 5933.02060 6828.32123    10

So, why not?