Closed rcannood closed 5 years ago
How about using zapsmall
?
> getOption("digits")
[1] 7
> zapsmall(1 + 1e-6)
[1] 1.000001
> zapsmall(1 + 1e-7)
[1] 1
Cool, I didn't know about zapsmall()
! It does the job just fine:
library(proxyC)
x <- Matrix::Matrix(c(1, 2, 5, 3), ncol = 2, sparse = TRUE)
sim <- simil(x, y = NULL, method = "cosine")
acos(sim)
## 2 x 2 Matrix of class "dsyMatrix"
## [,1] [,2]
## [1,] NaN 0.390607
## [2,] 0.390607 NaN
## Warning message:
## In acos(x@x) : NaNs produced
sim@x <- zapsmall(sim@x)
acos(sim)
## 2 x 2 Matrix of class "dsyMatrix"
## [,1] [,2]
## [1,] 0.000000 0.390607
## [2,] 0.390607 0.000000
I don't mind running zapsmall()
right before calculating the acos, so you don't necessarily have to include it in proxyC. What do you prefer?
I think it is better to make cosine similarity to be precisely up to 1. Adding zapsmall()
is easy but getOption("digits")
is also used in print()
. If we apply zapsmall()
internally, we have to add a zapsmall
argument to proxy()
to avoid taking values from the option. This approach is similar to digest::sha1()
.
Due to rounding errors, the similarity in the current implementation can be larger than 1. This is not really desired, as calculating for instance cos^-1 will then produce NaNs:
Whereas by setting the maximum value to 1, everything will work as intended:
This can easily be tested with the following test: