yuntianf closed this issue 9 months ago
Hi, it looks like your frNN object is corrupt. I have now implemented a less memory-hungry conversion from dist to frNN in the function `frNN()`. It is now also used directly by `dbscan()`.
Please install the latest development version from GitHub and check if `frNN(dis, eps = xxx)` works for you.
Thanks! But I have a very large sparse distance matrix, and I don't want to store it as a `dist` object. I would rather build an `frNN` object from scratch. Is that feasible?
Yes, it is possible to directly convert a sparse representation of a symmetric distance matrix into a frNN object.
How is your sparse distance matrix stored? Using the `Matrix` package (more specifically the `dsCMatrix` class)? Do you use a package to create the sparse distance matrix?
No, I didn't use any package. I just store the sparse distances as a long table, with each row giving the distance between two nodes. If `dsCMatrix` works with `frNN()`, I will give it a try. Thanks!
Hi, I found that even if I feed `frNN()` a `dsCMatrix`, the function still converts it to a regular dense matrix:
```r
if (!.matrixlike(x))
  stop("x needs to be a matrix to calculate distances")
x <- as.matrix(x)
```
For really large matrices this step still uses too much memory. It also fills the missing entries of the sparse matrix with 0, whereas a missing entry in a distance matrix usually means Inf, and this may cause some problems.
That is correct. So you use a triplet format for all non-infinite entries. This can definitely be converted into an frNN object without losing the sparseness. All that needs to be done is to remove all rows with a distance > eps and then collect the indices and distances into an frNN object. I guess that is what your code tries to do, but it seems not to work. If there were a standard sparse distance representation, I would incorporate it directly into the package, but that does not seem to be the case.
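The conversion described above (drop pairs beyond eps, then collect neighbor ids and distances per node) could be sketched roughly as follows. This is only a sketch under assumptions: `triplets_to_frNN()` is a hypothetical helper and not part of dbscan, the long table is assumed to list each pair once with 1-based node ids, and the field names (`id`, `dist`, `eps`, `sort`) and the class vector are assumed to match the frNN structure described in the dbscan documentation, which may differ between package versions.

```r
# Hypothetical helper (not part of dbscan): build an frNN-like list from a
# long table of pairwise distances. i and j are 1-based node ids, d is the
# distance of the pair, n is the total number of nodes.
triplets_to_frNN <- function(i, j, d, n, eps) {
  keep <- d <= eps & i != j          # drop pairs beyond eps and self-pairs
  i <- i[keep]; j <- j[keep]; d <- d[keep]

  # Each undirected pair makes both endpoints neighbors of each other.
  from <- c(i, j)
  to   <- c(j, i)
  dd   <- c(d, d)

  # Order by source node, then by distance within each node.
  o    <- order(from, dd)
  from <- factor(from[o], levels = seq_len(n))
  to   <- to[o]
  dd   <- dd[o]

  # Splitting on a factor with all n levels keeps nodes without neighbors
  # as empty vectors, so both lists always have length n.
  id   <- unname(split(as.integer(to), from))
  dist <- unname(split(dd, from))

  # Assumption: these fields and classes mirror dbscan's frNN objects.
  structure(list(id = id, dist = dist, eps = eps, sort = TRUE),
            class = c("frNN", "NN"))
}

# Toy example: 4 nodes, 3 known distances, node 4 has no neighbors.
tab <- data.frame(i = c(1, 1, 2), j = c(2, 3, 3), d = c(0.5, 2.0, 0.8))
nn  <- triplets_to_frNN(tab$i, tab$j, tab$d, n = 4, eps = 1)
```

In the toy example the pair (1, 3) is dropped because its distance exceeds eps, and node 4 ends up with empty id and distance vectors. Keeping one entry per node and storing the ids as integers is a guess at what the package's compiled code expects, so it is worth comparing the result against an object returned by `frNN()` on a small data set before using it in `dbscan()`.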
For similarities, a sparse representation where 0s are dropped would probably be more natural.
I have a large distance matrix and want to build an `frNN` object from scratch to reduce the memory burden. I first initialize an `frNN` object with one node and then add my distances and node ids to this object. But when I use the frNN object built this way in `dbscan()`, it causes a segfault.