Closed: nournia closed this issue 11 years ago
Yes, I think there is.
Use metric='precomputed'
and pass the distance matrix instead of the data. That should do the trick.
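For example, a minimal sketch of that suggestion (the toy data, `eps`, and `min_samples` here are made up for illustration, not from this thread):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Hypothetical toy data: two well-separated 2D blobs of 20 points each.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 10])

# Precompute the full pairwise distance matrix -- note this is O(n^2) memory.
D = pairwise_distances(X, metric="euclidean")

# Pass the matrix itself instead of the data, with metric='precomputed'.
labels = DBSCAN(eps=2.0, min_samples=5, metric="precomputed").fit(D).labels_
```

Any pairwise dissimilarity matrix works here, not just Euclidean distances, which is what makes this useful for custom similarity functions.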
Thanks, but I can't do that. My data is large and I can't afford the O(n^2)
cost of the distance matrix. Actually, I don't know if DBSCAN is the right algorithm. It was my only choice because I didn't find any other clustering algorithm in scikit-learn that works without full feature vectors or a distance matrix. My entries aren't representable in feature space, so I wrote a custom similarity function for pairs.
Ok, so that is a whole different story then. Our implementation of DBSCAN definitely computes the whole dissimilarity matrix. How many samples do you have? The only clustering algorithm in scikit-learn that supports out-of-core computation is minibatch k-means, and that doesn't work with precomputed dissimilarities. Off the top of my head I don't know any algorithm that would work well in your setting. Maybe try some core-set approach? Most clustering algorithms are at least quadratic in time complexity, so you would have to wait quite a long time anyhow. Would a long run time be ok for you?
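For reference, the out-of-core use of minibatch k-means mentioned above looks like this: feed the estimator one chunk at a time with `partial_fit` instead of loading everything. The streamed data, chunk size, and cluster count below are hypothetical:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
mbk = MiniBatchKMeans(n_clusters=3, random_state=0, batch_size=100, n_init=3)

# Stream the data chunk by chunk instead of holding it all in memory.
# Each chunk is 100 points drawn near one of three centers (0, 10, 20).
for _ in range(30):
    offsets = rng.choice([0.0, 10.0, 20.0], size=(100, 1))
    chunk = rng.randn(100, 2) + offsets
    mbk.partial_fit(chunk)

centers = mbk.cluster_centers_
```

This only works because k-means operates on feature vectors and updates centroids incrementally; it cannot take a precomputed or callable dissimilarity, which is exactly the limitation described above.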
Btw, closing this, as the title of the issue is not really your problem. Maybe go to metaoptimize and ask about out-of-core algorithms for arbitrary distance measures.
Thanks.
I want to cluster records of data that are not in float-matrix form, and there is no feature vector for each record. Hopefully the DBSCAN clustering algorithm can use a callable similarity function, but it
tries to convert the whole matrix to float type and raises this error:
Is there any solution for this problem?
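One workaround for the float-conversion problem, not from this thread and assuming a scikit-learn version whose DBSCAN accepts a callable `metric`: cluster integer indices, so DBSCAN only ever sees a numeric array, and let the callable look the real records up. The record list and the edit-distance stand-in below are hypothetical, and the run time is still quadratic in the number of samples:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical records with no natural feature-vector representation.
records = ["kitten", "kitty", "sitting", "banana", "bandana"]

def edit_distance(a, b):
    # Plain Levenshtein distance as a stand-in for the custom similarity.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Trick: the "data" is just the index of each record, so nothing
# non-numeric ever reaches DBSCAN's input validation.
X = np.arange(len(records)).reshape(-1, 1)

def metric(i, j):
    return edit_distance(records[int(i[0])], records[int(j[0])])

labels = DBSCAN(eps=3, min_samples=2, metric=metric).fit(X).labels_
```

With these toy records, the string-like entries ("kitten", "kitty", "sitting") end up in one cluster and ("banana", "bandana") in another, without ever building feature vectors for the records themselves.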