Open wlzhao22 opened 1 month ago
I assume you are interested in KNN query performance? It works "reasonably" well with high-dimensional data (I have tested the Java implementation with 1000 dimensions and the C++ implementation with 60 dimensions).
You will have to measure it. I think there is a good chance the PH-tree works very well: it may be 20% slower than other indexes, or equally likely considerably faster.
There are several points to this:
I have put some of my measurements (up to 40 dimensions) in a document: https://github.com/tzaeschke/TinSpin/blob/master/doc/benchmark-2017-01/Diagrams.pdf. CU and CL are synthetic datasets as described in the document. There are two PH-trees in these measurements: PH and PHM.
If you are happy with approximate KNN, you can also have a look at https://github.com/erikbern/ann-benchmarks
The PH-tree follows a Z-curve (Morton order), which has almost the same locality-preserving properties that Hilbert curves have; however, Z-curves are much easier and faster to calculate than Hilbert curves.
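To make the locality argument concrete, here is a minimal sketch of 2D Morton encoding (plain bit interleaving). This is only an illustration of Z-order, not the PH-tree's API; the tree stores interleaved bits per node rather than computing one global code. The function name and bit width are my own choices for the example:

```python
def morton_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two non-negative integer coordinates
    into a Morton (Z-order) code: x bits land on even positions,
    y bits on odd positions. This is just shifting and masking,
    which is why Z-curves are cheaper to compute than Hilbert curves."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # even bit positions: x
        code |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions:  y
    return code

# Nearby points usually get nearby codes, but not always: Z-order has
# occasional large jumps at quadrant boundaries, which is the known
# weakness compared to Hilbert curves.
print(morton_2d(3, 4))  # → 37
print(morton_2d(3, 5))  # → 39 (a neighbor of (3, 4), and a nearby code)
```

So the answer to the locality question is "mostly": neighboring points often map to nearby codes, but the mapping is not a strict guarantee in either direction, for Z-curves or for Hilbert curves.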
Thanks for your work! I am just wondering how well it works on high-dimensional data, such as SIFT or various deep features. Compared with the Hilbert curve, does it hold a similar locality-preserving property for high-dimensional points? That is, when two points a, b \in R^d are neighbors, will they also be neighbors after they have been converted to one-dimensional values?