kon0925 / coconut

A scalable bottom-up approach for building data series indexes
Other
14 stars 1 forks source link

Error statistics of approximate search results in some situations #3

Open CaucherWang opened 2 years ago

CaucherWang commented 2 years ago

In approximate search of coconut.c and coconut_plus.c, the codes traverse the nearest records (the number is equal to leaf node size) and find the best. If the dataset is static, this method is acceptable. But notice that if new insertion comes (it is normal in the actual situation), the full node will split into 2 new leaf nodes in B+-tree and the leaf nodes in B+-tree are not sequential on disk (actually they need a disk pointer to connect each other), and then some nodes are not full (at least half full). In this situation, the minimal distance/ recall rate, etc will be worse than now.

Waiting for your replies. :)

CaucherWang commented 2 years ago

Appendix: note that the new insertions do not belong to original dataset