Closed mpriorust closed 2 years ago
Hi, you could use a FIlter (see README):
template <dimension_t DIM, typename T>
struct FilterForCountingNodes {
[[nodiscard]] constexpr bool IsEntryValid(const PhPoint<DIM>& key, const T& value) const {
return true;
}
[[nodiscard]] constexpr bool IsNodeValid(const PhPoint<DIM>& prefix, int bits_to_ignore) const {
++count;
return true;
}
size_t count{0};
};
for (auto it = tree.begin_query({1, 1, 1}, {3, 3, 3}, FilterForCountingNodes<3, T>())); it != tree.end(); ++it) {
...
}
for (auto it = tree.begin_query(YourDistanceFuntionHere, FilterForCountingNodes<3, T>())); it != tree.end(); ++it) {
...
}
Of course you could also adapt the iterator_knn.h
and iterator_full.h
/iterator_hc.h
to count visited nodes.
Some things to note:
May I ask why you are interested in the number of node visits?
Let me know if you find any problems.
Thank you for your response, and the hint to use filters.
I want to measure the I/O costs by counting the number of times leaf nodes are visited when running range or knn queries. Is this possible and appropriate? If so, is there a differentiation between leaf and non-leaf nodes?
No there is no difference between leaf/non-leaf nodes.
I think this can be used to estimate I/O costs. I am not sure how that would work in detail because the actual "cost" depends on whether node is flushed from a cache level? There are ways to measure this, I think vtune can measure I/O wait, and I think there are other options as well. I think the CPU can report these metric and you can read them out with command line tools (don't ask me how though).
You should probably also count+measure the access to individual entries (=points). The PH-Tree keeps them in an array that belongs to the node, while other spatial indexes often store them as separate entries. Another advantage of the PH-Tree is that it first checks the HC-position so it does not necessarily need to check (or even load) all subnodes or points in a nodes, i.e. it can exclude a subnode without having to access its coordinate.
In case you are doing research: note that the C++ implementations is a bit different from the approach described in the PH-Tree research paper. For example, the C++ version does not serialize a node's data into a bitstring (which would save some space and reduce I/O). The Java version does this bit-string-serialization, but only until 8 dimensions or so. The bitstream approach does improve speed for low dimensions but becomes too expensive for remove/insert operations for higher dimensionality.
Dear All,
I initialised a tree: PhTreeD<DIM, int> PhTree;
I Inserted points of type PhPointD, and run PhTree.begin_query and PhTree.begin_knn_query on it.
Both queries return the expected results.
I wanted to ask if there is a way to trace the number of times leaf nodes are visited when running the queries.
Kind regards,
Max Prior