Closed andredemori closed 5 years ago
Hi, thanks for using the PH-Tree.
The PH-Tree should always work 100% correct, that means it should always return all matching entries. But I admit that rangeQuery()
is a feature that I have hardly used/tested.
One thing to keep in mind is that the PH-Tree can store every point only once. If the same coordinate is added multiple times, the previous value is overwritten. Queries will only return the latest value for any coordinate.
I'm not entirely sure what you are exactly doing, you mention 'neighbors' and 'range queries', but the PH-Tree has no function to return neighbors in range queries. So I'm not really sure what kind of queries you performing:
Some questions:
PhTree
, PhTreeF
, PhTreeSolid
or PhTreeSolidF
?If you are using rangeQuery()
, maybe consider using window queries (query(...)
) and filter the results afterwards, that may be faster.
Also, could you point me to some information about the SlimTree?
Btw, if you want to store multiple values for a single coordinate you can store a HashSet
or List
of values for a coordinate.
In my project, the ultimate goal is to perform a search on astronomical datasets to find geometric patterns brought about by gravitational lenses like the Eistein Cross. In this phase of the project I am comparing these structures in relation to the search for neighbors given a point and a distance (Range Queries ()). The dataset I'm using has 1,035,204 objects with their spatial coordinates and other attributes. Slim-tree is a Metric Access Method created in Brazil at the University of São Paulo - USP. I'm sending you the link to the article below:
http://conteudo.icmc.usp.br/pessoas/caetano/artigos/Traina2000_EDBT_SlimTree.pdf
This is a method that can optimize the issue of overlaps in nodes. I have the algorithm written in C ++ and in my tests it returned me an F-Measure of 1.0, ie 100% of the data that should be returned were returned.
In the case the PH-tree code returned an F-measure of approximately 0.97, and did not return all the neighboring points of the query objects.
As you can see, using a dataset with 98 objects, the so-called 'positivo' is the number of objects returned and that should actually be returned.
Thanks for the reference, I did not know about the SlimTree.
I don't quite understand the meaning of the second line: "negativo | 12 | vizinhos nao retornados que nao deveriam ser retornados". I assume there should be almost 1,000,000 points that are (and should) not be returned? But it's not really important, just out of interest.
PhTreeF
?rangeQuery(dist, center)
, correct?As I mentioned, the PH-Tree ignores insertion with identical coordinates. So maybe you could check the following:
If you can't find any duplicates, would it be possible to send me the dataset so I can debug the problem?
Ah, I understand now, the results are from a smaller dataset. Could you answer my questions and, if you don't find any duplicates, send me this small dataset?
Thank you very much, you are absolutely right. I checked the dataset and there are duplications. As you mentioned, PH-Tree ignores insertion with identical coordinates.
Answers:
1 - Yes 2 - Yes 3 - Yes
Thanks. Well technically, it's doesn't ignore the duplicates, it replaces the previous value. To fix this you could use
PhTreeF<List<YourData>>
or maybe simply
PhTreeF<Integer>
(where Integer is initially '1' but is incremented when the value already exists).
I will soon (1-2 weeks) release an update that provides Java 8 Map API functions, such as putIfAbsent()
or compute()
. Especially the latter may be useful to handle such an Integer-counter or a List/Map/Set of values.
Thanks.
Hello Mr. Zäschke, I am a student of Information Technology and Communication in Brazil and currently work on a project titled Metric Indexing in Space Junctions at LNCC with Doctor Fábio Porto. In my research I worked on comparisons between data indexing methods like PH-tree, Slim-tree, Quatree. In these comparisons I observed the tree creation time and the search times to the neighbors in Range Queries. In short, I could see that PH-tree does not return all neighbors in their queries. I performed precision and recall calculations, and Slim-tree did better. Can you tell if the fact that PH-tree does not return all the neighbors is something normal of PH-tree's structure? It has something to do with the 'worst cases' mentioned in your article in '3.4 Spacial efficiency' or '3.5 Query efficiency'.
Thank you in advance.
André Muniz Demori - LNCC