Closed df7cb closed 10 months ago
That's already in there:
pg_sphere--1.3.1--1.3.2.sql: pgs_circle_sel.sql.in pgs_hash.sql.in
cat upgrade_scripts/$@.in $^ > $@
That's already in there:
pg_sphere--1.3.1--1.3.2.sql: pgs_circle_sel.sql.in pgs_hash.sql.in cat upgrade_scripts/$@.in $^ > $@
Oh, sorry, I was expecting to see pg_sphere--1.3.1--1.3.2.sql
in the list of changed files in the commit and didn't look closely enough at the changes to the Makefile
.
The current implementation just xor-s the 4x4 bytes chunks together. I think that should yield a good distribution on the "random" coordinate values that should be seen in practise. If we think that's not good enough, we might just use the built-in hash functions on floats. (hashfloat8
, hashfloat8extended
)
I think, everythink is ok, but I would propose to improve the hash function. The longitude range is [0, 2*pi], the latitude range is [-pi/2,p/2], EPSILON = 1E-9. We have to calculate the hashes in these ranges. The smaller numbers are "equal" and should have the same hash. There are also some other special values like INF, NAN. I think that all NANs should produce the same hash value. Denormalized numbers are smaller than 1E-9, they are 0. I will think about some other implementation of hash function.
I don't think we can do much about mapping values less than EPSILON apart to the same value - we don't know in which direction to round. The only thing we could do is map values close to 0 to 0, but if we don't do it for the rest, why bother with 0.
Infinity is not a problem since it's a unique bit value, but along with NaN, can these really appear in spoint values? I even had trouble getting -0
into an spoint - at one point I think I had one, but couldn't reproduce it later.
One possible hash implementation that takes care of -0
and NaN
would be this:
PG_RETURN_INT32(hashfloat8(p1->lat + p1->lng));
or
PG_RETURN_INT32(hashfloat8(p1->lat) ^ hashfloat8(p1->lng));
@df7cb May be to use hashfloat8? I have no strong arguments against your implementation except that it is not the best hash function, I think. I want to ask my junior colleagues to think how to improve hash function and benchmark it.
I used the hash1 ^ hash2
approach now, and added documentation.
Besides hash indexes on spoint, this enables "select distinct spoint from tbl" queries.
Close #101.