Closed: IlyasMoutawwakil closed this 2 years ago
@wdm0006 I still haven't made any unit tests, so I'm wondering: should they go in the same file you used, as separate methods, or should I create a new file?
I think keeping the new tests in a separate file (one for vectorized and one for the numba versions) would be cleanest.
@wdm0006 There are some failures in distance and stats, but I guess they're just Python rounding errors. Anyway, here's the unittest log:
test_check_validity (tests.test_geohash.TestGeohash) ... ok
test_decode (tests.test_geohash.TestGeohash) ... ok
test_distance (tests.test_geohash.TestGeohash) ... FAIL
test_encode (tests.test_geohash.TestGeohash) ... ok
test_stats (tests.test_geohash.TestGeohash) ... FAIL
test_decode (tests.test_nbgeohash.TestNumbaPointGeohash) ... ok
test_encode (tests.test_nbgeohash.TestNumbaPointGeohash) ... ok
test_decode (tests.test_nbgeohash.TestNumbaVectorGeohash) ... ok
test_encode (tests.test_nbgeohash.TestNumbaVectorGeohash) ... ok
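For context, the separate test file reflected in that log might look roughly like this; the `pygeohash.nbgeohash` import path and the `nb_*` function names are assumptions for illustration, not confirmed by this thread:

```python
# tests/test_nbgeohash.py -- skeleton matching the test names in the log above.
import unittest

import numpy as np

# Assumed import path and function names -- illustrative only.
from pygeohash.nbgeohash import (
    nb_point_decode,
    nb_point_encode,
    nb_vector_decode,
    nb_vector_encode,
)


class TestNumbaPointGeohash(unittest.TestCase):
    def test_encode(self):
        # "ezs42" is the classic example geohash for roughly (42.6, -5.6).
        self.assertEqual(nb_point_encode(42.605, -5.603, precision=5), "ezs42")

    def test_decode(self):
        lat, lon = nb_point_decode("ezs42")
        self.assertAlmostEqual(lat, 42.6, places=1)
        self.assertAlmostEqual(lon, -5.6, places=1)


class TestNumbaVectorGeohash(unittest.TestCase):
    def test_encode(self):
        hashes = nb_vector_encode(np.array([42.605]), np.array([-5.603]), precision=5)
        self.assertEqual(hashes[0], "ezs42")

    def test_decode(self):
        lats, lons = nb_vector_decode(np.array(["ezs42"]))
        np.testing.assert_allclose(lats, [42.6], atol=0.1)
        np.testing.assert_allclose(lons, [-5.6], atol=0.1)
```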
@wdm0006 what do you think?
Overall looks good, but let's use assertAlmostEqual to avoid rounding error issues in the tests, and make sure that the soft dependencies are actually optional.
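For illustration, that suggestion amounts to something like the following in the failing distance test, assuming pygeohash's geohash_approximate_distance helper (the geohash pair and expected value here are placeholders, not from the PR):

```python
import unittest

import pygeohash as pgh


class TestGeohash(unittest.TestCase):
    def test_distance(self):
        # assertAlmostEqual compares numbers up to a tolerance instead of
        # exact equality, so platform-dependent float rounding cannot make
        # the test flaky. Pair and expected value are placeholders.
        self.assertAlmostEqual(
            pgh.geohash_approximate_distance("shi3u", "sh83n"),
            625441,
            delta=1,
        )
```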
Actually, there's no rounding issue in encoding to or decoding from geohash; the issue is in distance and stats (which I didn't implement). I can change them, but shouldn't that be in another PR?
Copied from what I did in geohash-on-steroids:
Numba geohash
Scripts to encode to and decode from the geohash system in a Numba-optimized way.
Dependencies
Optimized functions are created with the `njit` decorator and use arrays, so the only dependencies are Numba and NumPy (see the sketch at the end of this comment).
Performance
As you can see in my notebook, the performance gain compared to the default pure-Python pygeohash implementation is the following:
But geohashing is generally performed on large numbers of data points, so I made a vectorized implementation that performs well at large scale:
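As a minimal, self-contained sketch of what these two flavors can look like (function names are hypothetical, not the PR's exact API):

```python
import numpy as np
from numba import njit

# Geohash base32 alphabet (digits plus letters, skipping a, i, l, o),
# kept as a uint8 array so njit code can index it without string ops.
_BASE32 = np.frombuffer(b"0123456789bcdefghjkmnpqrstuvwxyz", dtype=np.uint8)


@njit
def point_encode(lat, lon, precision=12):
    # Standard geohash bit interleaving: even bits halve the longitude
    # interval, odd bits the latitude interval; every 5 bits become one
    # base32 character.
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    out = np.empty(precision, dtype=np.uint8)
    bits, ch, even = 0, 0, True
    i = 0
    while i < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2.0
            if lon >= mid:
                ch = (ch << 1) | 1
                lon_lo = mid
            else:
                ch <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            if lat >= mid:
                ch = (ch << 1) | 1
                lat_lo = mid
            else:
                ch <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:
            out[i] = _BASE32[ch]
            i += 1
            bits, ch = 0, 0
    return out


@njit
def vector_encode(lats, lons, precision=12):
    # The vectorized flavor: loop over all points inside one compiled
    # function so Python-level call overhead is paid only once.
    n = lats.shape[0]
    out = np.empty((n, precision), dtype=np.uint8)
    for k in range(n):
        out[k, :] = point_encode(lats[k], lons[k], precision)
    return out
```

For example, `bytes(point_encode(42.605, -5.603, 5)).decode()` gives "ezs42", while `vector_encode` takes whole NumPy arrays of coordinates, so the per-call overhead is amortized across all points.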