Explanation for Hash Key length rationale?

robhogan / dynamodb-geo.js

A node-friendly typescript port of https://github.com/awslabs/dynamodb-geo


I'm trying to understand the documentation on the hash key length:

However if your data is dense and hashKeyLength too short, more RCUs will be needed to read a hash key and a higher proportion will be discarded by server-side filtering.

Let's imagine an extreme scenario where the hash key length is 1 and all of our data is stored under a single hash key. For DynamoDB queries, only the items that are actually read contribute to consumed RCUs. We are still restricting the query to the range between the min value and the max value, so as I understand it, only items between those two values count as read. The number of items in that range is the same regardless of the hash key length, so a small hashKeyLength should have no impact on RCUs.

Of course, this would lead to a hot partition, but that is a different issue. I also understand why a large hashKeyLength is bad, as it results in more query operations. What I still don't understand is why a small hash key length wouldn't be the optimal choice. Is there something I'm misunderstanding about the geohash algorithm or DynamoDB RCUs?
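To make the RCU side of the question concrete, here is a minimal sketch of how a Query's read cost is usually estimated: the sizes of all items matched by the key condition are summed, rounded up to the next 4 KB, and charged at 1 RCU per 4 KB for strongly consistent reads or half that for eventually consistent reads. Items that a filter expression later discards are still counted. The function name `queryRcus` and the item sizes are made up for illustration, and the sketch ignores pagination and empty results.

```typescript
// Hypothetical helper (not part of dynamodb-geo): estimate the RCUs one Query consumes.
// `matchedItemSizes` holds the sizes, in bytes, of every item that satisfies the
// key condition, including items a filter expression would discard afterwards.
function queryRcus(matchedItemSizes: number[], stronglyConsistent: boolean): number {
  const totalBytes = matchedItemSizes.reduce((sum, size) => sum + size, 0);
  const fourKbUnits = Math.ceil(totalBytes / 4096); // cumulative size, rounded up to 4 KB
  return stronglyConsistent ? fourKbUnits : fourKbUnits * 0.5;
}

// Illustrative numbers only: the key condition matches five 1000-byte items,
// and a filter keeps just two of them. The cost is still based on all five.
const matched = [1000, 1000, 1000, 1000, 1000];
console.log(queryRcus(matched, false));
console.log(queryRcus(matched, true));
```

Under this model the cost depends on how many items the key condition matches, so the open question in the thread is whether a shorter hash key widens that set or only the set that gets filtered afterwards.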

Actually, I don't think it works this way. Suppose you want to query within a 20M radius and your hash key length is 1:

You would have to retrieve every item whose geohash starts with the same character (probably all of your items, unless your locations are spread across the world), and the filtering would then be done server-side with the Haversine formula (I don't think there can be a min and max value on the sort key). This would be the same as scanning the whole table and comparing positions one by one, as if you were not taking advantage of the geohash at all.

Now suppose your hash key length is 10:

You would have to retrieve only the items whose geohash starts with the same 10 characters (probably the items within about 50m, though I'm not sure), and then filter the rest using the same formula. This way you optimize the search by excluding most of the locations in your table from the start. Hope this is clear.
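The prefix argument in this reply can be sketched as follows, assuming the hash key is simply the first hashKeyLength characters of the geohash written as a decimal string. The helper names `hashKeyFor` and `groupByHashKey` and the geohash values are hypothetical and chosen only for illustration; they are not taken from the library.

```typescript
// Made-up 19-digit geohashes (illustrative values, not real cell IDs).
const geohashes: string[] = [
  "5221390681063996525",
  "5221390681063996901",
  "5221390710554083010",
  "5221366118452580119",
  "5219874663120045778",
];

// Assumption: the hash key is the leading `hashKeyLength` characters of the geohash.
function hashKeyFor(geohash: string, hashKeyLength: number): string {
  return geohash.slice(0, hashKeyLength);
}

// Group geohashes by their hash key to see how many items share one partition key.
function groupByHashKey(hashes: string[], hashKeyLength: number): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const h of hashes) {
    const key = hashKeyFor(h, hashKeyLength);
    if (!(key in groups)) {
      groups[key] = [];
    }
    groups[key].push(h);
  }
  return groups;
}

const coarse = groupByHashKey(geohashes, 1);  // every item falls under the key "5"
const fine = groupByHashKey(geohashes, 10);   // only very close items share a key

console.log(Object.keys(coarse).length, Object.keys(fine).length);
```

With these sample values, a length of 1 puts all five items under one hash key, while a length of 10 spreads them over four keys, and only the first two items (which share their first ten digits) end up together.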


Explanation for Hash Key length rationale? #39