robhogan / dynamodb-geo.js

A node-friendly typescript port of https://github.com/awslabs/dynamodb-geo
210 stars 96 forks

Explanation for Hash Key length rationale? #39

Open atyshka opened 4 years ago

atyshka commented 4 years ago

I'm trying to understand the documentation on the hash key length:

However if your data is dense and hashKeyLength too short, more RCUs will be needed to read a hash key and a higher proportion will be discarded by server-side filtering.

Let's imagine an extreme scenario where our hash key length is 1 and all of our data is stored under this one hash key. For DynamoDB queries, only items that are accessed count toward RCUs. We are still restricting our query to be between the minvalue and maxvalue. According to my understanding, only items between minvalue and maxvalue will be considered accessed and therefore contribute to the consumed RCUs. The number of items between minvalue and maxvalue will be the same regardless of the hash key length. Therefore, using a small hashKeyLength should have no impact on RCUs.

Of course, this would lead to a hot partition, but that is a different issue. I also understand why a large hashKeyLength is bad, as it results in more query operations. I still don't understand, though, why a small hash key length wouldn't be the optimal choice. Is there something I'm misunderstanding about the geohash algorithm or DynamoDB RCUs?
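
To make the scenario in this question concrete, here is a minimal in-memory sketch of that model. It is not the dynamodb-geo.js API. The names `GEOHASH_DIGITS`, `hashKeyOf`, `splitRange` and `simulate` are hypothetical, and it assumes positive geohashes of a fixed 8-digit width, with the hash key taken as the leading `hashKeyLength` digits. It only counts queries issued and items falling inside the key range, and it does not model how DynamoDB actually meters RCUs.

```typescript
// Hypothetical model for illustration only; not the dynamodb-geo.js API.
// Assumption: every geohash is a positive integer with exactly 8 digits.
const GEOHASH_DIGITS = 8;

// Assumption: the hash key is the leading `hashKeyLength` digits of the geohash.
function hashKeyOf(geohash: number, hashKeyLength: number): number {
  return Math.floor(geohash / 10 ** (GEOHASH_DIGITS - hashKeyLength));
}

interface KeyRange {
  hashKey: number;
  min: number;
  max: number;
}

// Split [min, max] into one sub-range per hash key it spans,
// because a single query can only target one hash key.
function splitRange(min: number, max: number, hashKeyLength: number): KeyRange[] {
  const step = 10 ** (GEOHASH_DIGITS - hashKeyLength);
  const ranges: KeyRange[] = [];
  const first = hashKeyOf(min, hashKeyLength);
  const last = hashKeyOf(max, hashKeyLength);
  for (let key = first; key <= last; key++) {
    ranges.push({
      hashKey: key,
      min: Math.max(min, key * step),
      max: Math.min(max, (key + 1) * step - 1),
    });
  }
  return ranges;
}

// Count the queries issued and the items that fall inside the key ranges.
function simulate(
  geohashes: number[],
  min: number,
  max: number,
  hashKeyLength: number
): { queries: number; itemsRead: number } {
  const ranges = splitRange(min, max, hashKeyLength);
  let itemsRead = 0;
  for (const r of ranges) {
    itemsRead += geohashes.filter(
      (g) => hashKeyOf(g, hashKeyLength) === r.hashKey && g >= r.min && g <= r.max
    ).length;
  }
  return { queries: ranges.length, itemsRead };
}

// Made-up sample data: three points inside the search range, two outside.
const sampleGeohashes = [12345678, 12345999, 12346100, 12399999, 55555555];

console.log(simulate(sampleGeohashes, 12345000, 12346999, 1)); // { queries: 1, itemsRead: 3 }
console.log(simulate(sampleGeohashes, 12345000, 12346999, 5)); // { queries: 2, itemsRead: 3 }
```

Under these assumptions the number of items inside the range stays the same for both hash key lengths, and only the number of queries changes. Whether that carries over to real RCU consumption depends on how the library builds its ranges and how DynamoDB meters each query, which this sketch does not capture.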

gham-khaled commented 4 years ago

Actually, I don't think it works this way. Let's suppose you want to query in a 20M radius and your hash key length is 1: