taoensso / faraday

Amazon DynamoDB client for Clojure
https://www.taoensso.com/faraday
Eclipse Public License 1.0

Automatic IDs? #21

Closed: Engelberg closed this issue 10 years ago

Engelberg commented 10 years ago

Is there a way to auto-generate the next unused ID for a primary key?

paraseba commented 10 years ago

There is no concept of order (or next) for DDB primary keys. To define an "ID" you need the hash key plus the range key, if the table has one. The range key has an order for a given hash key, and you could be interested in generating the next one, but notice that the range key itself doesn't constitute an ID. Many objects could share the same range key or the same hash key, but not both.

If what you are trying to do is generate the next range key for a given hash key, I think you are out of luck. You either need to track state somehow, or make sure there is no race condition between the moment you read the last id and the moment you write the next one. There is no way in DDB to make a write based on a condition for a different key.

I'm probably misunderstanding your question completely, sorry if that's the case.

Engelberg commented 10 years ago

Sounds like my question was confusing. In the README, you show that it is vital to manually add an :id (primary key) to the map before inserting it into the database. In the example, you simply appear to be counting up from 0. In a real application, how can you guarantee that you pick a new :id value for each new map you add to the database? (For example, some databases establish the primary key as an auto-incrementing integer. Mongo automatically creates a UUID to serve as the id. It's not clear if the UUID approach would work with faraday, because the implication is that ids need to be numbers, not long strings.) So in general, what's the recommended process for guaranteeing a unique ID?

paraseba commented 10 years ago

Oh I see. That's just a toy example. Notice that when the table is being created:

 [:id :n]  ; Primary key named "id", (:n => number type)

that :n is saying that the id attribute is a number, but it could also be a string or binary.

I don't think there is a "safe" way to generate these ids in a distributed system without storing state somewhere. But also, remember that the hash keys (:id here) don't have inherent order; you cannot retrieve a sorted list from DynamoDB, for instance.

Long enough random strings/numbers are an option. In my view, generating the keys is definitely beyond the project goals. Speaking for myself here; I don't know what @ptaoussanis's view on the matter is.
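
For illustration, the "long enough random string" option could look roughly like the sketch below. It uses only JDK classes; the helper name `random-id` and the 16-byte length are assumptions made for this example and are not part of faraday. An id like this would need a string-typed key rather than the `:n` (number) type from the README example.

```clojure
(import '(java.security SecureRandom))

(defn random-id
  "Hypothetical helper: returns a lowercase hex string built from
  n-bytes cryptographically strong random bytes."
  [n-bytes]
  (let [buf (byte-array n-bytes)]
    ;; Fill the buffer with random bytes.
    (.nextBytes (SecureRandom.) buf)
    ;; Render each byte as two hex digits and join them.
    (apply str (map #(format "%02x" %) buf))))

;; 16 bytes gives 128 random bits, rendered as 32 hex characters.
(random-id 16)
```

Nothing here guarantees uniqueness in the strict sense; the point is only that with enough random bits a collision becomes very unlikely, without any shared state between writers.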

ptaoussanis commented 10 years ago

Hi Mark, Sebastián's correct. (Thanks @paraseba!)

Since the primary key also acts as an index, you'll often want to provide something that's meaningful to your application like a username, product code, etc. Otherwise a simple UUID would be quite appropriate.
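
A minimal sketch of the UUID approach, assuming the table's `id` key is declared as a string rather than a number. The helper name `with-uuid` is hypothetical and not part of faraday; it only prepares the map, which would then be inserted the same way as the hand-numbered maps in the README.

```clojure
(import '(java.util UUID))

(defn with-uuid
  "Hypothetical helper: returns item with a fresh random UUID string
  under :id, unless the item already carries an :id."
  [item]
  (if (contains? item :id)
    item
    (assoc item :id (str (UUID/randomUUID)))))

;; The caller no longer has to pick an id by hand.
(with-uuid {:name "Steve" :age 22})

;; An application-chosen id (a username, product code, ...) is left alone.
(with-uuid {:id "steve" :age 22})
```

Leaving an existing `:id` untouched keeps the option of meaningful keys open, and the UUID only fills in when the application has nothing better to offer.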

Note that there'd be no real benefit to having incremental primary ids since DDB automatically partitions over the primary key space and so range queries wouldn't be a possibility anyway. That's where the "range" key comes in handy, as it's designed to play well with the DDB key partitioning.

Does that help / make sense?

Engelberg commented 10 years ago

Yes, thanks.

ptaoussanis commented 10 years ago

BTW feel free to shout if you run into any other problems; the docs are currently serviceable at best.