hashids / hashids.github.io

This is the old Hashids website. It is no longer maintained and has migrated over to Sqids.
https://sqids.org

Implementation question, should the hashid be stored in the database instead of a PK? #74

Closed: iio7 closed this issue 3 years ago

iio7 commented 3 years ago

Hi,

I have been reading up (a lot) on the different opinions about PK vs UUID etc., and found Hashids, which I very much prefer. However, from all the arguments in the debates on Hacker News, Reddit, Stack Overflow and elsewhere, I haven't been able to determine whether the hashid should replace the PK of the database table.

I understand that you can use Hashids to convert the PK back and forth on the fly, but at a bigger scale with multiple machines that's not really useful. I would like to use the hashid AS the PK, like YouTube does.

What is the best implementation here? Still run with PKs and add a second column for storing the hashed IDs?

What is the best approach?

Thanks and cheers.

miquelfire commented 3 years ago

Hashids is meant to mask the PK's value from the user. I'm curious why it wouldn't scale across multiple machines. Hashids was NOT meant to be stored in the database, and you'll need a numeric PK to use it. The only reason to store a hashid in the database is to future-proof against changes in the algorithm that would produce a different hashid for the same number.

Also, are you sure YouTube stores that value in the database? If they are using something like Hashids (Hashids was based on YouTube), they have a PK we can't see and, given how big they are now, most likely can't reverse-engineer.
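
The on-the-fly conversion described in this reply can be sketched roughly as follows. This is only an illustration of the pattern (numeric PK in the database, reversible string at the application boundary). It does NOT use the Hashids algorithm: the stand-in here is a multiplication modulo 2^64 followed by base-62 encoding, and the names `encode_id`, `decode_id`, `MULTIPLIER` and `ALPHABET` are hypothetical. It is also not secure against a determined attacker.

```python
# Stand-in for Hashids: a reversible, stateless mapping between a numeric PK
# and a short string. Not the real Hashids algorithm, and not cryptographic.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)
MODULUS = 2**64
MULTIPLIER = 0x9E3779B97F4A7C15          # arbitrary odd constant (assumption)
INVERSE = pow(MULTIPLIER, -1, MODULUS)   # modular inverse, needs Python 3.8+


def encode_id(pk: int) -> str:
    """Scramble a numeric PK and render it in base 62."""
    if not 0 <= pk < MODULUS:
        raise ValueError("pk out of range")
    n = (pk * MULTIPLIER) % MODULUS
    digits = []
    while True:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))


def decode_id(token: str) -> int:
    """Reverse encode_id; raises ValueError on malformed input."""
    if not token:
        raise ValueError("empty token")
    n = 0
    for ch in token:
        n = n * BASE + ALPHABET.index(ch)  # ValueError on unknown characters
    if n >= MODULUS:
        raise ValueError("token out of range")
    return (n * INVERSE) % MODULUS
```

Because both functions are pure and share only constants, any machine with the same configuration can convert in either direction without talking to the others; only the numeric PK itself has to be allocated by the database.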

iio7 commented 3 years ago

@miquelfire, thank you very much for your reply.

I have a very hard time seeing how YouTube wouldn't store the hash in the database. The main benefit of using the hash as the primary key is that it eliminates the problems that can arise when migrating data from one database system to another. More importantly, as in YouTube's case, when you shard the data across so many different machines, keeping the PK in sync takes a huge amount of effort and control. By storing the hash in the table and using it as the PK, all of that goes away.

I do not know for sure, but I don't think they store a regular ID as the PK, I think they store the hash.
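
One way to read "store the identifier itself as the PK" is to generate a random string and insert it directly, with no numeric ID behind it. The sketch below assumes that reading. It is not Hashids and makes no claim about how YouTube works; the table name `videos`, the 11-character length and the helper names are all hypothetical.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits + "-_"  # 64 URL-safe characters
ID_LENGTH = 11                                          # arbitrary choice


def new_public_id() -> str:
    """Return a random identifier; no counter or numeric PK is involved."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def insert_video(db: sqlite3.Connection, title: str, max_attempts: int = 5) -> str:
    """Insert a row keyed by a random ID, retrying on the rare collision."""
    for _ in range(max_attempts):
        video_id = new_public_id()
        try:
            with db:  # commits on success, rolls back on error
                db.execute(
                    "INSERT INTO videos (id, title) VALUES (?, ?)",
                    (video_id, title),
                )
            return video_id
        except sqlite3.IntegrityError:
            continue  # ID already taken: draw another one
    raise RuntimeError("could not allocate a unique id")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE videos (id TEXT PRIMARY KEY, title TEXT NOT NULL)")
video_id = insert_video(db, "hello")
```

The uniqueness check still happens in the database, so in a sharded setup the design question becomes where that check runs, not how the string is produced.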

miquelfire commented 3 years ago

The way you want to use Hashids creates a chicken-and-egg problem: you need a numeric ID to generate the hashid. Most (if not all) databases that are designed to run on multiple machines already have something in place for the issue you're asking about. You might be better off looking at UUIDs, given the reason you opened this ticket.
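
The UUID suggestion can be illustrated with a small sketch: two independent in-memory SQLite databases stand in for two machines, each generates its own keys with Python's `uuid.uuid4()`, and the rows are merged afterwards without renumbering. The schema and the helper names `make_shard` and `insert_video` are assumptions made for the example.

```python
import sqlite3
import uuid


def make_shard() -> sqlite3.Connection:
    """Create an isolated database standing in for one machine."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE videos (id TEXT PRIMARY KEY, title TEXT NOT NULL)")
    return db


def insert_video(db: sqlite3.Connection, title: str) -> str:
    """Insert a row whose key is generated locally, with no coordination."""
    video_id = str(uuid.uuid4())
    db.execute("INSERT INTO videos (id, title) VALUES (?, ?)", (video_id, title))
    return video_id


shard_a, shard_b = make_shard(), make_shard()
ids_a = [insert_video(shard_a, f"a-{i}") for i in range(3)]
ids_b = [insert_video(shard_b, f"b-{i}") for i in range(3)]

# Merge both shards into one database; the keys are kept as they are.
merged = make_shard()
for shard in (shard_a, shard_b):
    rows = shard.execute("SELECT id, title FROM videos").fetchall()
    merged.executemany("INSERT INTO videos (id, title) VALUES (?, ?)", rows)

merged_count = merged.execute("SELECT COUNT(*) FROM videos").fetchone()[0]
```

Neither shard needs to know which keys the other has issued, which is the property the original question was after; the trade-off is a longer identifier than a hashid.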

iio7 commented 3 years ago

Thanks, I'll see if I can't figure something else out. I am not a fan of UUID.