nshafer / django-hashid-field

Django Model Field that uses Hashids to obscure the value
MIT License

The values_list and tight coupling to the database #45

Closed jskitz closed 4 years ago

jskitz commented 4 years ago

I've been evaluating this project as a way to obfuscate my internal identifiers. I think the work here is really good, so thank you so much for this tool and your work on this. As I've been evaluating this in my own project, I noticed that low level functionality like values_list doesn't actually provide the values but instead provides the Hashid object. For example:

In [1]: books = Book.objects.all()[:10]

In [2]: books.values_list('id', flat=True)
Out[2]: <QuerySet [Hashid(3): MzRKGR7, Hashid(6): WYjEbNm, Hashid(9): LVjZvj2, Hashid(12): P1a9rRX, Hashid(14): 0GRXVNL, Hashid(17): 1rNOMRY, Hashid(19): m6a6kRD, Hashid(21): JLN7QRv, Hashid(23): JoNYnaV, Hashid(25): KVNWLaP]>

Is there any way that you know of, to have this just flatly produce either the hash IDs or the IDs? Of course this could be built into a custom manager I'm sure, but it seems like this does modify the way the database ORM typically works in Django.

This makes me think that the tight binding to the database perhaps is too tight an abstraction. If what I'm trying to do is obfuscate database identifiers so that people can't just march up my IDs or see how active my platform is, is there a way to just use Hashids as some sort of middleware to just translate between id and hashid without being tightly coupled to the database?

DRF Serializer Extensions takes this approach, but I thought that the execution of the idea was far too confusing and required a lot of additional buy in towards using their other structures, which I wasn't interested in.

I really love the simplicity of what you have created here and the ease with which it can be implemented without messing with my database IDs. I was just wondering what other limitations beyond values_list I may encounter by having this tied so closely to the database as a field.

Have you considered this as an extension where you can just implement this at the serializer level without touching models or migrations (with full understanding that it's an abstraction)? Thanks again.

nshafer commented 4 years ago

The tight coupling to the database is the entire point of this module, tbh. It's a model field for that reason. If you don't want that, you can take hashids strings from the user (i.e. "MzRKGR7") and convert that to an integer with just the Hashids library from pypi (which this module uses behind the scenes.) So something like:

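from hashids import Hashids

hashids = Hashids(salt="something")
# decode() returns a tuple of integers, so unpack the single value
(book_id,) = hashids.decode("MzRKGR7")
Book.objects.get(id=book_id)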

Or to convert a bunch of integer IDs to hashids as in your example, you can do it manually like:

from hashids import Hashids

hashids = Hashids(salt="something")
books = Book.objects.all()[:10]
[hashids.encode(pk) for pk in books.values_list('id', flat=True)]

This module just abstracts that away for you. It lets you set configuration (salt, alphabet, minimum length) globally or field by field, do lookups by integer or hashid string, and reference the value by either its original integer or its encoded hashids string. It also works with the Django admin by default, can be dropped in to replace an existing field with existing data, etc.

It does break with the Django convention of not using descriptors to obfuscate values of model instances... I didn't really learn about that maxim until after I released this module. For me, I found it useful to be able to reference the two states of a given field at the same time, by integer or by hashids string, which is why I used the descriptor to begin with. I'm not sure I would change it though, as having the field do the right thing depending on the context is handy. You can concatenate it to a string, use it in a template, or call str(book.id), and it'll be the hashids string. Or if you want the integer, you can just call int(book.id). You can also access the raw types of any Hashid instance with its id and hashid attributes: book.id.id and book.id.hashid.
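To make the "two states at once" idea concrete, here is a minimal sketch of a value object that responds to both int() and str(). It is not the library's Hashid class: DualId, toy_encode, and toy_decode are hypothetical names, and the toy codec is a plain base-26 placeholder rather than the Hashids algorithm (a real setup would delegate to the Hashids library). Only the id and hashid attribute names mirror the ones mentioned in the comment.

```python
# Hypothetical stand-in codec: NOT the Hashids algorithm, just a reversible
# placeholder so the example runs without third-party packages.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_encode(number):
    """Encode a non-negative integer as a base-26 string of letters."""
    if number < 0:
        raise ValueError("only non-negative integers can be encoded")
    digits = []
    while True:
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
        if number == 0:
            break
    return "".join(reversed(digits))


def toy_decode(text):
    """Reverse toy_encode; raises ValueError for empty or invalid input."""
    if not text:
        raise ValueError("cannot decode an empty string")
    number = 0
    for char in text:
        number = number * len(ALPHABET) + ALPHABET.index(char)
    return number


class DualId:
    """Holds one identifier and exposes it as an integer or an encoded string."""

    def __init__(self, value):
        # Accept either representation: an integer or an encoded string.
        if isinstance(value, int):
            self.id = value
            self.hashid = toy_encode(value)
        else:
            self.id = toy_decode(value)
            self.hashid = value

    def __int__(self):
        return self.id

    def __str__(self):
        return self.hashid

    def __repr__(self):
        return "DualId({}): {}".format(self.id, self.hashid)


book_id = DualId(3)
print(int(book_id))               # integer form
print("/books/" + str(book_id))   # encoded string form
```

The point of the design is that callers pick the representation at the point of use, so the same object can go into a URL or a database query without a separate conversion step.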

So to answer your direct question and your example, you can get the integer IDs with:

books = Book.objects.all()[:10]
[int(id) for id in books.values_list('id', flat=True)]
# Or use the property to get the underlying integer
[id.id for id in books.values_list('id', flat=True)]
# Or, when iterating through a list of Books, you can reference either value easily
for book in books:
    print("Book {}/{}: {}".format(book.id.id, book.id.hashid, book.name))

I've thought about making the descriptor an optional setting, but haven't done any work on it, nor really plan on doing so any time soon. Hope that helps.
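For readers unfamiliar with the mechanism, below is a rough sketch of how a descriptor can wrap a stored integer on attribute access, and of what an opt-out switch might look like. Everything in it is hypothetical: WrappingDescriptor, its wrap flag, and the Wrapped class are invented for illustration and do not reflect the library's actual code or any existing setting.

```python
class Wrapped:
    """Hypothetical wrapper pairing an integer with an obscured string form."""

    def __init__(self, value):
        self.id = value
        # Placeholder obfuscation (prefixed hex), NOT a real Hashids encoding.
        self.hashid = "h{:x}".format(value)

    def __int__(self):
        return self.id

    def __str__(self):
        return self.hashid


class WrappingDescriptor:
    """Returns a Wrapped object on access when wrap=True, else the raw integer."""

    def __init__(self, wrap=True):
        self.wrap = wrap

    def __set_name__(self, owner, name):
        # Keep the raw value under a private attribute on each instance.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        raw = getattr(instance, self.storage_name)
        return Wrapped(raw) if self.wrap else raw

    def __set__(self, instance, value):
        # Always store a plain integer, whichever form was assigned.
        setattr(instance, self.storage_name, int(value))


class Book:
    id = WrappingDescriptor(wrap=True)

    def __init__(self, pk):
        self.id = pk


class PlainBook:
    id = WrappingDescriptor(wrap=False)

    def __init__(self, pk):
        self.id = pk


print(str(Book(255).id), int(Book(255).id))  # wrapped access, both forms
print(PlainBook(255).id)                     # raw integer when wrapping is off
```

With the flag off, attribute access behaves like an ordinary integer field, which is roughly the behavior the original question was after.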

jskitz commented 4 years ago

Thank you for the detailed response here. The thinking here makes total sense and helps quite a bit.

I also tested two other things that worked correctly. I was able to run dumpdata into a fixture, and reversing out the HashidAutoField just altered the field back to the id primary key. It's really nice that it's so easily reversible. Beyond values_list, I have not seen any other limitations, and even that is a one-line change in the code.

Thanks again for the great tool and for the discussion! Since this isn't an issue, I'm going to close this out.

nshafer commented 4 years ago

No problem, happy to help, and thanks for the kind words. =)