ashleyheath opened 4 months ago
Alternatively, we could use a hash instead of the actual queue name. That would leave a lot of room for various values.
That's certainly an interesting idea. I'd be worried that changing the queue name in the jobs table would hinder observability and debugging, but if we could use a hash only in the `pg_notify` publish and subscribe calls, that might do the job nicely.
Ah yes, of course, I'm only talking about the "notification" name.
Would you like to make a PR for that ?
>>> import base64, hashlib
>>> base64.b32encode(hashlib.sha1(b"my_queue").digest()).decode().lower()
'756gnpdwk3lna3zna4toqq4eh4dhu4r6'
>>> len(_)
32
(this gives 32^32 possibilities, i.e. roughly 10^48, meaning you'd have about the same probability of collision as you and a friend each independently picking a random atom on planet Earth and both landing on the same one)
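That figure checks out: 32 base32 characters, each drawn from a 32-symbol alphabet, give 32^32 = 2^160 combinations, which is on the order of 10^48:

```python
import math

# 32 characters, each from a 32-symbol base32 alphabet
combinations = 32 ** 32

# 32**32 == (2**5)**32 == 2**160, the full SHA-1 digest space
assert combinations == 2 ** 160

# Order of magnitude: log10(2**160) ≈ 48.16, so roughly 1.46e48
print(math.log10(combinations))
```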
And we should log the mapping between the queue name and the hash either at the start of the worker, or when we listen (level debug).
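A minimal sketch of what that could look like, assuming a hypothetical `hashed_channel` helper (this is not procrastinate's actual API, just an illustration of hashing plus logging the mapping at debug level):

```python
import base64
import hashlib
import logging

logger = logging.getLogger(__name__)


def hashed_channel(queue_name: str) -> str:
    """Derive a fixed-length, identifier-safe channel name from a queue name."""
    digest = hashlib.sha1(queue_name.encode()).digest()  # 20 bytes
    # base32 of 20 bytes is exactly 32 characters, with no padding
    hashed = base64.b32encode(digest).decode().lower()
    # Log the mapping so operators can correlate channels back to queues
    logger.debug("Queue %r maps to notification channel %r", queue_name, hashed)
    return hashed
```

Calling `hashed_channel("my_queue")` would return the same `'756gnpdwk3lna3zna4toqq4eh4dhu4r6'` value as the REPL example above.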
@ewjoachim did you envisage doing the hash from the python code then (as opposed to from within the SQL function)? And if so, adding a new column to `procrastinate_jobs` to hold the hashed queue name?
Ah, I was thinking about the listen part, where it's equivalent to do it in Python or in SQL. But yes, of course, we need an SQL implementation for the notify part, so let's do that in SQL.
Doing a little digging, it seems the `digest` function in postgres is actually part of the separate pgcrypto extension.
So the options seem to be:

- require pgcrypto and use its `digest` function in SQL, or
- compute the hash when calling `LISTEN` (in python) and find a pgcrypto-free way to do it when calling `pg_notify`.

I don't have strong preferences but can imagine that some users may prefer the second due to restrictions on what they can install. @ewjoachim any thoughts?
Looks like the `hashtext` function is available without the need for extensions. It's apparently not guaranteed to be stable across PG versions, but we don't need it to be extremely stable: nothing will break if listen/notify doesn't work during a PG migration.
It seems postgres is compiled by default with an identifier size limit of 63 bytes. Given the prefixing that `procrastinate_notify_queue()` performs, that means there's a practical limit of 43 ascii characters on the `queue_name`. While technically someone could build their postgres instance with a different id size, in practice I expect this won't be the case, and so it feels like it would be sensible to either:

- enforce that `queue` values are <= 43 in length, or
- skip `pg_notify` where queue names are too long (and allow workers to pick up jobs with longer queue names exclusively through polling).

Given a UUID is 36 characters, 43 is pretty limiting if I want to create 'thing'-specific queues (e.g. `email-customer/00000000-0000-0000-0000-000000000000`), so it certainly feels useful to allow the use of longer queue names, but I appreciate that would be a breaking change with the existing behaviour.

Keen to get thoughts from people.
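To make the arithmetic concrete, here's a sketch of the length budget. The `procrastinate_queue#` prefix is an assumption used purely for illustration; at 20 characters it is consistent with the 43-character figure above:

```python
# PostgreSQL truncates identifiers (including NOTIFY channel names) to
# NAMEDATALEN - 1 = 63 bytes in a default build.
MAX_IDENTIFIER = 63

# Hypothetical channel-name prefix, assumed here for illustration only.
PREFIX = "procrastinate_queue#"  # 20 characters
MAX_QUEUE_NAME = MAX_IDENTIFIER - len(PREFIX)  # 63 - 20 = 43


def fits_in_channel(queue_name: str) -> bool:
    """Return True if the queue name fits within the channel length budget."""
    return len(queue_name.encode()) <= MAX_QUEUE_NAME


# A bare UUID (36 chars) fits, but a per-entity prefixed queue name does not:
uuid_queue = "00000000-0000-0000-0000-000000000000"   # 36 characters
entity_queue = "email-customer/" + uuid_queue          # 51 characters
```

This is why hashing the channel name (rather than capping queue names) looks attractive: the hashed form is always 32 characters regardless of the queue name's length.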