procrastinate-org / procrastinate

PostgreSQL-based Task Queue for Python
https://procrastinate.readthedocs.io/
MIT License

queue names limited to 43 characters #1067

Open ashleyheath opened 4 months ago

ashleyheath commented 4 months ago

It seems postgres is compiled by default with an identifier size limit of 63 bytes (NAMEDATALEN of 64).

procrastinate_notify_queue() performs

PERFORM pg_notify('procrastinate_queue#' || NEW.queue_name, NEW.task_name);

That means there's a practical limit of 43 ASCII characters on queue_name.
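
To spell the arithmetic out, the channel prefix uses 20 of the 63 available bytes:

>>> len('procrastinate_queue#')        # prefix prepended by procrastinate_notify_queue()
20
>>> 63 - len('procrastinate_queue#')   # bytes left for queue_name
43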

While technically someone could build their postgres instance with a different identifier size, in practice I expect this won't be the case, so it feels like it would be sensible to either:

Given that a UUID is 36 characters, 43 is pretty limiting if I want to create 'thing'-specific queues (e.g. email-customer/00000000-0000-0000-0000-000000000000), so it certainly feels useful to allow longer queue names, though I appreciate that would be a breaking change from the existing behaviour.

Keen to get thoughts from people.

ewjoachim commented 4 months ago

Alternatively, we could use a hash instead of the actual queue name. That would leave a lot of room for various values.

ashleyheath commented 4 months ago

That's certainly an interesting idea. I'd be worried that changing the queue name in the jobs table would hinder observability and debugging, but if we could use a hash only in the pg_notify publish and subscribe calls, that might do the job nicely.

ewjoachim commented 4 months ago

Ah yes, of course, I'm only talking about the "notification" name.

Would you like to make a PR for that?

>>> import base64, hashlib
>>> base64.b32encode(hashlib.sha1(b"my_queue").digest()).decode().lower()
'756gnpdwk3lna3zna4toqq4eh4dhu4r6'
>>> len(_)
32

(This gives 32^32 possibilities, roughly 10^48, meaning a collision is about as likely as you picking a random atom on planet Earth, a friend independently picking a random atom, and the two of you landing on the same one.)
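
For the record, a quick REPL check of that figure:

>>> 32 ** 32               # == 2 ** 160, the size of the SHA-1 digest space
1461501637330902918203684832716283019655932542976
>>> len(str(32 ** 32))     # 49 digits, i.e. on the order of 10**48
49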

And we should log the mapping between the queue name and the hash, either at the start of the worker or when we listen (at debug level).
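
Something like this on the Python/listen side, as a rough sketch (the helper and logger here are invented for illustration, not procrastinate's actual API):

import base64
import hashlib
import logging

logger = logging.getLogger(__name__)

def notification_channel(queue_name: str) -> str:
    # Same scheme as the snippet above: SHA-1 digest, base32-encoded, lowercased,
    # giving a 32-character suffix that keeps the channel well under 63 bytes.
    digest = base64.b32encode(hashlib.sha1(queue_name.encode()).digest()).decode().lower()
    channel = f"procrastinate_queue#{digest}"
    # Log the queue -> channel mapping when we start listening (debug level).
    logger.debug("Queue %r maps to notification channel %r", queue_name, channel)
    return channel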

ashleyheath commented 4 months ago
>>> base64.b32encode(hashlib.sha1(b"my_queue").digest()).decode().lower()
'756gnpdwk3lna3zna4toqq4eh4dhu4r6'
>>> len(_)
32

@ewjoachim did you envisage doing the hash from the python code then (as opposed to from within the SQL function)? And if so, adding a new column to procrastinate_jobs to hold the hashed queue name?

ewjoachim commented 4 months ago

Ah, I was thinking about the listen part, where doing it in Python or in SQL makes no difference. But yes, of course, we need an SQL implementation of that for the notify part, so let's do it in SQL.

ashleyheath commented 4 months ago

Doing a little digging, it seems the digest function in postgres is actually part of the separate pgcrypto extension.

So the options seem to be:

I don't have strong preferences but can imagine that some users may prefer the second due to restrictions on what they can install. @ewjoachim any thoughts?

ewjoachim commented 4 months ago

Looks like the hashtext function is available without the need for extensions. It's apparently not guaranteed to be stable across PG versions, but we don't need it to be extremely stable: nothing will break if listen/notify doesn't work during a PG migration.
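
If we go that route, here's a rough sketch of how the listen side could ask Postgres for the same hashtext() value the trigger would use (assuming psycopg 3; the helper name is made up, not procrastinate's actual code):

import psycopg

def channel_for_queue(conn: psycopg.Connection, queue_name: str) -> str:
    # Let Postgres compute hashtext() itself, so the Python side always agrees
    # with whatever value procrastinate_notify_queue() used for the notify.
    (value,) = conn.execute("SELECT hashtext(%s)", (queue_name,)).fetchone()
    return f"procrastinate_queue#{value}"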