peter-wangxu / persist-queue

A thread-safe disk based persistent queue in Python
BSD 3-Clause "New" or "Revised" License

Feature Request: Add separate metadata block from ID #212

Open andrewvaughan opened 8 months ago


There are scenarios where I would like to get or update an element in the queue but don't necessarily have its ID. I do, however, have a unique string or key from the item that I can use to look it up. It would be nice to have an additional set of metadata serialized with the queued item, so that I can look up and modify the item by this unique string without having to know the entire metadata configuration.

My current example queues URLs as the items. Each URL, however, also comes with several sets of information that configure how the processors will handle it, and these need to be queued along with it.

At the moment, I put these in a tuple and enqueue that. However, this means I would need to know both the URL and all of the metadata to find the item again, which isn't always feasible.
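To illustrate the limitation, here is a minimal sketch in plain Python. A list stands in for the queue, and `find_exact` and `find_by_url` are hypothetical helpers written for this example; none of this reflects persistqueue's actual lookup logic.

```python
# Hypothetical stand-in data: (url, metadata) tuples, as in the workaround above.
queued = [
    ("https://www.youtube.com", {"type": "media", "processor": "StreamProcessor"}),
    ("https://www.example.com", {"type": "page", "processor": "PageProcessor"}),
]


def find_exact(items, candidate):
    """Return indexes of items equal to `candidate` (requires the full tuple)."""
    return [i for i, item in enumerate(items) if item == candidate]


def find_by_url(items, url):
    """Return indexes of items whose first element equals `url`."""
    return [i for i, (item_url, _meta) in enumerate(items) if item_url == url]


# The URL alone never equals a whole tuple, so an exact match finds nothing.
by_value = find_exact(queued, "https://www.youtube.com")

# Matching on the URL field works, but only by scanning every queued item.
by_url = find_by_url(queued, "https://www.youtube.com")
```

A scan like `find_by_url` is possible from outside the library, but it means reading and deserializing every item, which is what a dedicated key field would avoid.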

Ideally, I'd like to be able to use get and update with a primary field, and have a data block attached to it that I can use.

Example

from persistqueue import SQLiteAckQueue

queue: SQLiteAckQueue = SQLiteAckQueue("./my-queue.db", auto_commit=True, multithreading=True)

queue.put("https://www.youtube.com", metadata={
    "type": "media",
    "processor_class": StreamProcessor,  # a user-defined class (not shown)
})

# Add a `metadata` option to return a tuple, maybe, for backwards compatibility
item, gotten_metadata = queue.get(metadata=True)

# Nack with just the item, not requiring the entire metadata set like is required now
queue.nack(item)

# Update the metadata with the item (basically, a primary key)
gotten_metadata["type"] = "foobar"
queue.update(item, metadata=gotten_metadata)

It's possible this is already feasible, but if so, the documentation is unclear on how it would be done.

In my use case, a user may want to modify the metadata but only knows the URL. If I understand how persistqueue works, they would currently need the entire (properly serialized) object to look the item up with get and modify it, and they have no way to obtain that.
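For discussion, the requested behaviour could be sketched with the standard-library `sqlite3` and `json` modules as below. The class `KeyedQueueSketch`, its table layout, and its method names are all assumptions made for this illustration; this is not how persistqueue stores items, and metadata is limited to JSON-serializable values here (so a class such as StreamProcessor would have to be stored by name).

```python
import json
import sqlite3


class KeyedQueueSketch:
    """Hypothetical illustration only; not part of persistqueue."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        # The item acts as a unique lookup key; metadata lives in its own column.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "item TEXT NOT NULL UNIQUE, "
            "metadata TEXT NOT NULL)"
        )

    def put(self, item, metadata=None):
        """Enqueue `item` with an optional JSON-serializable metadata dict."""
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO queue (item, metadata) VALUES (?, ?)",
                (item, json.dumps(metadata or {})),
            )

    def peek(self):
        """Return (item, metadata) for the oldest entry, or None if empty."""
        row = self._conn.execute(
            "SELECT item, metadata FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))

    def get_metadata(self, item):
        """Look up metadata knowing only the item itself."""
        row = self._conn.execute(
            "SELECT metadata FROM queue WHERE item = ?", (item,)
        ).fetchone()
        if row is None:
            raise KeyError(item)
        return json.loads(row[0])

    def update(self, item, metadata):
        """Replace the metadata stored for `item`."""
        with self._conn:
            cursor = self._conn.execute(
                "UPDATE queue SET metadata = ? WHERE item = ?",
                (json.dumps(metadata), item),
            )
        if cursor.rowcount == 0:
            raise KeyError(item)
```

With this layout, a caller who only knows the URL can call `get_metadata(url)`, change one field, and pass the result to `update(url, metadata)` without ever reconstructing the full serialized object.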