msolli / proletarian

A durable job queuing and worker system for Clojure backed by PostgreSQL.
MIT License
161 stars 7 forks source link

Passing job-id to worker handler-fn #16

Open tangrammer opened 1 year ago

tangrammer commented 1 year ago

Hello!

is there any reason of not passing job-id besides job-type to the worker handler-fn? I think it could be useful to identify the job execution context with this uuid (thinking for example in monitoring)

Are you interested on this change so i could try to tackle it in a PR? i suspect that this change should need a change in the create-queue-worker API :thinking: ... but as far as it's data oriented we could add something like handler-fn-v2 alike

what do you think?

thanks in advance for this great library!

msolli commented 1 year ago

Hi, and thanks for your interest in Proletarian!

I've gone back and forth in my mind about exposing job-id to the job handling function. In the end I've decided that it's an implementation detail that should not leak out to consumers. I can see that I'm not consistent here, as enqueue! returns the job-id as it stands. You should think about that value more as a diagnostic value, not to be used in application code.

However, your use case remains valid. I'd suggest using the payload to store metadata (not in the Clojure sense) about your jobs. We're doing this at work to propagate OpenTelemetry tracing data (using https://github.com/steffan-westcott/clj-otel). Here's more or less how we do it, using a pair of wrapper functions that read and write from the payload, but in a way that is transparent to the rest of the application code:

(ns my-app.infrastructure.worker-jobs
  (:require
    [proletarian.job :as job]
    [proletarian.protocols]
    [steffan-westcott.clj-otel.api.trace.span :as otel.span]
    [steffan-westcott.clj-otel.context :as otel.context]))

(defn with-trace-data
  [payload]
  (let [trace-context (otel.context/->headers)]
    (cond-> payload
      (seq trace-context) (assoc ::trace-context trace-context))))

(defn enqueue!
  [conn queue job-type payload]
  (assert (map? payload) "Worker job payload must be a map")
  (otel.span/with-span! {:name       (str `enqueue!)
                         :span-kind  :producer
                         :attributes {:queue    queue
                                      :job-type job-type
                                      :payload  payload}}
    (job/enqueue! conn job-type (with-trace-data payload) {:proletarian/queue queue})))

(defmulti handle-job!
  (fn [job-type _payload] job-type))

(defn handle-job-wrapper!
  [job-type payload]
  (let [trace-context (some-> (::trace-context payload) (otel.context/headers->merged-context))
        payload       (dissoc payload ::trace-context)]
    (otel.span/with-span! {:name       (str job-type)
                           :parent     trace-context
                           :attributes {:payload payload}
                           :span-kind  :consumer}
      (handle-job! job-type payload))))

handle-job-wrapper! would be the function that you pass to create-queue-worker. And you'd use enqueue! in your application code instead of using proletarian.job/enqueue! directly.

How would this approach work for you?

tangrammer commented 1 year ago

Hi Martin! and sorry for the big delay on your detailed answer :grimacing:

after thinking a lot about this sentece

In the end I've decided that it's an implementation detail that should not leak out to consumers.

I really don't get the "deep" reasons of not having something so useful as an (U)ID ... I fully respect that you've decided it but can't imagine why you consider an uuid as detail implementation instead of just identity data

Thinking aloud now :sweat_smile: ... couldn't be considered Identity something universal?

thanks again!

msolli commented 1 year ago

I think in the end I don't want to prematurely commit to a signature for the handler-fn that has too much information in it. I haven't seen any use for the id in the arguments to the handler-fn, but I could be wrong. What would you use it for?

dharrigan commented 1 day ago

Hi,

I too would like to see the uuid being made available for the worker to work with.

Here's a use case.

For auditing and accounting purposes, we would like to tie a job to a particular event (that is being generated and passed in via AWS SQS).

For the moment, I'm hijacking what is marked as "for testing" purposes:

* `:proletarian/uuid-fn` – a function for generating UUIDs. Used in testing. The default is [[java.util.UUID/randomUUID]].

Here is what I'm doing:

(defn use-event-id-as-job-id
  [{{:keys [id]} :target :as event}]
  (fn []
      (or (parse-uuid id) (random-uuid))))

(defn ^:private enqueue-event
  [event {:keys [a-database-somewhere proletarian] :as config}]
  (let [proletarian-opts (assoc proletarian :proletarian/uuid-fn (use-event-id-as-job-id event))]
  ...
  ...
  (job/enqueue! tx ::job-handler/do-some-work {:event event} proletarian-opts))))

Doing it this way, gives me control over the uuid that proletarian uses for it's job-id.

Perhaps 'tho, would be better to expose the uuid to the worker rather than using a "for testing" purposes configuration! :-)

-=david=-

msolli commented 1 day ago

First, regarding your current implementation, are you 100% sure your SQS event ids are unique always? How about re-deliveries, will the event-id change? Consider that the job-id is the primary key in the database.

Can you tell me more about your auditing and accounting purposes?

msolli commented 1 day ago

None of the use cases I've seen so far really convince me of providing the job-id to the handler function. I think it can be handled in application code with job metadata and a wrapper.

But still, at this point I'm willing to look into how it could be done. There are major concerns with backwards compatibility, though. The question is, where should the job-id go when invoking the handler-fn?

The docstring says:

`handler-fn` –  the function that will be called when a job is pulled off the queue. It should be an arity-2
function or multimethod. The first argument is the job type (as provided to [[proletarian.job/enqueue!]]). The
second argument is the job's payload (again, as provided to [[proletarian.job/enqueue!]]).

Alternative 1: Add a parameter to handler-fn

If we add a parameter with the job-id (either as the value itself, or as part of a metadata map with more information about the job), then existing code will break.

Alternative 2: Change the first parameter

It is currently the job type as a keyword. It could be changed to a map with more metadata about the job, including job-id:

{:proletarian/job-type :my-app/foo
 :proletarian/job-id #uuid "…"
 …
}

This, too, would be a breaking change.

Alternative 3: Change the second parameter

The payload is the second parameter. A simple non-breaking change would be to introduce the job-id (and maybe other job metadata) as extra keyvals in the payload:

{:my-job-data-1 "foo"
 :my-job-data-2 :bar
 :proletarian.job/job-id #uuid "…"
 ...
}

But it's not so simple as it turns out. There is actually no requirement in Proletarian that the payload be a map. It only needs to be serializable by the provided serializer. It could just as well be a scalar value, or a vector, or something else. Then this approach doesn't work. Or it could work if we checked that the payload is a map and only assoc-ed in the metadata in that case.

Config options

The breaking changes discussed could be introduced with config options that enable the previous behavior, with a deprecation notice.

API design is hard

It really boils down to a tradeoff between ergonomics and flexibility. So far I've avoided exposing the mechanical details of the job infrastructure to consuming code. I think the interface is very clean as it is (with the two-arity handler-fn, I mean). It comes at the cost of less flexibility - you just don't get that much information about the jobs, which means you can't take action based on information about the inner workings of the job infrastructure. I think that is a good thing, but I'm open to be proven wrong by compelling use cases.

There is cost to providing this information, too. In the vast majority of cases consumers do not and should not care about these infrastructure concerns. Application code can concern itself with domain logic, and wrapper functions (around handler-fn and enqueue!) can read and write metadata (in the payload) for the non-domain concerns (like tracing, see example above).

As this library is still in alpha, we still can make some of these changes, but it should not be taken lightly.

Any thoughts, @tangrammer or @dharrigan?

dharrigan commented 1 day ago

Hi,

Very insightful and API design is very hard! I don't envy you! :-)

I'll have a ponder, but out of everything I've read so far, it seems like option 3 is the safest - adding the :proletarian/job-id as a key to the payload if it's a map. If someone is using a vector, or anything else, then that may be a known limitation (they could change to pass in a map, with the vector/scalar as one of the keys).

-=david=-