msolli / proletarian

A durable job queuing and worker system for Clojure backed by PostgreSQL.
MIT License

Instrumenting jobs #20

Open areina opened 12 months ago

areina commented 12 months ago

Hello @msolli, thanks for your work on this library.

Did you think about adding monitoring/instrumentation to the jobs? Although I understand it might be possible to wrap the handle-job! multimethod with some logic to produce metrics, it would be great to get that behavior by default. What's your take on this?
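
A minimal sketch of the wrapping approach mentioned above, assuming the handler is a two-argument function of job-type and payload, as with a handle-job! multimethod. The names metrics, record!, and wrap-metrics are hypothetical and not part of Proletarian. The in-memory atom stands in for a real metrics backend such as OpenTelemetry or Prometheus.

```clojure
;; Hypothetical in-memory metrics store, keyed by job type.
;; A real setup would report to a metrics backend instead.
(defonce metrics (atom {}))

(defn record!
  "Counts one outcome (:succeeded or :failed) for job-type and adds its duration in ms."
  [job-type outcome duration-ms]
  (swap! metrics update job-type
         (fn [m]
           (-> (or m {:succeeded 0 :failed 0 :total-ms 0.0})
               (update outcome inc)
               (update :total-ms + duration-ms)))))

(defn wrap-metrics
  "Returns a handler that calls handler-fn and records outcome and duration."
  [handler-fn]
  (fn [job-type payload]
    (let [start      (System/nanoTime)
          elapsed-ms #(/ (- (System/nanoTime) start) 1e6)]
      (try
        (let [result (handler-fn job-type payload)]
          (record! job-type :succeeded (elapsed-ms))
          result)
        (catch Throwable t
          (record! job-type :failed (elapsed-ms))
          ;; Rethrow so the caller still sees the failure.
          (throw t))))))

;; Stand-in for the application's own multimethod.
(defmulti handle-job! (fn [job-type _payload] job-type))
(defmethod handle-job! :echo [_job-type payload] payload)

(def instrumented-handler (wrap-metrics handle-job!))
```

The wrapped function keeps the same two-argument shape, so it could be passed wherever the unwrapped handler was used. Rethrowing keeps failures visible to whatever calls the handler.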

msolli commented 12 months ago

Hi @areina!

I might be inclined to do something with OpenTelemetry, but so far I've covered my use cases by wrapping the enqueue! and handler-fn! functions.

Do you have any thoughts on what tracing and metrics behaviors should be included by default? I see that you're with NewRelic, which means you probably have more experience in this area than me.

Here's what we're doing at work to instrument the workers:

;; Enqueue

(defn with-trace-data
  [payload]
  (let [trace-context (clj-otel.context/->headers)]
    (cond-> payload
      (seq trace-context) (assoc ::trace-context trace-context))))

(defn enqueue!
  ([conn queue job-type payload]
   (assert (map? payload) "Worker job payload must be a map")
   (trace.span/with-span! {:name       (str `enqueue!)
                           :span-kind  :producer
                           :attributes {:queue    queue
                                        :job-type job-type
                                        :payload  payload}}
     (let [job-id (job/enqueue! conn job-type (with-trace-data payload) {:proletarian/queue      queue
                                                                         :proletarian/serializer serializer})]
       (trace.span/add-span-data! {:attributes {:job-id job-id}})
       (log/info (str `enqueue! " " job-type) {:queue queue :job-type job-type :payload payload :job-id job-id})
       job-id))))

;; Handle job

(defmulti handle-job! (fn [job-type _payload] job-type))

(defn handle-job-wrapper!
  [job-type payload]
  (let [trace-context (some-> (::trace-context payload) (clj-otel.context/headers->merged-context))
        payload       (dissoc payload ::trace-context)]
    (trace.span/with-span! {:name       (str job-type)
                            :parent     trace-context
                            :attributes {:payload payload}
                            :span-kind  :consumer}
      (log/info (str `handle-job! " " job-type) {:job-type job-type, :payload payload})
      (handle-job! job-type payload))))

Here I'm using the https://github.com/steffan-westcott/clj-otel library. This produces spans like this: (screenshot: Skjermbilde 2023-10-23 kl 09 15 27)

(Yes, we're using NewRelic! :) )

areina commented 11 months ago

Going with OpenTelemetry makes total sense to me. At the moment, in a side project, I'm just reporting these four metrics:

I took a look at what other similar projects were doing and I found this for sidekiq (ruby): https://github.com/fastly/sidekiq-prometheus

I thought it could be cool to have this behavior by default, or as something you can enable when you configure the worker. However, as a first step, it would be great just to add your examples to the documentation.

Thanks!