Open mateuszrzeszutek opened 3 years ago
Good timing: we discussed this in the Ruby SIG meeting earlier this week. Background job queues are common in Ruby web applications, with the dominant implementations being Resque and Sidekiq. So far, we've instrumented those systems using the messaging semantic conventions, but they're really not a great fit for background/batch jobs. See https://github.com/open-telemetry/opentelemetry-ruby/pull/547#issuecomment-758889459
Wouldn't this be a case where you just use an INTERNAL span without any attributes? Not every span needs to conform to a semantic convention. What information do you want to have on batch job spans?
the concepts of job/step/chunk
^ this bit seems useful.
For Ruby batch job systems, relative to message systems:
MyJob enqueue vs default send;
the ... enqueue suffix is more in keeping with the domain language than ... send;
destination_kind is likely always queue, so it probably isn't an interesting thing to specify (and certainly shouldn't be required).
For the "job class name": There are semantic conventions for code locations, see https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/span-general.md#source-code-attributes.
That's useful, but doesn't meet the expectations of users re: span names.
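To make the naming difference concrete, here is a minimal sketch in plain Python (no OpenTelemetry dependency). The helper names, the example queue name "default", and the choice of code.namespace to carry the job class are illustrative assumptions, not part of any existing instrumentation.

```python
def messaging_span_name(queue: str, operation: str = "send") -> str:
    """Messaging-convention style name: '<destination> <operation>'."""
    return f"{queue} {operation}"


def job_span_name(job_class: str, operation: str = "enqueue") -> str:
    """Job-oriented style name discussed above: '<job class> <operation>'."""
    return f"{job_class} {operation}"


def job_class_attributes(job_class: str) -> dict:
    # Assumption: the job class is recorded through the source-code
    # attribute code.namespace. This supplements the span name; it does
    # not replace it, which is the point made in the reply above.
    return {"code.namespace": job_class}


if __name__ == "__main__":
    print(messaging_span_name("default"))
    print(job_span_name("MyJob"))
    print(job_class_attributes("MyJob"))
```

With the messaging-style name, every job on the same queue collapses into one span name; the job-style name keeps the job class visible without opening the span's attributes.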
Good timing: we discussed this in the Ruby SIG meeting earlier this week.
Nice! I'll take a look at sidekiq & your instrumentation and try to extract common parts - at first it looks like the job span would be the only common thing, but maybe there's more. And there's the whole queue logic that's not there in Spring Batch/JSR-352.
Wouldn't this be a case where you just use an INTERNAL span without any attributes? Not every span needs to conform to a semantic convention. What information do you want to have on batch job spans?
True, JSR-352 does not expose much information that we could store as attributes, but that does not mean there is none: there's the exit status of a job/step (an arbitrary string) and the job/step execution id (jid in sidekiq?). And, at least for JSR-352/Spring Batch, job steps expose several metrics (read count, write count, ...) that could be used.
And probably the most important piece of information is the span name.
This is what I had in mind for my use case (Spring Batch):
Start Job <batch.job.name>
Attributes:
batch.job.name: the name of the job (Spring Batch), or the job class name (resque/sidekiq), or the task name (celery);
batch.job.id: the job execution id (Spring Batch), job['jid'] in case of sidekiq.
Job <batch.job.name>
Attributes:
batch.job.name;
batch.job.id;
batch.job.exit_status: a plain string containing the exit status of a job. JSR-352/Spring Batch jobs (and steps) can return an arbitrary user-defined string as the exit status (e.g. error message saying why the job has failed). Not sure how this translates to Ruby/Python frameworks.
Job <batch.job.name>.<batch.step.name>
Attributes:
batch.step.name: the name of the step;
batch.step.id: the step execution id;
batch.step.exit_status: a plain string containing the exit status of a step. The batch job may take action depending on the result of a step, e.g. send an email and stop further processing in case of failure.
@fbogsany I believe that the first two spans that I've briefly described here match your use case with both Ruby libs that you've mentioned. I'm not sure about the other three, sidekiq/resque do not seem to have this sort of rigid job structure that spring batch has.
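The span names and attributes proposed in this comment could be assembled roughly as follows. This is only a sketch in plain Python: SpanSketch and the three helper functions are hypothetical names introduced for illustration, and the attribute keys are the ones proposed above, not an accepted convention.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SpanSketch:
    """Stand-in for a real span: just a name plus string attributes."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


def start_job_span(job_name: str, job_id: str) -> SpanSketch:
    # "Start Job <batch.job.name>": emitted when the job is launched/enqueued.
    return SpanSketch(
        name=f"Start Job {job_name}",
        attributes={"batch.job.name": job_name, "batch.job.id": job_id},
    )


def job_span(job_name: str, job_id: str,
             exit_status: Optional[str] = None) -> SpanSketch:
    # "Job <batch.job.name>": covers the job execution itself.
    attributes = {"batch.job.name": job_name, "batch.job.id": job_id}
    if exit_status is not None:
        # Arbitrary user-defined string, as in JSR-352/Spring Batch.
        attributes["batch.job.exit_status"] = exit_status
    return SpanSketch(name=f"Job {job_name}", attributes=attributes)


def step_span(job_name: str, step_name: str, step_id: str,
              exit_status: Optional[str] = None) -> SpanSketch:
    # "Job <batch.job.name>.<batch.step.name>": one span per step execution.
    attributes = {"batch.step.name": step_name, "batch.step.id": step_id}
    if exit_status is not None:
        attributes["batch.step.exit_status"] = exit_status
    return SpanSketch(name=f"Job {job_name}.{step_name}", attributes=attributes)
```

For a framework without steps (sidekiq, resque), only start_job_span and job_span would apply, with job['jid'] passed as job_id.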
I'm not sure about the other three, sidekiq/resque do not seem to have this sort of rigid job structure that spring batch has.
They don't have the Step span, but at Shopify, for example, we have a higher-level job execution framework that provides an equivalent of "chunks", so the Chunk span is relevant there. I'm not sure about the Item read, process and write spans.
Yeah, it doesn't look like Hangfire (.NET) and Bree (Node.js) have the same level of structure as Spring Batch (which I've never used); they look more simplified.
[0] https://docs.hangfire.io/en/latest/background-processing/processing-background-jobs.html
[1] https://jobscheduler.net/
What are you trying to achieve?
I want to introduce some semantic conventions for batch jobs, since there are currently no conventions around that. I'm mostly interested in instrumenting Spring Batch applications. There is already a Java spec, JSR-352, that describes a batch job API, and I was thinking of basing the trace spec on it: the concepts of job/step/chunk seem generic and language-agnostic enough (and there doesn't seem to be any other batch job specification). Before diving into details, is there a place for this in the trace semantic conventions?
Additional context.