rebus-org / Rebus.OpenTelemetry

Resuming full trace/activity after messages are picked again for processing #20

Closed AYss closed 5 months ago

AYss commented 5 months ago

Hi.

We have the following situation. A service waits for several types of messages and stores them in a DB. Originally, after processing each message, it checked whether we had everything we needed for further processing and, if so, published the subsequent messages. Since the service may process around 100 msg/s, that put a heavy load on the DB, so we had to tune it. Now a background service periodically checks whether all required messages have arrived and publishes the subsequent messages. However, we lost the trace/activity info, so the subsequent logs are no longer bound by trace.id in Kibana.

I have tried keeping the original messages' rbs-ot-* headers and putting them into the messages published by the background service, but it does not seem to work. Any ideas how to keep traces in such a scenario?

We are using the Rebus.OpenTelemetry package, and this might be related more to that than to Rebus itself, so move the issue there if you think it suits better.

zlepper commented 5 months ago

I'm assuming you are still using Rebus to publish the "continuations" here.

Basically, you need to "resume" the trace before you send the message with Rebus (or anything else, for example HTTP). Something like this: https://github.com/rebus-org/Rebus.OpenTelemetry/blob/master/Rebus.Diagnostics/Diagnostics/Incoming/IncomingDiagnosticsStep.cs#L63-L72 .

Before you save your messages in your database, you need to grab the trace/baggage headers so you can use them when resuming the trace: https://github.com/rebus-org/Rebus.OpenTelemetry/blob/master/Rebus.Diagnostics/Diagnostics/Outgoing/OutgoingDiagnosticsStep.cs#L85-L100

You generally shouldn't be dealing with headers directly here unless you are working directly with the underlying transport and skipping Rebus entirely.
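A minimal sketch of the capture-and-resume idea described above, using only `System.Diagnostics` from the .NET base library. It is not the code from the linked Rebus steps. The source name `MyApp.BatchPublisher`, the activity names, and the in-memory variables standing in for DB columns are all hypothetical, and the `ActivityListener` only stands in for the OpenTelemetry SDK so the snippet runs on its own.

```csharp
using System;
using System.Diagnostics;

// Make sure Activity.Id is rendered in W3C "traceparent" form.
Activity.DefaultIdFormat = ActivityIdFormat.W3C;

// Hypothetical source name for the application's own activities.
var source = new ActivitySource("MyApp.BatchPublisher");

// StartActivity returns null unless something listens to the source.
// In a real service the OpenTelemetry SDK plays this role; this listener
// is only here so the sketch is self-contained.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "MyApp.BatchPublisher",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// 1) In the message handler: capture the current trace context and
//    store it next to the message (plain variables stand in for DB columns).
string? savedTraceParent;
string? savedTraceState;
ActivityTraceId originalTraceId;
using (var handling = source.StartActivity("handle-incoming"))
{
    savedTraceParent = Activity.Current?.Id;
    savedTraceState = Activity.Current?.TraceStateString;
    originalTraceId = handling!.TraceId;
}

// 2) Later, in the background service: resume the trace from the stored
//    values before publishing the continuation.
Activity? resumed = null;
if (ActivityContext.TryParse(savedTraceParent, savedTraceState, out var parentContext))
{
    resumed = source.StartActivity(
        "publish-continuation", ActivityKind.Producer, parentContext);
}

// While `resumed` is Activity.Current, anything sent from here (for example
// a Rebus publish) can be attributed to the original trace.
Console.WriteLine(resumed?.TraceId == originalTraceId
    ? "Resumed the original trace"
    : "Trace was not resumed");
resumed?.Dispose();
```

If `StartActivity` returns `null` in step 2, nothing is listening to that source, and no trace context will flow to the published message.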


Side note: Consider having a look at Rebus sagas; they exist for the use case you describe: https://github.com/rebus-org/Rebus/wiki/Coordinating-stuff-that-happens-over-time :)

AYss commented 5 months ago

Hi @zlepper .

Thanks for the reply. Yeah, I am aware of sagas, but they have the same issue in this scenario: they are triggered by events. The number and (in the worst case) complexity of events are so high that we decided to do the work periodically and in batches, regardless of incoming events. This part is already working well and efficiently. The only issue we have right now is traces. I think I did everything you linked, but maybe I missed something. I will double-check and come back with code fragments.

zlepper commented 5 months ago

Did you make sure to enable the activity source? https://github.com/rebus-org/Rebus.OpenTelemetry/blob/master/Rebus.OpenTelemetry/Configuration/TraceBuilderExtensions.cs#L14 That is usually the thing I forget when I have to make custom resumes for this kind of thing 😅
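For illustration, enabling the sources with the OpenTelemetry .NET SDK might look roughly like the configuration fragment below. This is a sketch under assumptions: `AddRebusInstrumentation()` and its namespace are assumed to be what the linked `TraceBuilderExtensions.cs` exposes, and `MyApp.BatchPublisher` is a hypothetical name for the application's own `ActivitySource`. Check both against the package version in use.

```csharp
using OpenTelemetry;
using OpenTelemetry.Trace;
using Rebus.OpenTelemetry.Configuration; // assumed namespace of the Rebus extension

using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    // Assumption: registers the Rebus activity source with the tracer provider.
    .AddRebusInstrumentation()
    // Hypothetical: the source used by the background service for custom resumes.
    // Without AddSource, StartActivity on that source returns null.
    .AddSource("MyApp.BatchPublisher")
    .Build();
```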

AYss commented 5 months ago

@zlepper Yeah, well... What can I say. That was it :). Thanks for help. Closing the issue.