Customize activity failure logging (and more?)

cretz commented 7 months ago

Describe the solution you'd like

Today SDKs log all activity failures, but some activity failures are intentional (e.g. for the activity-retry-based-polling pattern). So users want a way to not log some of these failures while still logging others. Also, some users want to customize the tags of logs or do other logging approaches. Users can do all of this today, but we also log in a way they can't disable.

Originally this issue suggested a full activity reporting handler, but now it seems wisest to allow users to disable our currently-forced implicit logging of activity failures so they can defer to their own logging code to do whatever they want.

Per-SDK Tickets

[ ] Go -
[ ] Java -
[x] ~Core~ - N/A
[ ] TypeScript -
[ ] Python -
[ ] .NET -
[ ] PHP -
[x] ~Temporal CLI~ - N/A

drewhoskins-temporal commented 7 months ago

I like that this doesn't require a user to implement a custom error handler for a common case, but I think it can be much more generalized than that.

If the user wants to treat an exception as benign, it doesn't solve the complete problem they have, only a sliver of it.
This is because "Don't log" is a low-level prescriptive solution rather than descriptive. It doesn't give us any power to do anything with the bit other than what it says.
- we can't decide how it affects exponential backoff.
- We can't (as you say) make it affect metrics.
- We can't make it show up differently in Temporal UI to aid in transparency and debuggability. The result is that the user will have to go make a bunch of changes to solve their full problem. And some problems, they won't be able to solve.

How about an approach where the user instead declares the severity of the error when declare it or when they throw it? That would allow us to solve the complete user problem and also minimize the times when they need to build a custom handler by expanding the use cases to not just ignoring logging but changing the log level.

cretz commented 7 months ago

If the user wants to treat an exception as benign, it doesn't solve the complete problem they have, only a sliver of it.

Exception management is part of original code or interceptors, not part of reporting. Benign is too vague of a term IMO to apply generally (it means something different to every person/system, e.g. how it affects logging, metrics, actual failure, UI, etc all have different meanings for different people).

This is because "Don't log" is a low-level prescriptive solution rather than descriptive. It doesn't give us any power to do anything with the bit other than what it says.

This solution isn't "don't log", it's "intercept logging". The power to do everything else one may need exists or can be added at error time. It's this one place where we didn't expose power.

we can't decide how it affects exponential backoff.

Because it's not supported today. But if it was, that would already be supported with code/interceptors. The only thing the code/interceptor approach doesn't support is customizing reporting. It supports everything else and we should not make another abstraction that ignores that this abstraction exists already.

We can't (as you say) make it affect metrics.

We don't want to. Our approach in SDKs is to have SDK metrics have a clear unambiguous definition and encourage users to create their own metrics if they have an alternative definition. We provide the ability for users to easily create their own metrics today because of this. This is already supported today and used by many.

We can't make it show up differently in Temporal UI to aid in transparency and debuggability.

Would need more detail to understand why this wouldn't be like any other option you can set on a failure.

How about an approach where the user instead declares the severity of the error when declare it or when they throw it?

This is the least flexible of the options. Today they can intercept any error and change anything about it. We need to continue to allow that. If they consider a value of "severity" important, this can be added in details. But if people want more logic to affect logging, setting severity doesn't help them. People often want to do things like customize log tags and such. Thinking about log level alone is too myopic compared to the flexibility given by intercepting the log itself.

An alternative approach is to allow users to disable implicit logging and tell them if they want custom activity logging, they need to disable implicit and do whatever they want in interceptors (including logging). I think that may be a better approach.

drewhoskins-temporal commented 7 months ago

Today they can intercept any error and change anything about it. We need to continue to allow that.

Stepping back. This is not an xor. And I'm definitely not arguing against providing interceptor-level customizability, I'm just arguing against treating it as a full solution.

In my current mental model, our users have two personas (a) Application developers and (b) platform developers. The former will only paint within the lines and prioritize ease of use. The latter value flexibility. Sometimes solutions can help both groups, but other times, you need different solutions for each cohort. Ideally, idiomatic things are built using lower level primitives that we also expose to users.

Due to the difficulty (last I checked) of building interceptors in the way you describe, they are best targeted at platform devs. Making Temporal easier to use means building idioms for app devs that dramatically reduce the need for advanced features. (which by the way also benefits platform devs by delaying the point at which they need more customization.)

My initial impression of your solution of having a "don't log" bit on the exception is that it's a compromise between app devs and platform devs that won't quite please anybody. Platform devs might want to customize metrics or logging, but this bit won't help them with that. App devs still need to do other things like skipping metrics to solve their problem.

In contrast, declaring an exception severity (or if you don't like that specific idea, more descriptive info) might help platform devs write interceptors because they can use it to guide decisions on both metrics and logging. A prescriptive thing like "don't log" is very narrow in its use case.

cretz commented 7 months ago

Ideally we can avoid multiple ways to do the same thing. For this issue specifically, can we just disable implicit logging so users can do as they wish in interceptors?

Quinn-With-Two-Ns commented 2 weeks ago

Users may also want to be able to customize error logs from Workflow or Nexus tasks as well.

temporalio / features

Customize activity failure logging (and more?) #463

Describe the solution you'd like

Per-SDK Tickets