temporalio / features

Behavior and history compatibility testing for Temporal SDKs
13 stars 17 forks source link

[Feature Request] Eager activity dispatch #55

Closed cretz closed 1 year ago

cretz commented 2 years ago

Overview

SDKs need to support "eager activity dispatch" by default which is essentially just setting ScheduleActivityTaskCommandAttributes.request_eager_execution to true if there is "room". Then if supported on the server, the server will give the work right back to the worker requesting.

Unbalanced Worker Concern

This feature is purely for server benefit, and is actually possibly harmful to SDK and users. It can cause unbalancing of workers. For example, say you have 10 workers with 1000 activity slots each and just one workflow is running. If that workflow starts 300 activities, those are all gonna be on the one worker and the other 9 will idle. Today the server does not have a load balancing policy and therefore which worker work is given to is random, but even that may distribute load better than same-worker-as-workflow-by-default.

Features

Issues

bergundy commented 2 years ago

Should we have way in activity options to disallow eager dispatch?

https://github.com/temporalio/sdk-core/pull/296/files#diff-df2313a0cc3597f2135b2972c7f8ca12c4377fce2d1f6b97a41b0aff18e9fb50R83

Make sure this can be set to "off" (or use a separate config boolean)

This would be setting the max eager activities to 0

Research - what is the default? Unset which means max activities of the worker?

Good question, I'd consider adding a rate limiter for eager activities to make sure we don't overload a single worker, it might be more useful than concurrency limiter.

Research - can we reliably determine when to add a eager-activity slot back to the pool

Once a worker accepts these activities the rest of the flow is exactly the same as with non-eager activities. We essentially accept a single activity attempt (task), retries and timeouts are handled by the server.

Unbalanced Worker Concern

This is a valid concern but as workload grows it become increasingly insignificant AFAICT.

yiminc commented 2 years ago

This feature is purely for server benefit

This is not true. User get benefit of much faster dispatch and less overhead.

I think we should just start conservatively, only allow small number of eager execution. Small number like 2 or 3 should cover the majority of the cases and thus harvest most of the benefit we want.

yiminc commented 2 years ago

Should we have way in activity options to disallow eager dispatch?

I don't think we need this on SDK. Server side has control to disable this feature per namespace, if we ever need to do so.

cretz commented 2 years ago

This is not true. User get benefit of much faster dispatch and less overhead.

Sorry, I wasn't clear/accurate when I said only for server benefit. I mean this optimization is only exposed to users so the server can provide a performance benefit. The server is a black box to users and ideally we wouldn't have to bother SDK or users with a new concept to improve performance. But if course I understand why.

I don't think we need this on SDK. Server side has control to disable this feature per namespace, if we ever need to do so.

We already are going to have max eager worker option, so technically 0 is disabling. The question here is whether to also support disabling at the activity options level inside the workflow. Is there a use case where a workflow dev may know they don't want to eagerly dispatch a certain type of activity? Is this similar to the decision we allow them to make about choosing whether an activity is locally dispatched?

josh-berry commented 1 year ago

I see all the langs are checked off as being done—can this be closed?

mjameswh commented 1 year ago

There's some outstanding discussion regarding interaction of Eager Activity Execution + Workflow Versioning, but I strongly feel like this is outside of the scope of this ticket.

josh-berry commented 1 year ago

Great. Closing.