microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License

Clarify supported host.json settings #44

Open marcd123 opened 3 years ago

marcd123 commented 3 years ago

Can someone please clarify which Azure Functions host.json settings are supported when using the MSSQL Durable Task extension for Azure Functions?

I'm particularly interested in these settings:

- controlQueueBatchSize
- controlQueueBufferThreshold
- partitionCount
- controlQueueVisibilityTimeOut
- workItemQueueVisibilityTimeout
- maxConcurrentActivityFunctions
- maxConcurrentOrchestratorFunctions
- maxQueuePollingInterval

I'm also unfamiliar with taskEventLockTimeout, which appears on the Durable Task SQL guide website but isn't explained there:

https://microsoft.github.io/durabletask-mssql/#/README

This explanation of each setting has been very helpful on previous Azure Function apps I've worked on, though those apps didn't use AKS, KEDA, or the MSSQL Durable Task extension:

https://github.com/MicrosoftDocs/azure-docs/blob/master/includes/functions-host-json-durabletask.md

cgillum commented 3 years ago

Hi @marcd123, you can find the list of supported settings here, though this location doesn't currently have any useful documentation.

As far as the settings you highlighted, only maxConcurrentActivityFunctions and maxConcurrentOrchestratorFunctions are relevant to the MSSQL backend. All the others apply only to the Azure Storage backend.
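To make that split concrete, here is a minimal Python sketch (not part of the extension or its tooling) that reads a host.json file and lists any durableTask settings that, per the answer above, only the Azure Storage backend uses. The helper name `ignored_by_mssql` and the sample file are hypothetical; the `storageProvider` keys in the sample (`type`, `connectionStringName`, `taskEventLockTimeout`) are illustrative, and key matching is case-insensitive on the assumption that the host binds setting names that way.

```python
import json

# Per this thread, these durableTask settings only apply to the Azure Storage backend.
AZURE_STORAGE_ONLY = {
    "controlqueuebatchsize",
    "controlqueuebufferthreshold",
    "partitioncount",
    "controlqueuevisibilitytimeout",
    "workitemqueuevisibilitytimeout",
    "maxqueuepollinginterval",
}


def ignored_by_mssql(host_json_text: str) -> list[str]:
    """Return the durableTask settings in host.json that the MSSQL backend ignores."""
    config = json.loads(host_json_text)
    durable = config.get("extensions", {}).get("durableTask", {})
    provider = durable.get("storageProvider", {})
    # Check both levels, since these settings may be written under durableTask
    # or under durableTask.storageProvider.
    keys = set(durable) | set(provider)
    # Compare case-insensitively (assumption: the host binds setting names that way).
    return sorted(k for k in keys if k.lower() in AZURE_STORAGE_ONLY)


if __name__ == "__main__":
    # Hypothetical host.json for an app using the MSSQL backend.
    sample = """
    {
      "version": "2.0",
      "extensions": {
        "durableTask": {
          "storageProvider": {
            "type": "mssql",
            "connectionStringName": "SQLDB_Connection",
            "taskEventLockTimeout": "00:02:00",
            "partitionCount": 4
          },
          "maxConcurrentActivityFunctions": 4,
          "maxConcurrentOrchestratorFunctions": 8,
          "maxQueuePollingInterval": "00:00:30"
        }
      }
    }
    """
    print(ignored_by_mssql(sample))  # → ['maxQueuePollingInterval', 'partitionCount']
```

The two concurrency settings pass the check because they are the ones the answer says the MSSQL backend honors.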

Just to quickly call out the meanings of the available settings:

I'll use this issue to track ensuring that we have a proper reference section where the meanings of these settings are clearly documented.

marcd123 commented 3 years ago

Thanks for the clarifications, @cgillum! I have a couple more questions:

  1. To confirm, partitionCount has no effect when using the MSSQL provider, and ALL instances of the function app can consume orchestrator tasks?

  2. I'm excited to see taskEventBatchSize is configurable. Does this control the batch size for pulling from the orchestrator function queue only, or does it control the batch size for the activity function queue as well?

  3. I understand that maxConcurrentActivityFunctions and maxConcurrentOrchestratorFunctions control actual execution concurrency, but do they have any effect on the batch size pulled from each respective queue? I'm also curious whether these are additive, meaning that I could have up to maxConcurrentActivityFunctions + maxConcurrentOrchestratorFunctions functions running at one time. This is important to know because I'm very sensitive to concurrency in Python and have an ML and image-processing workload that is very memory-intensive.

Bonus question: Are there any plans for the maxConcurrentActivityFunctions and maxConcurrentOrchestratorFunctions host.json settings to affect the KEDA ScaledObject in the deployment YAML file that is generated when using func kubernetes deploy --write-configs? If that's a better question for the functions-core-tools repo, I can post it there; I have a few requests for the func CLI (particularly setting pod requests/limits via the CLI).

cgillum commented 3 years ago
  1. Correct, all instances of the function app can consume orchestrator tasks.
  2. The taskEventBatchSize only applies to orchestrator (and entity) functions. We still pull activity messages one-at-a-time.
  3. Concurrency settings are unrelated to batch sizes. All messages in a batch for an orchestration will be processed in a single orchestrator function execution.
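Points 2 and 3 can be illustrated with a small, hypothetical Python model. It is not the provider's actual dequeue logic, which this thread doesn't describe; it only assumes what the answer states: orchestrator events for one instance are pulled in a batch capped by taskEventBatchSize and handled in a single execution, while activity messages are pulled one at a time. The names `next_orchestrator_execution`, `pull_activities`, and `Execution`, and the "oldest event wins" selection rule, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    instance_id: str
    events: list[str]


def next_orchestrator_execution(pending, task_event_batch_size):
    """Hypothetical model of one orchestrator dequeue.

    `pending` is a list of (instance_id, event) pairs in arrival order.
    The instance owning the oldest event is chosen, up to
    `task_event_batch_size` of its events are taken, and all of them are
    handled by ONE orchestrator execution. Returns (execution, remaining).
    """
    if not pending:
        return None, []
    instance_id = pending[0][0]
    taken, remaining = [], []
    for inst, event in pending:
        if inst == instance_id and len(taken) < task_event_batch_size:
            taken.append(event)
        else:
            remaining.append((inst, event))
    return Execution(instance_id, taken), remaining


def pull_activities(pending):
    """Activity messages are dequeued one at a time: one pull per message."""
    return [[message] for message in pending]


if __name__ == "__main__":
    queue = [("A", "e1"), ("B", "e2"), ("A", "e3"), ("A", "e4")]
    while queue:
        execution, queue = next_orchestrator_execution(queue, task_event_batch_size=2)
        print(execution)
    print(pull_activities(["act1", "act2"]))
```

In this model, maxConcurrentOrchestratorFunctions would only limit how many such executions run at the same time; it plays no part in how many events go into each batch, which matches point 3.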

Regarding your bonus question: there is intent to use maxConcurrentActivityFunctions and maxConcurrentOrchestratorFunctions in host.json to configure the parameters used by the KEDA ScaledObject (they are currently hardcoded). However, there's no ETA yet for when that work will be done.

atdimitrov commented 2 years ago

Hello. What about settings related to extended sessions? I looked around and couldn't find any information apart from "Support for extended sessions may vary depending on the Durable Functions storage provider you are using.", found here, which makes me believe they're not supported by the MSSQL storage provider.

cgillum commented 2 years ago

@atdimitrov correct, extended sessions aren't currently implemented for the MSSQL storage provider. That's also something we'll need to make more explicit in the docs.

marcd123 commented 2 years ago

I'm curious what an appropriate taskEventLockTimeout setting is when I have long-running activity functions. In one case, I have an activity function that may run for 1 minute before completing. If my taskEventLockTimeout is only 30 seconds, does that mean another instance of my function app will pick up the same activity because the first one hasn't completed yet?

cgillum commented 2 years ago

@marcd123 Lock timeouts are actually completely separate from lock renewal; I think we need to do a better job of making this clear. To answer your question, taskEventLockTimeout only applies in cases where a worker locks a message and then fails. If your lock timeout is set to 30 seconds, it will take that long before the lock expires and another worker can pick up the event and run the activity function. However, if a worker is healthy, it will automatically renew the lock on the activity message every 5-30 seconds (hardcoded in the framework), regardless of the value of taskEventLockTimeout. Because of this, you don't need to worry about how long your activity functions run: the framework keeps renewing those messages in the background.
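The difference between the lock timeout and lock renewal can be sketched as a discrete-time Python simulation. This is a hypothetical model, not the framework's code: `MessageLock`, `first_takeover`, and the `renew_every` parameter are invented for illustration, and the model assumes the renewal interval (hardcoded in the real framework) is shorter than the lock timeout. It reports the second at which a second worker could pick up the activity message, or `None` if the first worker completes it first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageLock:
    owner: Optional[str] = None
    expires_at: int = 0

    def try_acquire(self, worker: str, now: int, timeout: int) -> bool:
        # The message can be locked if no one holds it or the current lock has expired.
        if self.owner is None or now >= self.expires_at:
            self.owner, self.expires_at = worker, now + timeout
            return True
        return False

    def renew(self, worker: str, now: int, timeout: int) -> bool:
        # Only the current owner may extend an unexpired lock.
        if self.owner == worker and now < self.expires_at:
            self.expires_at = now + timeout
            return True
        return False


def first_takeover(activity_seconds, lock_timeout, renew_every,
                   crash_at=None, horizon=600):
    """Second at which worker-2 acquires the message, or None if worker-1 completes it."""
    lock = MessageLock()
    lock.try_acquire("worker-1", 0, lock_timeout)
    next_renewal = renew_every
    for t in range(1, horizon + 1):
        alive = crash_at is None or t < crash_at
        if alive and t >= activity_seconds:
            return None  # activity finished and the message is completed
        if alive and t >= next_renewal:
            lock.renew("worker-1", t, lock_timeout)  # background renewal by a healthy worker
            next_renewal += renew_every
        if lock.try_acquire("worker-2", t, lock_timeout):
            return t
    return None


if __name__ == "__main__":
    # 60-second activity, 30-second lock timeout, renewal every 10 seconds.
    print(first_takeover(60, 30, 10))                 # → None (healthy worker)
    print(first_takeover(60, 30, 10, crash_at=25))    # → 50 (30s after last renewal at t=20)
    print(first_takeover(60, 30, renew_every=10**6))  # → 30 (no renewal at all)
```

The third call shows the scenario the question worried about: without renewal, a 30-second timeout would let a second worker start a 60-second activity at t=30. With renewal, the timeout only matters once the owner stops renewing, as in the crash case.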