microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License

Correctness of claim about MSSQL storage provider in disconnected environments #112

Closed: neil-mccrossin-cmd closed this issue 2 years ago

neil-mccrossin-cmd commented 2 years ago

According to its documentation, the MSSQL storage provider "supports disconnected environments - no Azure connectivity is required when using a SQL Server installation." https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-storage-providers

And again: "Does this require Azure? No. You can run on Azure if you want, but this provider was designed specifically to support running DTFx and Durable Functions in a non-Azure environment." https://microsoft.github.io/durabletask-mssql

This claim is certainly true in terms of the specific storage requirements (i.e. Task Hubs) that Durable Functions have above and beyond 'ordinary' Azure Functions. However, my understanding (correct me if I'm wrong) is that Durable Functions run on top of a technology stack that includes the implementation of 'ordinary' Azure Functions, which in turn is implemented using the WebJobs SDK. These underlying layers make their own use of Azure Storage (in particular blob storage), and this usage is in no way related to Task Hubs and thus not catered for by the MSSQL provider. This 'residual' use of blob storage can be easily observed when running Durable Functions using the MSSQL provider.

Thus in actual practical terms, it seems that the claim of being able to run Durable Functions without any connection to Azure is not correct as the underlying layers still require it.

Some other things I have tried:

Please advise whether the claim in the documentation quoted at the top is incorrect, or whether I have misunderstood in some way, in which case please provide clarification.

olitomlinson commented 2 years ago

@neil-mccrossin-cmd

You absolutely can use the Azure Functions programming model (including Durable Functions) packaged in a container and run that container anywhere you like, so AFAIK there is no hard dependency on Azure Storage, or on Azure itself.

However, when using the Azure Functions programming model outside of Azure, you lose the benefit of the horizontal scaling that comes with the various kinds of Azure App Service Plans and Dynamic/Serverless plans.

So if scaling your workload is important to you, then yes, you may need to invest in other tools. For example, if you intend to operate inside a Kubernetes environment, you may want to look at KEDA: https://keda.sh/docs/2.2/scalers/mssql/

cgillum commented 2 years ago

Just to add, the Azure Functions runtime and hosting team did work to ensure that Azure Functions can run without requiring any Azure dependencies, including Azure Storage (or Azure Storage emulators). Secrets is one example of where new abstractions were created to decouple Azure Functions from Azure Storage (for example, allowing you to instead use local files or Kubernetes secrets). We have customer scenarios that require exactly this - support for disconnected environments - and the work required to enable this (plus validation) has been done.

neil-mccrossin-cmd commented 2 years ago

@cgillum @olitomlinson OK, after further investigation I can see now that there is no dependency, but that is not how it appears to someone attempting to verify that fact who does not have internal knowledge of the technology.

I started with a Durable Functions hello world application depending on Azure Storage. I then attempted to completely remove that dependency to verify that this was possible. I cleared the storage account, switched to the SQL Server Durable Hub provider, and noted that fewer items were now being written to the storage account. So far so good. I cleared the storage account again, switched to using files for secrets, and again noted that fewer items were being written. Also a step in the right direction. But files were still being written to the Azure storage account. They were zero byte lock files, but the storage account was still being used for these files, despite me seemingly having taken all required measures to remove the dependency on Azure storage.

That is why I raised this issue.

Subsequently, I thought of the idea of just completely removing the AzureWebJobsStorage setting to see what would happen. In fact the application ran fine without it. I then poked around in the WebJobs source code and as best I can judge the lock files are only for the management of the storage account itself, so once the reference to the storage account is removed they are no longer needed (please verify if my understanding of the source code is correct).
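For readers following along, the end state described above (MSSQL provider selected, file-based secrets, and no AzureWebJobsStorage setting at all) can be sketched roughly as follows. This is an illustrative sketch, not the exact configuration used in this thread: the setting name `SQLDB_Connection`, the `dotnet` worker runtime, and the helper function names are assumptions chosen for the example, and the generated dictionaries are meant to be written out as `host.json` and `local.settings.json`.

```python
import json

# Assumed (hypothetical) name of the app setting holding the SQL connection string.
SQL_CONNECTION_SETTING = "SQLDB_Connection"


def build_host_json() -> dict:
    """Return host.json content that selects the MSSQL Durable Task provider."""
    return {
        "version": "2.0",
        "extensions": {
            "durableTask": {
                "storageProvider": {
                    "type": "mssql",
                    "connectionStringName": SQL_CONNECTION_SETTING,
                }
            }
        },
    }


def build_local_settings(sql_connection_string: str) -> dict:
    """Return local.settings.json content with file-based secrets.

    AzureWebJobsStorage is deliberately absent, so the host has no
    storage account to write lock files or anything else to.
    """
    return {
        "IsEncrypted": False,
        "Values": {
            # Assumption: a .NET function app; use your own worker runtime.
            "FUNCTIONS_WORKER_RUNTIME": "dotnet",
            "AzureWebJobsSecretStorageType": "files",
            SQL_CONNECTION_SETTING: sql_connection_string,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_host_json(), indent=2))
    print(json.dumps(build_local_settings("<your SQL Server connection string>"), indent=2))
```

The point of the sketch is the omission: nothing in either file names a storage account.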

So in summary: It is not possible to remove all usage of Azure Storage if the application knows about a storage account. This is readily observable. A critical nuance, however, is that this observed usage is not a dependency, i.e. if you prevent the application from accessing the storage account, there is no adverse effect on the application. This is probably obvious to someone who knows the technology from the inside out, but should be explicitly mentioned in the documentation for those who do not.

cgillum commented 2 years ago

"I then poked around in the WebJobs source code and as best I can judge the lock files are only for the management of the storage account itself, so once the reference to the storage account is removed they are no longer needed (please verify if my understanding of the source code is correct)."

I'm not a maintainer of the Azure Functions runtime so I can't speak to the details of how it's implemented, but it is my understanding that there are various internal administrative things that the host will do with a storage account, if configured. Managing locks is indeed one of those things. And yes, removing the AzureWebJobsStorage is critically important to disable the use of Azure Storage behind the scenes.
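One way to check this before deploying to a disconnected environment is to scan the app settings for anything that would point the host at a storage account. The sketch below is a hedged illustration only: the function name `azure_storage_references` is hypothetical, and the list of settings it checks (the `AzureWebJobsStorage` connection string, its `AzureWebJobsStorage__*` identity-based variants, and an explicit `blob` secret storage type) is an assumption about what is worth flagging, not an exhaustive list taken from the Functions host.

```python
import os
from typing import List, Mapping


def azure_storage_references(settings: Mapping[str, str]) -> List[str]:
    """Return the names of settings that appear to reference Azure Storage."""
    findings = []
    for name, value in settings.items():
        is_storage_setting = (
            name == "AzureWebJobsStorage"
            or name.startswith("AzureWebJobsStorage__")
        )
        # An empty value is treated the same as the setting being absent.
        if is_storage_setting and value:
            findings.append(name)
    # Secrets kept in blob storage would also need a storage account.
    if settings.get("AzureWebJobsSecretStorageType", "").lower() == "blob":
        findings.append("AzureWebJobsSecretStorageType")
    return findings


if __name__ == "__main__":
    found = azure_storage_references(os.environ)
    if found:
        print("Settings that reference Azure Storage:", ", ".join(found))
    else:
        print("No Azure Storage settings found.")
```

An empty result only shows that nothing in the inspected settings names a storage account; it does not prove that the host makes no other outbound calls.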

"This is probably obvious to someone who knows the technology from the inside out, but should be explicitly mentioned in the documentation for those who do not."

I understand where you're coming from in terms of how it's not obvious why a storage account isn't required. The main challenge faced by Azure Functions is that it was originally designed with a hard dependency on Azure Storage, and this dependency was lifted only recently to support customers that wanted to use Functions in fully offline scenarios.

It's not clear to me, however, how we should update the documentation or what exactly needs to be clarified (and for what purpose). It's critically important that we clearly state that Azure is not required. Non-technical folks will otherwise have a very unclear picture of whether this technology can be used in their disconnected environments. Let me know if you have a specific recommendation on how we can change the existing language to make things clearer.