microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License
3.49k stars 2.61k forks source link

Invoking Azure Function results in timeout even with callback #8655

Closed shkrishms closed 4 years ago

shkrishms commented 5 years ago

AccountName: o365exchange.visualstudio.com TeamProject: O365 Sandbox/SubstrateDevX Agent - Hosted or Private: Hosted agent 2017

I am trying to define a pipeline where as a part of agentless job, I invoke an Azure Function call. This used to work well for the first couple of days. This is a short running function and completes within 20 sec. Then for some reason, the function gave the below error:

Response Code: 0 Response: An error was encountered while processing request. Exception: Exception Message: The request timed out after 20 seconds as no response was received. (type TimeoutException)

Changing the option to "Callback" instead of "ApiResponse" also resulted in the same error.

The actual function is triggered and has completed successfully.

This is the path to the failure: https://o365exchange.visualstudio.com/O365%20Core/Substrate%20DevX/_releaseProgress?releaseId=71321&_a=release-environment-logs&environmentId=86096

bansalaseem commented 5 years ago

@shkrishms can you please re-run your release after adding a variable System.Debug with value as true in your release and see whether you get more logs?

canteboi commented 5 years ago

i am also encountering this, after a successful function app deploy using "Azure App Service Deploy" then invoking "Invoke Azure Function" to check a successful deployment.

Utility Name : Invoke Azure Function Response Code: 0 Response: An error was encountered while processing request. Exception: Exception Message: The request timed out after 20 seconds as no response was received. (type TimeoutException)

Also even if using "Invoke REST API" utility, the same error is encountered.

Utility Name : Invoke REST API System Debug : True Parsing expression: <TaskInputs[TaskInputs['connectedServiceNameSelector']]> TaskInputs [ ..TaskInputs ..[ ....'connectedServiceNameSelector' ..] ] Evaluating: TaskInputs[TaskInputs['connectedServiceNameSelector']] Evaluating indexer: ..Evaluating TaskInputs: ..=> Object ..Evaluating indexer: ....Evaluating TaskInputs: ....=> Object ....Evaluating String: ....=> 'connectedServiceNameSelector' ..=> 'connectedServiceName' => 'd9ad5ca1-ecea-4189-8053-6882b578ff4d' Result: 'd9ad5ca1-ecea-4189-8053-6882b578ff4d'

Response Code: 0 Response: An error was encountered while processing request. Exception: Exception Message: The request timed out after 20 seconds as no response was received. (type TimeoutException)

bansalaseem commented 5 years ago

@canteboi and @shkrishms, can you please share where the 20 seconds is getting spent in your scenarios so that we can see what should be done to make your flows successful?

navbhati commented 5 years ago

Hello, I am also encountering this with following error when calling a function using Invoke Azure Function task -

Response Code: 0 Response: An error was encountered while processing request. Exception: Exception Message: The request timed out after 20 seconds as no response was received. (type TimeoutException)

Invoke Azure Function task doesn't throw above error if function returns with response in less than 20 sec.

Any help resolving above will be appreciated.

vithati commented 5 years ago

@navbhati , You can use 'Callback' mode image

Refer this blog for more info.

bansalaseem commented 5 years ago

I am closing this issue as the suggestion from @vithati is the recommended option. Please feel free to reactivate if it does not work for you.

martins-vds commented 5 years ago

Hi there,

I'm having the same issue. I followed the proposed solution but the behavior hasn't changed as I am still getting the 20-second timeout

Thoughts?

bansalaseem commented 5 years ago

@martins-vds can you please share the link to a failed release and also the code that you have written inside azure function to call back?

martins-vds commented 5 years ago

Hi @bansalaseem ,

I'm actually using the azure function in a build definition to spin up an instance of a containerized build agent.

Here is the link to the failed build (I've added you to the team project): https://vinima.visualstudio.com/My%20Work/_build/results?buildId=163

Here is the code in the azure function:

if(!EnsureAuthenticated(log))
            {
                return req.CreateErrorResponse(HttpStatusCode.InternalServerError, "Failed to execute function. Check logs for more details.");
            }
            var resourceGroup = await _azure.ResourceGroups.GetByNameAsync("vsts");
            var agentName = await GetNameAsync(req, "name");

            var azureDevOpsUrl = GetHeaderValue(req, "VSTS_AGENT_INPUT_URL");
            var azureDevPatToken =  GetHeaderValue(req,"VSTS_AGENT_INPUT_TOKEN");

            log.Info("Configuring environment variables");

            var env = new Dictionary<string, string>
            {
                { "VSTS_AGENT_INPUT_URL", azureDevOpsUrl },
                { "VSTS_AGENT_INPUT_AUTH", "pat" },
                { "VSTS_AGENT_INPUT_TOKEN", azureDevPatToken },
                { "VSTS_AGENT_INPUT_POOL", "AzureContainerInstance" },
                { "VSTS_AGENT_INPUT_AGENT", agentName }
            };

            log.Info("Environment variables configured");

            log.Info("Creating container instance");

            var privateContainerRegistry = Environment.GetEnvironmentVariable("ContainerRegistryUrl");
            var userName = Environment.GetEnvironmentVariable("ContainerRegistryUsername");
            var password = Environment.GetEnvironmentVariable("ContainerRegistryPassword");

            try
            {
                var containerGroup = await _azure.ContainerGroups.Define(agentName)
                        .WithRegion(resourceGroup.RegionName)
                        .WithExistingResourceGroup(resourceGroup)
                        .WithLinux()
                        .WithPrivateImageRegistry(privateContainerRegistry, userName, password)
                        .WithoutVolume()
                        .DefineContainerInstance(agentName)
                            .WithImage("contoso.azurecr.io/build-agent:latest")
                            .WithoutPorts()
                            .WithEnvironmentVariables(env)
                            .Attach()
                        .CreateAsync();

                log.Info("Container instance created");
            }
            catch (Exception e)
            {
                log.Error($"\nFailed to create container instance:\n{e.Message}", e);
                throw;
            }

            return req.CreateResponse(HttpStatusCode.OK, "Azure DevOps agent is running");
martins-vds commented 5 years ago

I've also noticed a different behavior.

I'm calling 2 azure functions as you can see bellow and the one that fails is the first on. image

Build summary image

Azure function step log image

anton-kirschhock commented 5 years ago

I am encountering the same issue. I've created a Function which spins up a Container Instance (with a docker-buildagent) which takes about 90 seconds to complete. I've use the callback as Completion event and in my function I RaiseCompletion. 2 Questions concerning this:

Thanks in advance!

vithati commented 5 years ago

I am encountering the same issue. I've created a Function which spins up a Container Instance (with a docker-buildagent) which takes about 90 seconds to complete. I've use the callback as Completion event and in my function I RaiseCompletion. 2 Questions concerning this:

  • as I've chosen for the Callback completion event, why does it still timeout? Is this a behaviour of Agentless tasks?

Even if you select callback completion your function should return with in 20 secs. 'Callback' mode means task waits for completion event from external, and not wait till function completes. If your function has any time consuming tasks, you should handle that work to another thread and main function should return immediate, and newly created thread should complete work and raise task completed event at end. In your case container instance spin should handover to another thread. Please refer this AzureFunctionHandler example, in this example function creates another thread to do actual work and main azure function return immediately.

  • Is there any way to increase the timeout setting?

No

Thanks in advance!

vithati commented 5 years ago

Hi @bansalaseem ,

I'm actually using the azure function in a build definition to spin up an instance of a containerized build agent.

Here is the link to the failed build (I've added you to the team project): https://vinima.visualstudio.com/My%20Work/_build/results?buildId=163

Here is the code in the azure function:

if(!EnsureAuthenticated(log))
            {
                return req.CreateErrorResponse(HttpStatusCode.InternalServerError, "Failed to execute function. Check logs for more details.");
            }
            var resourceGroup = await _azure.ResourceGroups.GetByNameAsync("vsts");
            var agentName = await GetNameAsync(req, "name");

            var azureDevOpsUrl = GetHeaderValue(req, "VSTS_AGENT_INPUT_URL");
            var azureDevPatToken =  GetHeaderValue(req,"VSTS_AGENT_INPUT_TOKEN");

            log.Info("Configuring environment variables");

            var env = new Dictionary<string, string>
            {
                { "VSTS_AGENT_INPUT_URL", azureDevOpsUrl },
                { "VSTS_AGENT_INPUT_AUTH", "pat" },
                { "VSTS_AGENT_INPUT_TOKEN", azureDevPatToken },
                { "VSTS_AGENT_INPUT_POOL", "AzureContainerInstance" },
                { "VSTS_AGENT_INPUT_AGENT", agentName }
            };

            log.Info("Environment variables configured");

            log.Info("Creating container instance");

            var privateContainerRegistry = Environment.GetEnvironmentVariable("ContainerRegistryUrl");
            var userName = Environment.GetEnvironmentVariable("ContainerRegistryUsername");
            var password = Environment.GetEnvironmentVariable("ContainerRegistryPassword");

            try
            {
                var containerGroup = await _azure.ContainerGroups.Define(agentName)
                        .WithRegion(resourceGroup.RegionName)
                        .WithExistingResourceGroup(resourceGroup)
                        .WithLinux()
                        .WithPrivateImageRegistry(privateContainerRegistry, userName, password)
                        .WithoutVolume()
                        .DefineContainerInstance(agentName)
                            .WithImage("contoso.azurecr.io/build-agent:latest")
                            .WithoutPorts()
                            .WithEnvironmentVariables(env)
                            .Attach()
                        .CreateAsync();

                log.Info("Container instance created");
            }
            catch (Exception e)
            {
                log.Error($"\nFailed to create container instance:\n{e.Message}", e);
                throw;
            }

            return req.CreateResponse(HttpStatusCode.OK, "Azure DevOps agent is running");

I think "await _azure.ContainerGroups.Define ..." call taking more time. You should handover this work to another thread and return from main function. Newly created thread should complete work and raise task completed event at end. Please refer this AzureFunctionHandler example, in this example function creates another thread to do actual work and main azure function return immediately.

martins-vds commented 5 years ago

Hi @bansalaseem , I'm actually using the azure function in a build definition to spin up an instance of a containerized build agent. Here is the link to the failed build (I've added you to the team project): https://vinima.visualstudio.com/My%20Work/_build/results?buildId=163 Here is the code in the azure function:

if(!EnsureAuthenticated(log))
            {
                return req.CreateErrorResponse(HttpStatusCode.InternalServerError, "Failed to execute function. Check logs for more details.");
            }
            var resourceGroup = await _azure.ResourceGroups.GetByNameAsync("vsts");
            var agentName = await GetNameAsync(req, "name");

            var azureDevOpsUrl = GetHeaderValue(req, "VSTS_AGENT_INPUT_URL");
            var azureDevPatToken =  GetHeaderValue(req,"VSTS_AGENT_INPUT_TOKEN");

            log.Info("Configuring environment variables");

            var env = new Dictionary<string, string>
            {
                { "VSTS_AGENT_INPUT_URL", azureDevOpsUrl },
                { "VSTS_AGENT_INPUT_AUTH", "pat" },
                { "VSTS_AGENT_INPUT_TOKEN", azureDevPatToken },
                { "VSTS_AGENT_INPUT_POOL", "AzureContainerInstance" },
                { "VSTS_AGENT_INPUT_AGENT", agentName }
            };

            log.Info("Environment variables configured");

            log.Info("Creating container instance");

            var privateContainerRegistry = Environment.GetEnvironmentVariable("ContainerRegistryUrl");
            var userName = Environment.GetEnvironmentVariable("ContainerRegistryUsername");
            var password = Environment.GetEnvironmentVariable("ContainerRegistryPassword");

            try
            {
                var containerGroup = await _azure.ContainerGroups.Define(agentName)
                        .WithRegion(resourceGroup.RegionName)
                        .WithExistingResourceGroup(resourceGroup)
                        .WithLinux()
                        .WithPrivateImageRegistry(privateContainerRegistry, userName, password)
                        .WithoutVolume()
                        .DefineContainerInstance(agentName)
                            .WithImage("contoso.azurecr.io/build-agent:latest")
                            .WithoutPorts()
                            .WithEnvironmentVariables(env)
                            .Attach()
                        .CreateAsync();

                log.Info("Container instance created");
            }
            catch (Exception e)
            {
                log.Error($"\nFailed to create container instance:\n{e.Message}", e);
                throw;
            }

            return req.CreateResponse(HttpStatusCode.OK, "Azure DevOps agent is running");

I think "await _azure.ContainerGroups.Define ..." call taking more time. You should handover this work to another thread and return from main function. Newly created thread should complete work and raise task completed event at end. Please refer this AzureFunctionHandler example, in this example function creates another thread to do actual work and main azure function return immediately.

Since I'm spinning up a containerized build agent, I need to wait until the container is up and running (and therefore I have an operational build agent running on the container) before the build pipeline can continue. The reason why I tried to use "Callback" is because usually the Azure Function (with the "await _azure.ContainerGroups.Define ..." statement) takes 3-4 minutes to the complete and I assumed that task would wait for that long and wouldn't timeout.

As a workaround, I removed the "await" from the function code and added to the build definition the "Delay" task that holds the pipeline for 3-4 minutes hoping that after that time the build agent would already be up and running. The problem is that there are times when the container takes longer than 4 minutes to become operational and the build pipeline fails because there is not build agent available by the time the "Delay" task finishes.

If the intended behavior of the "Invoking Azure Function" task is to timeout after 20 seconds even when "Callback" is selected as a completion event, the documentation should reflect that. In fact, the documentation should reflect that the task will timeout after 20 seconds regardless of the completion event if that is the intended behavior.

anton-kirschhock commented 5 years ago

@vithati Thanks for the answer. Although, I am still curious why this has been designed this way? Is is just a rule or is it maybe how agentless tasks are ran? Personally I am not big fan of 'Oh lets shoot up a thread and let that thread handle it' as our devops team would love to see that the creation of a agent has failed within Azure Devops logs, without consulting a separate logging system. Okay one could say 'implement a notification system' but that again would mean maintaining that system. So hens my curiosity. I agree with @martins-vds concerning the Documentation. This is essential to people who are trying similar workflows in builds. Just small FYI, our workarround is enabling 'continue on fail' in the task, which is even less ideal than the multi threading path.

vithati commented 5 years ago

We will update the Docs. Server tasks like 'Invoke Azure function' or 'Invoke Rest Api' should respond with in 20secs. In case of "Callback" you can send initial response like requested accepted and do other activity in async mode and send task completed event once work is done. Server task will be in progress state till it receives the task completed event from external.

I have written azure function sample using ServerTaskHelper which create azure container instance. When this azure function invoked through 'Invoke Azure function' task with "Callback" mode, task stays in in-progress state till container created and moves to completed state once container created. If any errors occurs while creating container, task moves to failed state. Server not pickup another task execution until 'Invoke azure function' task execution completes.

chris31389 commented 5 years ago

@vithati Why does it have to be a rule that "it should respond within 20 secs"? Can't it be something that is configurable? There is obviously a few people who would like to extend that time limit (including myself) and I feel like i'm being restricted or told off that it takes longer than 20 seconds.

Can you explain why 20 seconds is the limit? Is there a reason we can't configure it?

My example is that I deploy my web app and I then call a /health endpoint on the deployed web app. It takes over 20 seconds to warm up, so the call will timeout.

bansalaseem commented 5 years ago

@chris31389 These tasks are run on Azure Devops server and thus we don't want to hold a thread for longer time. In case you have time taking operations which cannot be done in an async manner as @vithati mentioned above, you should use agent based task for it.

saikrishnav commented 5 years ago

@bansalaseem - I am hitting the exact same problem and I don't think this is an unreasonable ask. Azure Functions by default have a 5 minute timeout themselves and 20s is no where near that and its confusing to see this error message. I have an Azure Function that takes 12-15 sec at max, and it randomly timing out with the above error. May be its counting some additional bootstrap time and timing out. Regardless, the issue is that if the "tasks are run on azure devops server and cannot hold a thread for longer time", then the parity is lost between Azure Function and the task and probably the product itself should provide/suggest a solution by an option in the task.

Also, 20 seconds is way too less for it to be the max default for an azure function when the azure itself provides 5 min as default.

Another point - "Continue on Error" makes the pipeline Partially Succeeded and I don't want that. There should be an "Ignore Response" option. I don't understand the "Success Criteria" input and there are no good examples in docs.

jawied commented 5 years ago

I am also running into the described issue. I'm calling an azure function from an ADO gate which takes ~30 seconds but the gate times out even if I do a callback. We really need an option for timeout in ADO gates like we do in normal tasks.

vithati commented 5 years ago

As i mentioned in one of above comment, use 'callback' mode if azure funtion execution takes more than 20 secs. You can refer azure function sample which created using ServerTaskHelper. You can also use ServerTaskHelper in your azure function and implement callback mode. When Azure function sample invoked through 'Invoke Azure function' task with "Callback" mode, task stays in in-progress state untill container created (~5mins) and moves to completed state once container created. If any errors occurs while creating container, task moves to failed state. Server not pickup another task execution until 'Invoke azure function' task execution completes.

'Success Criteria' which defines when to pass the task. No criteria means response content does not influence the result. You can parse output of azure function respone and define when to pass the task. Example:- If azure funtion response is {\"status\" : \"successful\"}, then success criteria expression eq(root['status'], 'successful') pass the task, if response is {\"status\" : \"abort\"} then the same expression fails the task, refer this for More information​". You can also refer AzureMonitorV0 , AzureMonitorV1 and QueryWorkItemsV0 tasks suceess criteria.

JonathanBaileyTC commented 5 years ago

I'm having a similar issue.

I'm using the Invoke Azure Function task to invoke my Azure function (written in Python). The Azure Function takes a few minutes to run, which is does successfully. However, the agentless job times out after 20 sec., even if I changes the timeout in the Advanced section to 5 minutes.

Also, agree with other comments that it seems pretty hacky to have to build this callback mechanism outside of the DevOps environment just because my function runs longer than an arbitrary 20 seconds. This is particularly true if migrating existing code, say, a Python script, that wasn't originally intended to be used in a web context, so was never intended to return a response in a short time window. Now I'd need to rework it to be used inside of a web paradigm, instead of just letting me increase the timeout to a reasonable time window that I'm sure my script will execute inside of.

Pwd9000-ML commented 4 years ago

Hello everybody. I am another person on the list here who wants to see an improvement here. Since PowerShell core is now in GA for function apps I am utilising this opportunity to use function apps to perform certain tasks on infrastructure on the azure platform.

Stop / start of VMs for example, build agents, dockerized build agent containers and so forth.

My function app works perfectly as I’m staying well inside of the 5minute time to keep the function active and the function apps works really well for this.

But limiting the DevOps task to 20seconds? I cannot see how anyone is even able to use this in CI/CD. Doesn’t make sense to do threading because why should you, there is an allowed time span of 5minutes natively in the function app.

Please can this be made a configurable parameter. Or at least extend to the same time limit as what is native in the function app consumption plan limitations.

Pwd9000-ML commented 4 years ago

@jwisener Can you clarify what you mean? Are you invoking the function app using PowerShell using invoke-restmethod?

Pwd9000-ML commented 4 years ago

@jwisener That’s a good method to get the app warmed up but I don’t think this is a suitable solution for a serverless Azure DevOps task. The task you’re running is on an already running agent. I think my use case is similar but I’m needing to warm up the function from an Azure severless job to actually start the build agent I’m running other builds on. Unless there’s a method/task available to invoke PowerShell code from a serverless DevOps agent as opposed to a powershell task on a self hosted agent that would be ideal.

JonathanBaileyTC commented 4 years ago

So here's something that I thought should work, using Callback as the completion event, and using this documentation as a guide

import logging
import time
import threading

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:

    t = threading.Thread(target=do_work, args=(req,))
    t.start()

    return func.HttpResponse("Succeeded", status_code=200)

def do_work(req: func.HttpRequest):

    logging.info ("Starting asynchronous worker task")

    #time.sleep(21)

    try:
        planUrl = req.headers["PlanUrl"]
        projectId = req.headers["ProjectId"]
        hubName = req.headers["HubName"]
        planId = req.headers["PlanId"]
        taskInstanceId = req.headers["TaskInstanceId"]
        jobId = req.headers["JobId"]

        endpoint = f"{planUrl}/{projectId}/_apis/distributedtask/hubs/{hubName}/plans/{planId}/events?api-version=2.0-preview.1"
        body = {
            "name": "TaskCompleted",
            "taskId": taskInstanceId,
            "jobId": jobId,
            "result": "succeeded"
        }
        logging.info(endpoint)
        logging.info(body)
        requests.post(endpoint, data=body)

    except:
        pass

    finally:
        logging.info ("Completed asynchronous worker task")

The Invoke Azure Function task does not time out after 20 seconds, but now does not complete at all:

============================================================================== 
Task         : Azure Function: https://tcegissandbox.azurewebsites.net/api/HttpTrigger
Description  : Invoke an Azure Function
Version      : 1.0.10
Author       : Microsoft Corporation
Help URL     : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/azure-function
============================================================================== 
2019-11-20T19:57:34.9975845Z POST https://tcegissandbox.azurewebsites.net/api/HttpTrigger?code=None&
                Response Code: OK
                Response: Succeeded
Waiting for console output from the server...

where it's been stuck now for ~17 minutes.

If Microsoft could just go ahead and provide some correct documentation or guidance on this, yeah, that would be great.

titiks commented 4 years ago

Hello, I'm in a similar situation. I have an agentless job triggering an azure function written in powershell that starts a VM. As it takes more than 20 seconds, the job times out. So I'm trying to start the VM in a separate thread but I can't get it to work.

I'm new to powershell so I may be doing something wrong. When run synchronously, the azure function works and the VM starts as expected. To run on a separate thread, I tried using Start-Job but I get the following message : "'Start-Job' is not supported by design in scenarios where PowerShell is being hosted in other applications" and the message suggests to use Start-ThreadJob. And when using Start-ThreadJob it does nothing, looks like the thread is terminated when the azure function returns (so it does not have the time to do anything). If I pipe Wait-Job, it works since it's waiting for the thread to finish, but then again it takes more than 20 seconds.

Here's my azure function code:

using namespace System.Net

param($Request, $TriggerMetadata)

$requestBody = $Request.Body;

Write-Host "Starting VM $($requestBody.vmname) in resource group $($requestBody.resourcegroup)";
Start-ThreadJob -ScriptBlock {
    $prm = $using:requestBody;
    Start-AzVM -ResourceGroupName $prm.resourcegroup -Name $prm.vmname;
};
$status = [HttpStatusCode]::OK;
$body = "";

# Associate values to output bindings by calling 'Push-OutputBinding'.
Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
    StatusCode = $status
    Body = $body
});

So am I doing sth wrong or it just cannot be done using powershell ?

MohanSairam423 commented 4 years ago

Hello,

Is there a way to pause an agent job and wait for callback. Currently I'm able to see this feature(wait for completion) only for server jobs. I would like to know if it is supported in agent jobs as well.

thesattiraju commented 4 years ago

Closing stale issue. Feel free to reopen if the problem still persists.

sukumarpatel commented 3 years ago

I have set up this task and it is working as expected. Hiwver, the REST API task waits for a max of 60 minutes before it fails. How can we increase this time out?

tokleine commented 2 years ago

I am still experiencing issues with this behavior. It seems that the timeout was increased to 60s. However, I still ran into timeouts. My workaround is to have several dummy (warm-up) calls, they do not have any success criteria, so they get 60s each to warm up the function and then I run my proper requests. Rather stupid to have to resort to this workaround though.

EDIT: The workaround seems to fail without any changes done. So I guess it is NOT a workaround.

Any advice on how to further debug this? Would it help to create a warmup function and call that first?

jbt-jaime commented 2 years ago
def main(req: func.HttpRequest) -> func.HttpResponse:

    t = threading.Thread(target=do_work, args=(req,))
    t.start()

    return func.HttpResponse("Succeeded", status_code=200)

def do_work(req: func.HttpRequest):

    logging.info ("Starting asynchronous worker task")

    #time.sleep(21)

    etc...

I'm quite late to this party but this threading workaround seems to work for me. If it didn't work for @JonathanBaileyTC in 2019 it's either because things have changed since then or he maybe he was missing the auth token in the POST headers:

 auth_token = headers["authtoken"]
 headers = {"Authorization": f"Bearer {auth_token}"}
 r = requests.post(endpoint, json=body, headers=headers)

Not ideal that the user has to code all the callback logic into their function, but it seems to work.