slackapi / deno-slack-sdk

SDK for building Run on Slack apps using Deno
https://api.slack.com/automation
MIT License
158 stars 27 forks source link

[QUERY] Workflow went a bit crazy this morning #229

Closed tomas-zijdemans-vipps closed 11 months ago

tomas-zijdemans-vipps commented 11 months ago

Hi,

So something really weird happened this morning (08:40 Oslo time). A workflow suddenly behaved very weird, was there a deploy in the backend? I've seen this sort of issue happen before randomly.

It seems like the following is happening:

My workflow code containing 4 steps:

const getSplunkResultsStep = workflow.addStep(GetSplunkErrors, {
  channelId: workflow.inputs.channel_id,
  filter: "Incorrect Merchant Setup", 
});

const getSalesUnitsStep = workflow.addStep(GetSalesUnits, {
  msns: getSplunkResultsStep.outputs.all_merchants_msns,
});

const sendReport = workflow.addStep(SendPartnerManagementReport, {
  sales_units: getSalesUnitsStep.outputs.sales_units,
  merchant_errors: getSplunkResultsStep.outputs.splunk_errors,
});

workflow.addStep(SendLogMessage, {
  headerText: "",
  messages: [sendReport.outputs.log_message ?? ""],
  devMode: workflow.inputs.devMode,
});

Slack logs:

2023-10-18 08:39:46 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Event trigger successfully started execution of function 'Incorrect Merchant Setup'
2023-10-18 08:39:46 [info] [Fn060TC33XD4] (Trace=Tr06179N9VE3) Function 'Incorrect Merchant Setup' (workflow function) started
2023-10-18 08:39:47 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow 'Incorrect Merchant Setup' started
2023-10-18 08:39:47 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow '%!!(MISSING)s(<nil>)' is excluded from billing
2023-10-18 08:39:47 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow step 1 of 4 started
2023-10-18 08:39:48 [info] [Fn060X3USBNZ] (Trace=Tr06179N9VE3) Function 'Get Splunk Errors from Slack Chanel' (app function) started
2023-10-18 08:39:50 [info] [Fn060X3USBNZ] (Trace=Tr06179N9VE3) Function 'Get Splunk Errors from Slack Chanel' (app function) completed
2023-10-18 08:39:50 [info] [Fn060X3USBNZ] (Trace=Tr06179N9VE3) Function output:

2023-10-18 08:39:50 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow step 'Get Splunk Errors from Slack Chanel' completed
2023-10-18 08:39:52 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow step 2 of 4 started
2023-10-18 08:39:52 [info] [Fn061CLAHJ8Z] (Trace=Tr06179N9VE3) Function 'Get Sales Units' (app function) started
2023-10-18 08:40:06 [info] [Fn061CLAHJ8Z] (Trace=Tr06179N9VE3) Function 'Get Sales Units' (app function) completed
2023-10-18 08:40:06 [info] [Fn061CLAHJ8Z] (Trace=Tr06179N9VE3) Function 'Get Sales Units' (app function) completed
2023-10-18 08:40:06 [info] [Fn061CLAHJ8Z] (Trace=Tr06179N9VE3) Function 'Get Sales Units' (app function) completed
2023-10-18 08:40:06 [info] [Fn061CLAHJ8Z] (Trace=Tr06179N9VE3) Function 'Get Sales Units' (app function) completed
2023-10-18 08:40:07 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow step 'Get Sales Units' completed
2023-10-18 08:40:07 [info] [Fn061CLAHJ8Z] (Trace=Tr06179N9VE3) Function output:

2023-10-18 08:40:08 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow step 3 of 4 started
2023-10-18 08:40:08 [info] [Fn06101K33GS] (Trace=Tr06179N9VE3) Function 'Send Partner Management Report' (app function) started
2023-10-18 08:40:18 [info] [Fn06101K33GS] (Trace=Tr06179N9VE3) Function 'Send Partner Management Report' (app function) completed
2023-10-18 08:40:19 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow step 'Send Partner Management Report' completed
2023-10-18 08:40:19 [info] [Fn06101K33GS] (Trace=Tr06179N9VE3) Function output:

2023-10-18 08:40:20 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow step 4 of 4 started
2023-10-18 08:40:20 [info] [Fn060LK9JVSB] (Trace=Tr06179N9VE3) Function 'Send Log Message' (app function) started
2023-10-18 08:40:22 [info] [Fn060LK9JVSB] (Trace=Tr06179N9VE3) Function 'Send Log Message' (app function) completed
2023-10-18 08:40:22 [info] [Fn060LK9JVSB] (Trace=Tr06179N9VE3) Function output:

2023-10-18 08:40:22 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow step 'Send Log Message' completed
2023-10-18 08:40:23 [info] [Fn060TC33XD4] (Trace=Tr06179N9VE3) Function 'Incorrect Merchant Setup' (workflow function) completed
2023-10-18 08:40:23 [info] [Wf060KF6ML3H] (Trace=Tr06179N9VE3) Workflow 'Incorrect Merchant Setup' completed
filmaj commented 11 months ago

I believe we may have introduced a bug in function dispatching as we ramp up the feature rollout to address bumping the timeout for locally-running functions to 15 seconds, to match the deployed timeout (#179). It seems the logic in our solution may have inadvertently affected deployed functions when calculating the timeout. We are reverting this change as we have seen timeouts and retries increase across our entire system.

I will post an update once the revert is complete and I have an update from our side / our monitoring.

filmaj commented 11 months ago

We have rolled back the change and the number of timeout (and thus requeues/retries) has dropped dramatically.

I think that was the underlying cause of the issue.

I will close this issue but if you see it happen again please re-open and we can continue the investigation.

tomas-zijdemans-vipps commented 11 months ago

Thanks!