Closed JustinNewport closed 1 year ago
This sounds like it could be related to the "cold start" delays in AWS Lambda, causing your application to respond slowly after some amount of time. Check out this blog post for some details around how and when cold starts can occur.
There are several (closed) issues in this repo that talk about possible work-arounds; searching for these issues may provide some suggestions.
This sounds like it could be related to the "cold start" delays in AWS Lambda, causing your application to respond slowly after some amount of time. Check out this blog post for some details around how and when cold starts can occur.
There are several (closed) issues in this repo that talk about possible work-arounds; searching for these issues may provide some suggestions.
Does not appear to be similar to these issues. It can successfully process a number of successive calls and then suddenly stop working. Additionally, all modules are loaded without issue. The app hangs waiting for a response from Slack
Hi @JustinNewport, with given information, I cannot tell the direct cause of the auth.test API call issue but one thing you may want to try would be changing the timeout setting for API calls. With that, you may observe some behavior changes of your app.
app = App(
# The default timeout is 30 seconds
client=WebClient(timeout=5),
process_before_response=True,
)
Also, when the scenario occurs, what do the lambda function logs show for you? Especially, I am curious about the execution timing logs on the AWS Lambda side, not your bolt-python app execution.
I hope you will figure out the cause soon!
Hi @seratch, I tried the above approach today and saw no difference in results. It would work now and then, but when it did not, the only thing that showed up in the lambda logs was the same output as above. See also below. This failed run followed a successful run just 6 minutes prior lambdaLogs_withWebClientObject_failure.txt
@JustinNewport With given information, I don't think that this is an SDK issue. Since auth.test API is pretty stable, I cannot think of such frequent outage of the endpoint.
If you haven't yet contacted our customer support agents, I would recommend asking them to do some Slack server-side investigation. You can submit an inquiry from either /feedback
in your Slack workspace or https://my.slack.com/help/requests/new.
Once you share your workspace ID (team_id) and App ID (api_app_id / https://api.slack.com/apps/{api_app_id}
) with them, they can check the records on your auth.test API calls. If some of auth.test requests from your app do not reach to Slack, no server-side logs should exist. If your app is struggling to receive responses from Slack for some reason (e.g., network, AWS-related issues), Slack server-side should have logs on the API calls for sure.
Hi @seratch, it turns out that our Lambda timeout was too short, cutting off the timeout for the sdk. Once we set the Lambda to >5 minutes, the sdk timeout (appears to be 4 min, 30 seconds) was reached which allowed a stack trace to be generated.
This of course reveals that while the Lambda logs showed
Sending a request - url: https://www.slack.com/api/auth.test, query_params: {}, body_params: {}, files: {}, json_body: None, headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': '(redacted)', 'User-Agent': 'Bolt/1.15.1 Python/3.7.13 slackclient/3.19.1 Linux/4.14.255-276-224.499.amzn2.x86_64'}
it was in fact not sending that request. The sdk's logging shows that URLError is occurring
[ERROR] URLError: <urlopen error timed out>
Traceback (most recent call last):
File "/var/lang/lib/python3.7/imp.py", line 234, in load_module
return load_source(name, filename, file)
File "/var/lang/lib/python3.7/imp.py", line 171, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 696, in _load
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/var/task/storeCredit.py", line 13, in <module>
app = App(process_before_response=True)
File "/opt/python/slack_bolt/app/app.py", line 329, in __init__
url_verification_enabled=url_verification_enabled,
File "/opt/python/slack_bolt/app/app.py", line 359, in _init_middleware_list
auth_test_result = self._client.auth_test(token=self._token)
File "/opt/python/slack_sdk/web/client.py", line 1532, in auth_test
return self.api_call("auth.test", params=kwargs)
File "/opt/python/slack_sdk/web/base_client.py", line 156, in api_call
return self._sync_send(api_url=api_url, req_args=req_args)
File "/opt/python/slack_sdk/web/base_client.py", line 194, in _sync_send
additional_headers=headers,
File "/opt/python/slack_sdk/web/base_client.py", line 294, in _urllib_api_call
response = self._perform_urllib_http_request(url=url, args=request_args)
File "/opt/python/slack_sdk/web/base_client.py", line 473, in _perform_urllib_http_request
raise err
File "/opt/python/slack_sdk/web/base_client.py", line 397, in _perform_urllib_http_request
resp = self._perform_urllib_http_request_internal(url, req)
File "/opt/python/slack_sdk/web/base_client.py", line 504, in _perform_urllib_http_request_internal
resp = urlopen(req, context=self.ssl, timeout=self.timeout) # skipcq: BAN-B310
File "/var/lang/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/var/lang/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/var/lang/lib/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/var/lang/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/var/lang/lib/python3.7/urllib/request.py", line 1393, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/var/lang/lib/python3.7/urllib/request.py", line 1352, in do_open
raise URLError(err)
Open question: has anyone experienced an issue with this on Lambda? Thanks in advance.
👋 It looks like this issue has been open for 30 days with no activity. We'll mark this as stale for now, and wait 10 days for an update or for further comment before closing this issue out. If you think this issue needs to be prioritized, please comment to get the thread going again! Maintainers also review issues marked as stale on a regular basis and comment or adjust status if the issue needs to be reprioritized.
Very interesting @JustinNewport . Are your Lambdas behind a VPC, by any chance? Unclear why opening a connection would timeout - perhaps DNS resolution? Maybe Slack experienced an outage around the time of these requests?
While it may be separate from this particular issue regarding URL opening timeout, I would also caution, in general, to not discount cold starts when working in Lambda. Lambdas can experience cold starts after either 30 mins of inactivity or an increase in the number of concurrent executions of a particular Lambda. In the former case, the number of Lambda containers warm and active and available to execute drops as idle time increases. In the latter case, as the number of concurrent executions rises, the need to spin up more concurrent containers able to run Lambdas increases, leading to an increase in cold starts (linearly with the number of new concurrent executions needed).
@filmaj after your comment we looked into our vpc/subnet arrangement and that was indeed the issue. we had 2 subnets attached; one could reach the internet and the other could not, hence the intermittent failures.
Glad to hear it got resolved!
Appears that either slack fails to respond to auth.test call from App object during initialization, or response is lost on AWS side. Issue is intermittent as appears to follow no pattern.
Reproducible in:
AWS Lambda slack_bolt 1.15.1 (applied as layer, also fails when zipped and packaged) slack_sdk 3.19.1 (applied as layer, also fails when zipped and packaged)
Python runtime version
Python 3.7
Expected result:
Receive a response from https://www.slack.com/api/auth.test via App() instantiation.
Actual result:
Output of CloudWatch log when failure occurs below. This would be followed by a task timeout message:
But then a few more minutes later, nothing happens. So this is not a lambda cold start issue.
Abbreviated code for simplicity, but is representative of structure
Requirements
Please read the Contributing guidelines and Code of Conduct before creating this issue or pull request. By submitting, you are agreeing to those rules.