rollbar / pyrollbar

Error tracking and logging from Python to Rollbar
https://docs.rollbar.com/docs/python/
MIT License
213 stars 135 forks source link

SSL/TLS error in Django connecting to api.rollbar.com on exceptions #215

Open taylor-cedar opened 6 years ago

taylor-cedar commented 6 years ago

This error happens for about 2 out of every 3 requests to send an exception to rollbar. It occasionally works, but mostly doesn't. The following stack trace is printed only occasionally. Sometimes it will fail with no exception stack trace.

Traceback (most recent call last):
   File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
     chunked=chunked)
   File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
     self._validate_conn(conn)
   File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
     conn.connect()
   File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 326, in connect
     ssl_context=context)
   File "/usr/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
     return context.wrap_socket(sock, server_hostname=server_hostname)
   File "/usr/lib/python3.6/ssl.py", line 401, in wrap_socket
     _context=self, _session=session)
   File "/usr/lib/python3.6/ssl.py", line 808, in __init__
     self.do_handshake()
   File "/usr/lib/python3.6/ssl.py", line 1061, in do_handshake
     self._sslobj.do_handshake()
   File "/usr/lib/python3.6/ssl.py", line 683, in do_handshake
     self._sslobj.do_handshake()
 ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:749)

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
     timeout=timeout
   File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
     _stacktrace=sys.exc_info()[2])
   File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
     raise MaxRetryError(_pool, url, error or ResponseError(cause))
 urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.rollbar.com', port=443): Max retries exceeded with url: /api/1/item/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:749)'),))

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/usr/lib/python3.6/site-packages/rollbar/__init__.py", line 1209, in _send_payload
     _post_api('item/', payload, access_token=access_token)
   File "/usr/lib/python3.6/site-packages/rollbar/__init__.py", line 1255, in _post_api
     verify=SETTINGS.get('verify_https', True))
   File "/usr/lib/python3.6/site-packages/rollbar/lib/transport.py", line 7, in post
     return _session.post(*args, **kw)
   File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 555, in post
     return self.request('POST', url, data=data, json=json, **kwargs)
   File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
     resp = self.send(prep, **send_kwargs)
   File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
     r = adapter.send(request, **kwargs)
   File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 506, in send
     raise SSLError(e, request=request)
 requests.exceptions.SSLError: HTTPSConnectionPool(host='api.rollbar.com', port=443): Max retries exceeded with url: /api/1/item/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:749)'),))

Python 3.6.1 Django 1.11.1 uwsgi 2.0.14 requests 2.18.4 open-ssl 1.0.2k 26 Jan 2017 Docker Alpine Linux

Please let me know if I can provide more info. It seems like some kind of race condition, but not sure, since it sometimes works. Maybe it's a thread safety issue?

taylor-cedar commented 6 years ago

I am testing with the debug error view, so maybe related to this? https://github.com/rollbar/pyrollbar/issues/180 . Not sure because I can't see clubhouse.

rivkahstandig3636 commented 6 years ago

Hi @taylor-cedar, Are you still having this issue? If so, what version of pyrollbar are you on?

rivkahstandig3636 commented 6 years ago

Closing due to inactivity.

iasci commented 6 years ago

I think this issue is still valid. We are experiencing the same problem with the following setup:

I read some similar issues on other client libraries and seems like it can be related to thread (un)safety of requests's sessions. See: https://stackoverflow.com/questions/18188044/is-the-session-object-from-pythons-requests-library-thread-safe

mjhanke commented 5 years ago

Mentioned in #180, but we're running django version 2.1.2 and rollbar version 0.14.5 and are still seeing a very similar traceback as mentioned above:

[ERROR] 2019-09-17T04:19:55.506Z    d8024161-f0a6-4e42-b421-dbdf8ad4cb6c    Exception while posting item SSLError(MaxRetryError("HTTPSConnectionPool(host='api.rollbar.com', port=443): Max retries exceeded with url: /api/1/item/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),))",),)
Traceback (most recent call last):
  File "/var/task/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/var/task/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/var/task/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/var/task/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/var/task/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/var/lang/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/var/lang/lib/python3.6/ssl.py", line 817, in __init__
    self.do_handshake()
  File "/var/lang/lib/python3.6/ssl.py", line 1077, in do_handshake
    self._sslobj.do_handshake()
  File "/var/lang/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:852)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/var/task/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/var/task/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.rollbar.com', port=443): Max retries exceeded with url: /api/1/item/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/rollbar/__init__.py", line 1336, in _send_payload
    _post_api('item/', payload_str, access_token=access_token)
  File "/var/task/rollbar/__init__.py", line 1385, in _post_api
    proxy_password=SETTINGS.get('http_proxy_password'))
  File "/var/task/rollbar/lib/transport.py", line 33, in post
    return _session().post(*args, proxies=proxies, **kw)
  File "/var/task/requests/sessions.py", line 555, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/var/task/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/task/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/var/task/requests/adapters.py", line 506, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='api.rollbar.com', port=443): Max retries exceeded with url: /api/1/item/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),))

Other useful context might be that this code is executing in an AWS lambda function environment

subodhpareek18 commented 2 years ago

Facing the same issue while on flask. It's working fine when running on my local, but when deployed to AWS inside a docker container get an error.

Flask==0.10.1
rollbar~=0.15
[pid: 30|app: 0|req: 4/19] 69.162.124.233 () {46 vars in 720 bytes} [Fri Feb  4 07:51:23 2022] HEAD / => generated 0 bytes in 4 msecs (HTTP/1.1 302) 4 headers in 407 bytes (0 switches on core 0)
ERR0204 07:46:23.343 31 __init__.py:1499] Exception while posting item SSLError(SSLError(SSLEOFError(8, u'EOF occurred in violation of protocol (_ssl.c:590)'),),)
Traceback (most recent call last):
  File "/home/tvm/backend/venv/lib/python2.7/site-packages/rollbar/__init__.py", line 1497, in _send_payload
    _post_api('item/', payload_str, access_token=access_token)
  File "/home/tvm/backend/venv/lib/python2.7/site-packages/rollbar/__init__.py", line 1552, in _post_api
    proxy_password=SETTINGS.get('http_proxy_password'))
  File "/home/tvm/backend/venv/lib/python2.7/site-packages/rollbar/lib/transport.py", line 44, in post
    return _session().post(*args, proxies=proxies, **kw)
  File "/home/tvm/backend/venv/lib/python2.7/site-packages/requests/sessions.py", line 511, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home/tvm/backend/venv/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/tvm/backend/venv/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/home/tvm/backend/venv/lib/python2.7/site-packages/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
SSLError: EOF occurred in violation of protocol (_ssl.c:590)
jsam commented 1 year ago

Any update on this issue? We are facing the same problem using 0.16.2

danielmorell commented 1 year ago

I have a hunch this may be connected with how we handle threading and sessions when transporting payloads. Because our default 'threading' handler does not reuse transport threads we throw away each new requests.Session() we create after the payload has been sent. This is probably causing an unnecessary number of SSL handshakes and/or this may not be completely thread safe.

I would suggest trying our experimental thread_pool handler in rollbar==0.16.4b0. It creates a pool of worker threads that will retain the requests session between payload sends.

jackie-greenbaum commented 1 year ago

We are running into this issue on our AWS Lambda environment as well. Unfortunately, @danielmorell 's suggestion above of upgrading to 0.16.4b0 and using the thread_pool handler did not seem to fix the issue.

I noticed in my internal testing that I was only able to reproduce the bug when reporting to Rollbar from multiple lambda instances concurrently (though I'm not sure if the concurrency, the higher volume of requests in a short period, or something else entirely is the cause).

Our solution for now is to override rollbar.send_payload() to catch ConnectTimeoutError and ReadTimeoutError, wait a few seconds, and retry.

echee-insitro commented 1 year ago

We're also running into this issue in AWS lambda. The lambda runs 1x/daily with a reserved concurrency of 1. 0.16.4b0 didn't fix it.

ghost commented 1 year ago

@danielmorell Isaac created a batched transformer that might be able to resolve this issue. Can you look into it to see if it could resolve this issue? It was this PR: https://github.com/rollbar/pyrollbar/pull/421

danielmorell commented 1 year ago

@rollbar-bborsits This seems like it has to do with the transport process but probably not the serialization and scrubbing process.

ghost commented 1 year ago

@danielmorell Do you need more info to debug this issue?

jeremy-insitro commented 1 year ago

We've also been frequently experiencing these SSL errors with Flask (only when deployed to AWS inside a docker container), but setting handler="blocking" when initializing Rollbar with rollbar.init(...) seems to have fixed the issue for us.

We're running pyrollbar 0.16.3 and flask 2.0.2

sqpollen commented 1 year ago

Also getting this error on a Wagtail/Django deployment on AWS App Runner. Does not happen on a standard EC2 deployment.

rollbar==0.16.3

ERROR:rollbar:Exception while posting item ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='api.rollbar.com', port=443): Max retries exceeded with url: /api/1/item/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fec47b2df10>, 'Connection to api.rollbar.com timed out. (connect timeout=3)'))"))

settings.py

ROLLBAR = {
     'access_token': 'token goes here,
     'environment': 'staging',
     'root': BASE_DIR,
 }
danielmorell commented 1 year ago

This error is not caused by the Python Rollbar SDK. It likely comes from the SSL implementation available to requests, or from a networking issue between your server, and the Rollbar backend. I would recommend reading this issue https://github.com/psf/requests/issues/3605 for some context from the maintainers of requests.

I won't restate everything in that issue thread. However, one important take away is that if the issue is intermittent, it is likely a networking issue. If it is consistent, you may want to check the version of OpenSSL as well as the rest of your networking dependencies.

Lastly, the final error is thrown when the max retries is exceeded. If this is being throw intermittently, we could add a config option to silence these exceptions, and possibly write them to an agent file. What are everyone's thoughts on this?

echee-insitro commented 1 year ago

A config option to prevent pyrollbar exceptions from interrupting the application sounds good. Perhaps instead of silencing, if it's possible to write to the root python logger / stderr handler, while avoiding recursive logging by the rollbar handler itself? The idea is to preserve visibility into pyrollbar problems -- e.g. visible as warnings in AWS cloudwatch logs. Thank you.