kiwigrid / k8s-sidecar

This is a Docker container intended to run inside a Kubernetes cluster to collect config maps with a specified label and store the included files in a local folder.
MIT License

ProtocolError - InvalidChunkLength #233

Open ngc4579 opened 2 years ago

ngc4579 commented 2 years ago

A quay.io/kiwigrid/k8s-sidecar:1.21.0 container deployed alongside Grafana in a k8s 1.24 evaluation cluster seems to start correctly, but afterwards floods the logs with ProtocolError messages as shown below:

{"time": "2022-11-19T12:12:04.355037+00:00", "level": "INFO", "msg": "Starting collector"}
{"time": "2022-11-19T12:12:04.355222+00:00", "level": "WARNING", "msg": "No folder annotation was provided, defaulting to k8s-sidecar-target-directory"}
{"time": "2022-11-19T12:12:04.355347+00:00", "level": "INFO", "msg": "Loading incluster config ..."}
{"time": "2022-11-19T12:12:04.356003+00:00", "level": "INFO", "msg": "Config for cluster api at 'https://api.eval.420214.internal.prod.<redacted>:443' loaded..."}
{"time": "2022-11-19T12:12:04.356101+00:00", "level": "INFO", "msg": "Unique filenames will not be enforced."}
{"time": "2022-11-19T12:12:04.356173+00:00", "level": "INFO", "msg": "5xx response content will not be enabled."}
{"time": "2022-11-19T12:12:08.031424+00:00", "level": "INFO", "msg": "Subprocess exiting gracefully"}
{"time": "2022-11-19T12:12:08.032719+00:00", "level": "INFO", "msg": "Subprocess exiting gracefully"}
{"time": "2022-11-19T12:12:08.032763+00:00", "level": "INFO", "msg": "Subprocess exiting gracefully"}
{"time": "2022-11-19T12:12:59.386760+00:00", "level": "ERROR", "msg": "ProtocolError when calling kubernetes: (\"Connection broken: InvalidChunkLength(got length b'', 0 bytes read)\", InvalidChunkLength(got length b'', 0 bytes read))"}
{"time": "2022-11-19T12:12:59.387126+00:00", "level": "ERROR", "msg": "ProtocolError when calling kubernetes: (\"Connection broken: InvalidChunkLength(got length b'', 0 bytes read)\", InvalidChunkLength(got length b'', 0 bytes read))"}
{"time": "2022-11-19T12:13:54.412196+00:00", "level": "ERROR", "msg": "ProtocolError when calling kubernetes: (\"Connection broken: InvalidChunkLength(got length b'', 0 bytes read)\", InvalidChunkLength(got length b'', 0 bytes read))"}
{"time": "2022-11-19T12:13:54.426154+00:00", "level": "ERROR", "msg": "ProtocolError when calling kubernetes: (\"Connection broken: InvalidChunkLength(got length b'', 0 bytes read)\", InvalidChunkLength(got length b'', 0 bytes read))"}
...

Is this a known issue?

jekkel commented 2 years ago

Hi @ngc4579, no, not that I'm aware of. Can you provide us with some more details? Are you able to call the k8s API manually (e.g. from a curl pod) to confirm the cluster config (k8s API server config) is correct?

ngc4579 commented 2 years ago

@jekkel Thanks for your answer. Well, at least a $ curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api returns a sensible result as far as I can tell:

{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "api.eval.420214.internal.prod.<redacted>:443"
    }
  ]
}
Zebradil commented 1 year ago

We noticed the same issue around 2 months ago. It does not seem to be critical, because config maps are still processed successfully. The only sign that something is wrong is the error messages in the logs.

Zebradil commented 1 year ago

And this is the stack trace I managed to retrieve.

Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/urllib3/response.py", line 700, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/urllib3/response.py", line 441, in _error_catcher
    yield
  File "/app/.venv/lib/python3.10/site-packages/urllib3/response.py", line 767, in read_chunked
    self._update_chunk_length()
  File "/app/.venv/lib/python3.10/site-packages/urllib3/response.py", line 704, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/dbg.py", line 6, in <module>
    for event in stream:
  File "/app/.venv/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 165, in stream
    for line in iter_resp_lines(resp):
  File "/app/.venv/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
    for seg in resp.stream(amt=None, decode_content=False):
  File "/app/.venv/lib/python3.10/site-packages/urllib3/response.py", line 575, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/app/.venv/lib/python3.10/site-packages/urllib3/response.py", line 755, in read_chunked
    with self._error_catcher():
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/app/.venv/lib/python3.10/site-packages/urllib3/response.py", line 458, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

This is the code that was used (it was run in a loop, because most of the time it works as expected):

from kubernetes import client, config, watch

# Use the service account credentials mounted into the pod.
config.load_incluster_config()
v1 = client.CoreV1Api()

# Watch config maps with the Grafana dashboard label across all namespaces.
stream = watch.Watch().stream(
    v1.list_config_map_for_all_namespaces,
    label_selector='grafana_dashboard=1',
    timeout_seconds=60,
    _request_timeout=66)
for event in stream:
    print(event['object'].metadata.name)

So far it looks like a networking issue rather than an issue with k8s-sidecar itself.

Zebradil commented 1 year ago

Related issues in the Kubernetes Python client:

The suggested solution (which is really a workaround) is to implement retry logic. However, looking at k8s-sidecar's code, that does not seem necessary, because retry logic is already implemented at a higher level. Personally, I'd decrease the severity of the corresponding log message from ERROR to WARNING, since everything is in fact working fine.
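
For illustration, a minimal sketch of such retry logic (hypothetical, not the sidecar's actual code), reusing the watch parameters from the reproduction snippet above: re-open the watch whenever urllib3 raises a ProtocolError mid-stream.

from kubernetes import client, config, watch
from urllib3.exceptions import ProtocolError

config.load_incluster_config()
v1 = client.CoreV1Api()

while True:
    try:
        # Same watch as in the reproduction snippet above.
        stream = watch.Watch().stream(
            v1.list_config_map_for_all_namespaces,
            label_selector='grafana_dashboard=1',
            timeout_seconds=60,
            _request_timeout=66)
        for event in stream:
            print(event['object'].metadata.name)
    except ProtocolError:
        # The API server (or a proxy in between) closed the chunked response
        # early; start a new watch instead of surfacing an ERROR.
        continue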

sjentzsch commented 1 year ago

Hm, I am running into the same issue, and can confirm it's still the case with 1.24.3. Reconciliation is fine, it does its job, and I cannot confirm any network issues, but I get spammed with "msg": "ProtocolError when calling kubernetes: (\"Connection broken: InvalidChunkLength(got length b'', 0 bytes read)\", InvalidChunkLength(got length b'', 0 bytes read))\n", "level": "ERROR" every 55 seconds 🤔

With watchMethod: SLEEP those error logs are gone, and every 65 seconds I receive a Performing list-based sync on ... message instead. That's my workaround for now to get rid of the error logs.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.