Closed komuw closed 5 years ago
this occured on one occasion while running the benchmarks: https://github.com/komuw/wiji-benchmarks
We need to run wiji
for a long period (say 24 or 48 hrs) under load and see if we can replicate this.
We also need to do memory profiling to make sure we are not retaining of large objects and thus preventing them been garbage collected.
I just tried to do some memory profiling. It is complicated to do this in Python-land, I just wish I was using Golang's pprof tool
the oom
occured again this time using the redis broker; https://github.com/komuw/wiji-benchmarks/blob/master/benchmarks/redis_broker.py
I've been running wiji-cli
with 6 different task types(IO-bound, mem-bound, cpu-bound) under load for the past 1hr and wiji
seems to be running consistently at around 69MB memory use. The memory used doesn't seem to be 'running away'.
I'll try and run wiji
for longer(24hrs/48hrs) and see how the memory profile looks like.
here's the profile after 15hrs of running.
6hrs after starting to run, memory grows continuously . It started off at 40MB and then at the top end it is at 500MB
using https://github.com/komuw/wiji/compare/issues/71?expand=1 for memory profiling
Top 15 lines
wiji_cli | total_allocated_size: 3373.6KiB
wiji_cli | index:#1: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:92 stat_size:407.8KiB stat_count:2563
wiji_cli | offending_line: setattr(conditional_lib, attr, getattr(lib, attr))
wiji_cli | index:#2: file:/usr/local/lib/python3.7/site-packages/redis/_compat.py:58 stat_size:384.2KiB stat_count:6
wiji_cli | offending_line: return sock.recv(*args, **kwargs)
wiji_cli | index:#3: file:/usr/local/lib/python3.7/site-packages/wiji/broker.py:111 stat_size:177.3KiB stat_count:1
wiji_cli | offending_line: self.store[queue_name].append(item)
wiji_cli | index:#4: file:/usr/local/lib/python3.7/site-packages/idna/uts46data.py:8204 stat_size:152.2KiB stat_count:80
wiji_cli | offending_line: + _seg_78()
wiji_cli | index:#5: file:/usr/local/lib/python3.7/inspect.py:2217 stat_size:96.2KiB stat_count:162
wiji_cli | offending_line: sigcls=sigcls)
wiji_cli | index:#6: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:90 stat_size:94.7KiB stat_count:1394
wiji_cli | offending_line: for attr in dir(lib):
wiji_cli | index:#7: file:/usr/local/lib/python3.7/abc.py:126 stat_size:75.2KiB stat_count:388
wiji_cli | offending_line: cls = super().__new__(mcls, name, bases, namespace, **kwargs)
wiji_cli | index:#8: file:/usr/local/lib/python3.7/inspect.py:2800 stat_size:44.6KiB stat_count:272
wiji_cli | offending_line: for param in parameters))
wiji_cli | index:#9: file:/usr/local/lib/python3.7/site-packages/redis/client.py:1567 stat_size:40.6KiB stat_count:94
wiji_cli | offending_line: return self.execute_command('BRPOP', *keys)
wiji_cli | index:#10: file:/usr/local/lib/python3.7/concurrent/futures/thread.py:57 stat_size:36.9KiB stat_count:206
wiji_cli | offending_line: result = self.fn(*self.args, **self.kwargs)
wiji_cli | index:#11: file:/usr/local/lib/python3.7/abc.py:127 stat_size:35.9KiB stat_count:257
wiji_cli | offending_line: _abc_init(cls)
wiji_cli | index:#12: file:/usr/local/lib/python3.7/abc.py:143 stat_size:31.9KiB stat_count:303
wiji_cli | offending_line: return _abc_subclasscheck(cls, subclass)
wiji_cli | index:#13: file:/usr/local/lib/python3.7/abc.py:135 stat_size:29.6KiB stat_count:245
wiji_cli | offending_line: return _abc_register(cls, subclass)
wiji_cli | index:#14: file:/usr/local/lib/python3.7/site-packages/cryptography/utils.py:41 stat_size:27.4KiB stat_count:309
wiji_cli | offending_line: return property(lambda self: getattr(self, name))
wiji_cli | index:#15: file:/usr/local/lib/python3.7/inspect.py:2859 stat_size:22.1KiB stat_count:346
wiji_cli | offending_line: params = tuple(param for param in self.parameters.values()
wiji_cli | num_other_offenders:3107 combined_stat_size:1716.9KiB
I'll let the profile run overnight
I re-ran wiji
overnight using the patch in https://github.com/komuw/wiji/compare/issues/71?expand=1;
and tracemalloc:
Top 15 lines
total_allocated_size: 2705.1KiB
index:#1: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:92 stat_size:407.8KiB stat_count:2563
offending_line: setattr(conditional_lib, attr, getattr(lib, attr))
index:#2: file:/usr/local/lib/python3.7/site-packages/idna/uts46data.py:8204 stat_size:152.2KiB stat_count:80
offending_line: + _seg_78()
index:#3: file:/usr/local/lib/python3.7/site-packages/wiji/broker.py:111 stat_size:140.0KiB stat_count:1
offending_line: self.store[queue_name].append(item)
index:#4: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:90 stat_size:94.7KiB stat_count:1394
offending_line: for attr in dir(lib):
index:#5: file:/usr/local/lib/python3.7/abc.py:126 stat_size:75.2KiB stat_count:388
offending_line: cls = super().__new__(mcls, name, bases, namespace, **kwargs)
index:#6: file:/usr/local/lib/python3.7/inspect.py:2800 stat_size:44.0KiB stat_count:268
offending_line: for param in parameters))
index:#7: file:/usr/local/lib/python3.7/abc.py:127 stat_size:36.1KiB stat_count:261
offending_line: _abc_init(cls)
index:#8: file:/usr/local/lib/python3.7/abc.py:143 stat_size:32.0KiB stat_count:304
offending_line: return _abc_subclasscheck(cls, subclass)
index:#9: file:/usr/local/lib/python3.7/abc.py:135 stat_size:29.6KiB stat_count:245
offending_line: return _abc_register(cls, subclass)
index:#10: file:/usr/local/lib/python3.7/site-packages/prometheus_client/metrics.py:183 stat_size:29.0KiB stat_count:421
offending_line: yield (suffix, dict(series_labels + list(sample_labels.items())), value)
index:#11: file:/usr/local/lib/python3.7/site-packages/cryptography/utils.py:41 stat_size:27.4KiB stat_count:309
offending_line: return property(lambda self: getattr(self, name))
index:#12: file:/usr/local/lib/python3.7/inspect.py:2859 stat_size:21.5KiB stat_count:339
offending_line: params = tuple(param for param in self.parameters.values()
index:#13: file:/usr/local/lib/python3.7/enum.py:170 stat_size:20.0KiB stat_count:105
offending_line: enum_class = super().__new__(metacls, cls, bases, classdict)
index:#14: file:/usr/local/lib/python3.7/inspect.py:2862 stat_size:18.4KiB stat_count:202
offending_line: kwo_params = {param.name: param for param in self.parameters.values()
index:#15: file:/usr/local/lib/python3.7/copy.py:76 stat_size:18.4KiB stat_count:112
offending_line: return copier(x)
num_other_offenders:3192 combined_stat_size:1558.8KiB
after running for 16hrs, there doesn't appear to be a leak. Maybe the act of measuring(using tracemalloc) made bug dissapear.
20hours later and we have tell-tale signs of mem-leak;
and tracemalloc:
Top 15 lines
total_allocated_size: 3309.4KiB
index:#1: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:92 stat_size:407.8KiB stat_count:2563
offending_line: setattr(conditional_lib, attr, getattr(lib, attr))
index:#2: file:/usr/local/lib/python3.7/site-packages/wiji/broker.py:111 stat_size:404.9KiB stat_count:1
offending_line: self.store[queue_name].append(item)
index:#3: file:/usr/local/lib/python3.7/site-packages/redis/_compat.py:58 stat_size:256.1KiB stat_count:4
offending_line: return sock.recv(*args, **kwargs)
index:#4: file:/usr/local/lib/python3.7/site-packages/idna/uts46data.py:8204 stat_size:152.2KiB stat_count:80
offending_line: + _seg_78()
index:#5: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:90 stat_size:94.7KiB stat_count:1394
offending_line: for attr in dir(lib):
index:#6: file:/usr/local/lib/python3.7/abc.py:126 stat_size:75.2KiB stat_count:388
offending_line: cls = super().__new__(mcls, name, bases, namespace, **kwargs)
index:#7: file:/usr/local/lib/python3.7/inspect.py:2800 stat_size:44.0KiB stat_count:268
offending_line: for param in parameters))
index:#8: file:/usr/local/lib/python3.7/abc.py:127 stat_size:36.1KiB stat_count:261
offending_line: _abc_init(cls)
index:#9: file:/usr/local/lib/python3.7/site-packages/prometheus_client/metrics.py:183 stat_size:33.3KiB stat_count:484
offending_line: yield (suffix, dict(series_labels + list(sample_labels.items())), value)
index:#10: file:/usr/local/lib/python3.7/site-packages/cli/utils/_producer.py:11 stat_size:32.1KiB stat_count:52
offending_line: await task.delay(*args, **kwargs)
index:#11: file:/usr/local/lib/python3.7/abc.py:143 stat_size:32.0KiB stat_count:304
offending_line: return _abc_subclasscheck(cls, subclass)
index:#12: file:/usr/local/lib/python3.7/site-packages/wiji/worker.py:354 stat_size:31.0KiB stat_count:51
offending_line: await self.run_task(*task_args, **task_kwargs)
from that tracemalloc; https://github.com/komuw/wiji/blob/dbece4bfaf8fb2903674c3801c1837d1bdc6aec1/wiji/broker.py#L109-L111
looks like we are appending to a dict and maybe never popping stuff out of it.
22hrs later and https://github.com/komuw/wiji/blob/dbece4bfaf8fb2903674c3801c1837d1bdc6aec1/wiji/broker.py#L109-L111 has jumped to the top with the most allocations at a size of 455.6KiB
Top 15 lines
total_allocated_size: 3341.7KiB
index:#1: file:/usr/local/lib/python3.7/site-packages/wiji/broker.py:111 stat_size:455.6KiB stat_count:1
offending_line: self.store[queue_name].append(item)
index:#2: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:92 stat_size:407.8KiB stat_count:2563
offending_line: setattr(conditional_lib, attr, getattr(lib, attr))
index:#3: file:/usr/local/lib/python3.7/site-packages/redis/_compat.py:58 stat_size:256.1KiB stat_count:4
offending_line: return sock.recv(*args, **kwargs)
index:#4: file:/usr/local/lib/python3.7/site-packages/idna/uts46data.py:8204 stat_size:152.2KiB stat_count:80
offending_line: + _seg_78()
index:#5: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:90 stat_size:94.7KiB stat_count:1394
offending_line: for attr in dir(lib):
index:#6: file:/usr/local/lib/python3.7/abc.py:126 stat_size:75.2KiB stat_count:388
offending_line: cls = super().__new__(mcls, name, bases, namespace, **kwargs)
index:#7: file:/usr/local/lib/python3.7/inspect.py:2800 stat_size:44.0KiB stat_count:268
offending_line: for param in parameters))
index:#8: file:/usr/local/lib/python3.7/abc.py:127 stat_size:36.1KiB stat_count:261
offending_line: _abc_init(cls)
index:#9: file:/usr/local/lib/python3.7/site-packages/wiji/worker.py:354 stat_size:34.1KiB stat_count:56
offending_line: await self.run_task(*task_args, **task_kwargs)
index:#10: file:/usr/local/lib/python3.7/abc.py:143 stat_size:32.0KiB stat_count:304
offending_line: return _abc_subclasscheck(cls, subclass)
index:#11: file:/usr/local/lib/python3.7/site-packages/prometheus_client/metrics.py:143 stat_size:29.9KiB stat_count:473
offending_line: labelvalues = tuple(unicode(labelkwargs[l]) for l in self._labelnames)
index:#12: file:/usr/local/lib/python3.7/abc.py:135 stat_size:29.6KiB stat_count:245
offending_line: return _abc_register(cls, subclass)
index:#13: file:/usr/local/lib/python3.7/site-packages/cli/utils/_producer.py:11 stat_size:27.8KiB stat_count:45
offending_line: await task.delay(*args, **kwargs)
index:#14: file:/usr/local/lib/python3.7/site-packages/cryptography/utils.py:41 stat_size:27.4KiB stat_count:309
offending_line: return property(lambda self: getattr(self, name))
27hrs
total_allocated_size: 4644.3KiB
index:#1: file:/usr/local/lib/python3.7/site-packages/wiji/broker.py:111 stat_size:1665.5KiB stat_count:1
offending_line: self.store[queue_name].append(item)
index:#2: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:92 stat_size:407.8KiB stat_count:2563
offending_line: setattr(conditional_lib, attr, getattr(lib, attr))
index:#3: file:/usr/local/lib/python3.7/site-packages/redis/_compat.py:58 stat_size:320.2KiB stat_count:5
offending_line: return sock.recv(*args, **kwargs)
index:#4: file:/usr/local/lib/python3.7/site-packages/idna/uts46data.py:8204 stat_size:152.2KiB stat_count:80
offending_line: + _seg_78()
index:#5: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:90 stat_size:94.7KiB stat_count:1394
offending_line: for attr in dir(lib):
index:#6: file:/usr/local/lib/python3.7/abc.py:126 stat_size:75.2KiB stat_count:388
offending_line: cls = super().__new__(mcls, name, bases, namespace, **kwargs)
index:#7: file:/usr/local/lib/python3.7/inspect.py:2800 stat_size:44.0KiB stat_count:268
offending_line: for param in parameters))
index:#8: file:/usr/local/lib/python3.7/site-packages/prometheus_client/metrics.py:143 stat_size:43.9KiB stat_count:697
offending_line: labelvalues = tuple(unicode(labelkwargs[l]) for l in self._labelnames)
index:#9: file:/usr/local/lib/python3.7/abc.py:127 stat_size:36.1KiB stat_count:261
offending_line: _abc_init(cls)
index:#10: file:/usr/local/lib/python3.7/inspect.py:2217 stat_size:35.6KiB stat_count:60
offending_line: sigcls=sigcls)
index:#11: file:/usr/local/lib/python3.7/site-packages/wiji/worker.py:354 stat_size:32.2KiB stat_count:53
offending_line: await self.run_task(*task_args, **task_kwargs)
index:#12: file:/usr/local/lib/python3.7/abc.py:143 stat_size:32.0KiB stat_count:304
offending_line: return _abc_subclasscheck(cls, subclass)
index:#13: file:/usr/local/lib/python3.7/abc.py:135 stat_size:29.6KiB stat_count:245
offending_line: return _abc_register(cls, subclass)
index:#14: file:/usr/local/lib/python3.7/site-packages/cryptography/utils.py:41 stat_size:27.4KiB stat_count:309
offending_line: return property(lambda self: getattr(self, name))
index:#15: file:/usr/local/lib/python3.7/site-packages/prometheus_client/metrics.py:183 stat_size:21.9KiB stat_count:319
offending_line: yield (suffix, dict(series_labels + list(sample_labels.items())), value)
num_other_offenders:3161 combined_stat_size:1626.0KiB
after 36hrs, the container got killed(OOM)
Top 15 lines
total_allocated_size: 6330.0KiB
index:#1: file:/usr/local/lib/python3.7/site-packages/wiji/broker.py:111 stat_size:3376.8KiB stat_count:1
offending_line: self.store[queue_name].append(item)
index:#2: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:92 stat_size:407.8KiB stat_count:2563
offending_line: setattr(conditional_lib, attr, getattr(lib, attr))
index:#3: file:/usr/local/lib/python3.7/site-packages/redis/_compat.py:58 stat_size:320.2KiB stat_count:5
offending_line: return sock.recv(*args, **kwargs)
index:#4: file:/usr/local/lib/python3.7/site-packages/idna/uts46data.py:8204 stat_size:152.2KiB stat_count:80
offending_line: + _seg_78()
index:#5: file:/usr/local/lib/python3.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:90 stat_size:94.7KiB stat_count:1394
offending_line: for attr in dir(lib):
index:#6: file:/usr/local/lib/python3.7/abc.py:126 stat_size:75.2KiB stat_count:388
offending_line: cls = super().__new__(mcls, name, bases, namespace, **kwargs)
index:#7: file:/usr/local/lib/python3.7/inspect.py:2800 stat_size:44.0KiB stat_count:268
offending_line: for param in parameters))
index:#8: file:/usr/local/lib/python3.7/site-packages/prometheus_client/metrics.py:143 stat_size:40.9KiB stat_count:649
offending_line: labelvalues = tuple(unicode(labelkwargs[l]) for l in self._labelnames)
index:#9: file:/usr/local/lib/python3.7/site-packages/wiji/worker.py:354 stat_size:37.8KiB stat_count:62
offending_line: await self.run_task(*task_args, **task_kwargs)
index:#10: file:/usr/local/lib/python3.7/abc.py:127 stat_size:36.1KiB stat_count:261
offending_line: _abc_init(cls)
index:#11: file:/usr/local/lib/python3.7/abc.py:143 stat_size:32.0KiB stat_count:304
offending_line: return _abc_subclasscheck(cls, subclass)
index:#12: file:/usr/local/lib/python3.7/abc.py:135 stat_size:29.6KiB stat_count:245
offending_line: return _abc_register(cls, subclass)
index:#13: file:/usr/local/lib/python3.7/inspect.py:2217 stat_size:28.5KiB stat_count:48
offending_line: sigcls=sigcls)
index:#14: file:/usr/local/lib/python3.7/site-packages/cryptography/utils.py:41 stat_size:27.4KiB stat_count:309
offending_line: return property(lambda self: getattr(self, name))
index:#15: file:/usr/local/lib/python3.7/site-packages/cli/utils/_producer.py:11 stat_size:21.6KiB stat_count:35
offending_line: await task.delay(*args, **kwargs)
num_other_offenders:3194 combined_stat_size:1605.3KiB
I applied the patch: https://github.com/komuw/wiji/blob/d078763f8a96df14457453aa082d9790af6f67b4/wiji/broker.py#L121 https://github.com/komuw/wiji/blob/d078763f8a96df14457453aa082d9790af6f67b4/wiji/broker.py#L153-L169
found in the PR: https://github.com/komuw/wiji/pull/74
and after running benchmarks for 12hrs;
wiji
is humming along nicely at a near constant 42MB
20hrs later and wiji
is still at 42MB
I think we can consider https://github.com/komuw/wiji/pull/74 to be a good enough fix.
journalctl -k | grep -i -e memory -e oom
https://success.docker.com/article/what-causes-a-container-to-exit-with-code-137