Closed graphicore closed 5 years ago
For completeness, but not very useful I think, here are the logs of a failed and terminated "fontbakery-worker" running diffenator with the Vollkorn from above.
The errors E0327 ... backup_poller.cc:124 ...
are interesting but probably a side effect of the pod being under high load. The file src/core/lib/iomgr/ev_epoll1_linux.cc
is from gRPC, so I'm guessing the cache
and persistence
gRPC clients are timing out or something similar.
$ kf logs --previous fontbakery-worker-5b5f68fc48-5kvhd
INFO:FB_WORKER:loglevel: DEBUG
INFO:FB_WORKER:RethinkDB HOST 10.102.163.63 PORT 28015
INFO:FB_WORKER:Waiting for messages in fontbakery-worker...
INFO:FB_WORKER:consuming incoming message ...
DEBUG:FB_WORKER:Got job description: worker_name: "diffenator"
job {
type_url: "type.googleapis.com/fontbakery.dashboard.FamilyJob"
value: "\n@c3fc7cf3423058b61550e6859b56b68773b6c97183e975880f5ea967b69cdf7e\022\211\001\nBc3fc7cf3423058b61550e6859b56b68773b6c97183e975880f5ea967b69cdf7e:0\022@c3fc7cf3423058b61550e6859b56b68773b6c97183e975880f5ea967b69cdf7e\032\0010"
}
INFO:FB_WORKER:consuming a job for: <class 'worker.diffenator.Diffenator'> with <Basic.Deliver(['consumer_tag=ctag1.17caf35750ba4468ae25ca4c2d55ea6f', 'delivery_tag=1', 'exchange=', 'redelivered=True', 'routing_key=fontbakery-worker'])> <BasicProperties(['delivery_mode=2', 'headers={}'])>
DEBUG:FB_WORKER:Files in Tempdir /tmp/tmpb8fraurz: ['/tmp/tmpb8fraurz/after/Vollkorn-SemiBoldItalic.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-Medium.ttf', '/tmp/tmpb8fraurz/after/METADATA.pb', '/tmp/tmpb8fraurz/after/Vollkorn-MediumItalic.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-SemiBold.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-Italic.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-BlackItalic.ttf', '/tmp/tmpb8fraurz/after/DESCRIPTION.en_us.html', '/tmp/tmpb8fraurz/after/OFL.txt', '/tmp/tmpb8fraurz/after/Vollkorn-BoldItalic.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-ExtraBoldItalic.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-Black.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-Regular.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-ExtraBold.ttf', '/tmp/tmpb8fraurz/after/Vollkorn-Bold.ttf', '/tmp/tmpb8fraurz/after/FONTLOG.txt', '/tmp/tmpb8fraurz/before/Vollkorn-SemiBoldItalic.ttf', '/tmp/tmpb8fraurz/before/Vollkorn-SemiBold.ttf', '/tmp/tmpb8fraurz/before/Vollkorn-Italic.ttf', '/tmp/tmpb8fraurz/before/Vollkorn-BlackItalic.ttf', '/tmp/tmpb8fraurz/before/Vollkorn-BoldItalic.ttf', '/tmp/tmpb8fraurz/before/Vollkorn-Black.ttf', '/tmp/tmpb8fraurz/before/Vollkorn-Regular.ttf', '/tmp/tmpb8fraurz/before/Vollkorn-Bold.ttf']
INFO:fontdiffenator:'diff_nametable' 0.58 ms
INFO:fontdiffenator:'diff_attribs' 1.31 ms
INFO:fontdiffenator:Rendering glyph differences. Be patient
E0327 03:13:37.253009780 4331 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553652817.251048301","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553652817.251040999","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:15:12.893402882 4339 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553652912.887556358","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553652912.887544261","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:15:27.675585564 4622 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553652927.675187391","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553652927.675183504","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:16:43.076839745 5474 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653002.039966466","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653001.179768074","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:16:52.492755280 5489 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653012.492391359","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653012.492387460","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:17:30.139855283 5512 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653050.122289256","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653050.122283964","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:17:41.742626392 5523 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653061.662958260","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653061.662942841","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:17:58.572300760 5530 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653078.568873703","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653078.568866882","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:18:13.678178052 6127 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653093.677908218","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653093.677904581","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:21:42.188160347 14672 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653302.186797382","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653302.186785188","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:21:48.269849885 14680 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653308.269311660","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653308.269306133","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:22:23.997368010 14749 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653343.996981124","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653343.996977948","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:23:15.888473375 14780 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653395.880127174","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653395.880118616","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:23:37.906640614 14796 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653417.906054284","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653417.906047807","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:24:38.699788702 20437 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653478.699540482","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653478.699505072","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
E0327 03:25:04.979810059 21640 backup_poller.cc:124] Run client channel backup poller: {"created":"@1553653504.979501630","description":"pollset_work","file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":242,"referenced_errors":[{"created":"@1553653504.979496712","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epoll1_linux.cc","file_line":723,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
INFO:fontdiffenator:'diff_glyphs' 736884.57 ms
INFO:fontdiffenator:'diff_kerning' 7522.01 ms
INFO:fontdiffenator:'diff_metrics' 34.19 ms
INFO:fontdiffenator:'diff_marks' 52778.42 ms
INFO:fontdiffenator:'diff_marks' 2300.87 ms
I did not make the memory hunger go away, but I solved the problem for me, using a laptop with much more memory and creating a minikube instance with 16 GB ram: minikube start --memory=16384
. Now everything works as expected again.
Here's a python memory-profiler
(mprof
) graph I created on the new, not crashing instance. One can see nicely the eight jobs Vollkorn created:
The peaks at the end of each job are the creation of the reports. The plateaus are most probably in the diffenator diff_rendering
which uses harfbuzz and renders and compares most glyphs and thus takes a while.
Peak memory is at ~ 1.2 GB!
The part of the description of the pod that get's killed that hints to what is happening (full description at the bottom of this entry):
The worker is running diffenator, comparing all of the Volkornbrot family.
Exit Code: 137
hints that the container is terminated because it's running out of memory (OOM) described similarly here.So, two paths of action seem worthwhile:
2b involves some work that ideally shouldn't be necessary for this kind of problem–rather for speeding up overall execution time i.e. parallel execution where/when more workers are available. 2a is preferred now if possible.
full pod description: