n7s closed this issue 7 years ago
Maybe the CI steps could include -j4 as the concurrent-jobs parameter, so the build does not try to max out all the available cores?
This sounds like a bug in the CI config, not a smith bug.
Even with the CI config set to -j1, I still see multiple xetex instances running in parallel, and then the build gets cancelled.
More diagnosis and config tweaks needed.
This might be completely unrelated, but I notice that the configure target detects/lists xetex multiple times. Maybe it's the way ret is built in find_program() in waflib/Configure.py?
I've set the JOBS environment variable to 4 in the template, so it is automatically applied to all Font CI projects on TC. This is honoured by waf/smith and doesn't depend on setting the right flag in the command scripts for multiple steps.
This is confirmed to fix the issue, which was that waf was discovering the host's physical number of cores, rather than the number of cores assigned to the guest by the kernel (on most systems those two are the same unless changed by taskset).
The continued problem nico was seeing after setting -j1 was that he'd missed a custom step, which existed only in the Padauk project rather than the template, that ran smith pdfs. I've removed that step since smith alltests runs that anyway.
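To illustrate the difference between the host's core count and the cores actually available to the process, here is a minimal Python sketch. The helper name `effective_jobs` is hypothetical and this is not waf's or smith's own code. It assumes a JOBS variable takes priority, as described above, and otherwise falls back to the CPU affinity of the current process. Note that `os.sched_getaffinity` is Linux-only and reflects affinity restrictions (such as those set with taskset), which may not cover every kind of container CPU limit.

```python
import os


def effective_jobs(env=None):
    """Hypothetical helper: choose a parallel job count.

    Priority: a positive integer in JOBS, then the CPUs this process
    is allowed to run on, then the machine-wide CPU count.
    """
    env = os.environ if env is None else env
    try:
        jobs = int(env.get("JOBS", ""))
    except ValueError:
        jobs = 0  # unset or non-numeric JOBS is ignored
    if jobs >= 1:
        return jobs
    # Linux-only: the set of CPUs the current process may be scheduled on.
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    # Fallback: counts every CPU the OS reports, restricted or not.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("CPUs reported by os.cpu_count():", os.cpu_count())
    print("jobs chosen:", effective_jobs())
```

On a guest whose assigned cores are fewer than the host's, the two printed numbers can differ, which is the kind of mismatch the JOBS setting works around.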
We have noticed that smith seriously strains the CI container (to the point that the OOM killer is invoked to kill the process). For example, too many instances of xetex are run in parallel during the tests stage, and it looks like the underlying waf maxes out the available cores (over 20 instances, even when the number is reduced in the container configuration); the build then gets cancelled because it runs out of memory.
It would be useful to have a way to serialize big jobs (e.g. reduce the number of xetex instances being launched together).
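As a rough illustration of what serializing big jobs could look like, the Python sketch below runs tasks in a worker pool but lets only a limited number of tasks flagged as heavy run at the same time. This is a generic pattern, not an existing smith or waf feature. The name `run_limited` and the heavy flag are hypothetical, and in practice a heavy task would be something like a call that launches xetex.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_limited(tasks, workers=4, heavy_limit=1):
    """Run (callable, is_heavy) pairs in a thread pool.

    At most `heavy_limit` heavy callables execute concurrently;
    light callables are limited only by `workers`.
    Results are returned in the order the tasks were given.
    """
    gate = threading.BoundedSemaphore(heavy_limit)

    def call(task, heavy):
        if heavy:
            with gate:  # wait here until a heavy slot is free
                return task()
        return task()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call, fn, heavy) for fn, heavy in tasks]
        return [f.result() for f in futures]
```

One trade-off of this simple design is that a heavy task waiting on the gate still occupies a worker thread, so light tasks can be delayed when many heavy tasks are queued.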