Closed frasieroh closed 4 months ago
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| topo/node/arista/arista.go | 19 | 20 | 95.0% |

| Totals | |
|---|---|
| Change from base Build 8840821359: | 0.06% |
| Covered Lines: | 4634 |
| Relevant Lines: | 7110 |
As the concerned customer, thanks for this. FWIW, I do have a hack that lets KNE avoid the local wait by launching all pods of a topology in parallel, with obvious concerns about simply shifting the problem to other APIs -- in this case, that's what made it apparent that the cEOS operator was itself somewhat serializing the work.
/gcbrun
A customer is expressing performance concerns at high scale (~100 instances across ~10 nodes). One of their findings is that cEOS-lab instances appear to start consecutively instead of in parallel.

Because the pod check is baked into `(n *Node) Config` instead of `(n *Node) Status`, we don't create the next cEOS-lab custom resource object until the previous pod has started. Now they're created all at once.

The new operator version increases the number of reconciliation workers from 1 to `runtime.NumCPU` to cope with this change. It turns out the operator spends most of its time generating self-signed RSA certs, so depending on how the runtime schedules the worker goroutines there may be performance gains there.

Thanks!