mserranom closed 5 years ago
OK to revert for now, but we could possibly tweak some GKE settings to see if that helps --- e.g. autoscale thresholds and availability of CPUs. 10 may also be too high a factor (but just guessing).
I had a chat with @henryoswald about this. CPU and memory look OK in GKE; he suggested we could be slamming a GKE process too hard. We might be requesting worker instances faster than they're cleaned up, which makes sense considering what we've seen in the logs. I think removing these concurrent workers (and returning to the original implementation) puts us in a better position to start tweaking GKE.
I am not sure if this is what John is suggesting, but it is an interesting idea: we currently scale on CPU resources, but GKE lets us scale on custom metrics, such as worker pool size. However, we have not tried to do this yet.
For a simple change, we could tell GKE to run spelling less hot by lowering the CPU threshold; dropping it to 40% might make a reasonable difference.
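To make the effect of a lower CPU threshold concrete, here is a rough sketch of the replica calculation the Kubernetes Horizontal Pod Autoscaler documents (desired = ceil(current replicas × current utilisation / target utilisation)). The 80% starting threshold, the 60% load, and the replica count are made-up numbers for illustration, not our actual settings, and the sketch ignores the autoscaler's tolerance band and min/max replica limits.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_pct: float, target_cpu_pct: float) -> int:
    """Approximate the HPA scaling rule: ceil(current * currentMetric / targetMetric).

    Ignores the tolerance band and min/max replica bounds that a real
    autoscaler also applies.
    """
    return math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)


# Hypothetical numbers: 4 pods running at 60% average CPU.
pods_at_80 = desired_replicas(4, 60, 80)  # hypothetical current threshold
pods_at_40 = desired_replicas(4, 60, 40)  # threshold suggested above
print(pods_at_80, pods_at_40)
```

With the same load, halving the target roughly doubles the number of pods the autoscaler asks for, which is the sense in which the service would run "less hot".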
I was just thinking about adjusting the CPU threshold, but those are also good ideas 😄
Anyway, no objection to shipping this and going from there.
Removed the concurrent calls to `aspell` within the same request. While this helped to fix certain error scenarios (error rate went down in Linode), errors in GKE actually increased due to unavailability of workers.