I first isolated the problems to I/O contention, then went looking for its source. I ran an experiment during off hours between two production servers under heavy load. The results:
Idle server load went from 70 to 2.
Performance on the server taking normal traffic dropped 4x on average (80+ -> 20).
Documented here: https://github.com/glg/metadevops-issues/issues/208