Open manishtomar opened 7 years ago
This seems to be caused by having many groups in one tenant. When all of those groups try to converge at the same time, they issue too many concurrent GET requests to CLB, and those requests get throttled. One fix is to remove the GET throttling, but I am sure we added it at the CLB team's request. The other solution is to make only one request and return its result to all the callers, similar to what the wait decorator does when authenticating a tenant.
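A rough sketch of the "one request, many callers" idea, to make the proposal concrete. This is not the existing wait decorator or any code from this repo: it uses plain `asyncio` for self-containment, and `CoalescedCall`, `fake_list_clbs`, and the `"tenant-a"` key are all hypothetical names made up for illustration.

```python
import asyncio


class CoalescedCall:
    """Share one in-flight call per key among all concurrent callers."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical coroutine function: key -> result
        self._in_flight = {}     # key -> asyncio.Task for the pending request

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key: start the real request.
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
            # Forget the task once it finishes so later calls fetch fresh data.
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        # shield() keeps one cancelled caller from cancelling the shared request.
        return await asyncio.shield(task)


async def demo():
    calls = []

    async def fake_list_clbs(tenant_id):
        # Stand-in for the throttled upstream GET; records each real call.
        calls.append(tenant_id)
        await asyncio.sleep(0.01)
        return ["clb-1", "clb-2"]

    shared = CoalescedCall(fake_list_clbs)
    # Five "groups" in the same tenant ask for the same data at the same time.
    results = await asyncio.gather(*(shared.get("tenant-a") for _ in range(5)))
    return len(calls), results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this shape, the number of upstream requests is bounded by the number of distinct keys (tenants) rather than the number of groups converging, so the throttle could stay in place. A failure of the shared request would propagate to every waiting caller, which may or may not be what we want.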
I am noticing very high gathering times in prod for many groups. In a given convergence cycle, the individual requests to upstream services and the calls to CASS are very fast (<1ms), but the total time taken is sometimes >300s. The only reason I can think of for this is throttling, but that comes into play (say, for CLB) only when there are many CLBs in one tenant, and from looking at the logs that is not the case here. Need to investigate more.