Closed by ilkercelikyilmaz 5 years ago
Hello @ilkercelikyilmaz, this seems to be the same old issue, which reappeared after running dep ensure.
I suppose it is fixed, as it was previously, in:
https://github.com/GoogleCloudPlatform/agones/commit/1bdd5a5083c9f732c58f8be9521505b7fb37a764
So please update and rerun your test, at least the first part. 😄
Hi @aLekSer, you are right, this fixes the issue. However, I had fixed it locally by removing the WaitForCacheSync call. I have now tested with the fix only (without removing WaitForCacheSync). There is no more leak, but GS creation with WaitForCacheSync takes longer under the load test. Do you think we can close #414? Thanks, ilker
Thanks @ilkercelikyilmaz for checking it with updated master.
Regarding this memory leak ticket, I think you should close it and open a new ticket or pull request with a different description if you see some other problem with the WaitForCacheSync() function.
WaitForCacheSync() uses WaitFor() underneath, through the following function:
func PollUntil(interval time.Duration, condition ConditionFunc, stopCh <-chan struct{}) error {
	return WaitFor(poller(interval, 0), condition, stopCh)
}
So fixing WaitFor helps here also.
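To illustrate why the behavior of the underlying wait loop matters, here is a minimal standard-library sketch of a polling loop that returns as soon as its stop channel is closed. It is not the apimachinery implementation: `pollUntil`, `errStopped`, `neverSynced`, and `runStopped` are hypothetical names used only for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStopped is a hypothetical sentinel error used only in this sketch.
var errStopped = errors.New("stopped before condition was met")

// pollUntil calls condition every interval until it returns true,
// returns an error, or stopCh is closed. Illustrative only; this is
// not the apimachinery code.
func pollUntil(interval time.Duration, condition func() (bool, error), stopCh <-chan struct{}) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop() // release the ticker on every exit path

	for {
		select {
		case <-stopCh:
			// The caller gave up: return so this goroutine can exit.
			return errStopped
		case <-ticker.C:
			done, err := condition()
			if err != nil {
				return err
			}
			if done {
				return nil
			}
		}
	}
}

// neverSynced simulates a cache that never finishes syncing.
func neverSynced() (bool, error) { return false, nil }

// runStopped polls a condition that never succeeds and closes the
// stop channel after a short delay, returning pollUntil's error.
func runStopped() error {
	stopCh := make(chan struct{})
	go func() {
		time.Sleep(50 * time.Millisecond)
		close(stopCh)
	}()
	return pollUntil(10*time.Millisecond, neverSynced, stopCh)
}

func main() {
	fmt.Println(runStopped())
}
```

The point of the sketch is that every exit path returns and stops the ticker, so nothing is left running once the caller closes the stop channel.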
Since the CPU/memory leak issue is fixed with 1bdd5a5, closing this issue.
I was running a load/stress test that allocates 10K gameservers every 30 minutes (each gameserver shuts down 10 minutes after allocation). The performance of Agones deteriorated with every run. It turns out that the 10K+ goroutines created in each run never complete. After about 7 hours, the total number of goroutines had reached around 400K.
The issue is caused by the WaitForCacheSync call in gameserver creation. Under a huge number of calls, the cache never syncs and all the goroutines continue to wait indefinitely.
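As a rough illustration of the failure mode (not the Agones code, and not necessarily the fix that will be submitted), the sketch below starts many goroutines that wait on a cache that never syncs, but bounds each wait with a timeout so they can exit. `waitForSync`, `spawnWaiters`, and `demo` are hypothetical names, and `runtime.NumGoroutine` is used only to observe the goroutine count.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"sync"
	"time"
)

// waitForSync is a hypothetical stand-in for a cache-sync wait. It blocks
// until synced is closed or ctx is done, and reports whether sync happened.
func waitForSync(ctx context.Context, synced <-chan struct{}) bool {
	select {
	case <-synced:
		return true
	case <-ctx.Done():
		return false
	}
}

// spawnWaiters starts n goroutines that wait on a cache that never syncs,
// each bounded by timeout. It blocks until all of them have returned and
// reports how many gave up.
func spawnWaiters(n int, timeout time.Duration) int {
	synced := make(chan struct{}) // never closed: the cache never syncs
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		gaveUp int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), timeout)
			defer cancel()
			if !waitForSync(ctx, synced) {
				mu.Lock()
				gaveUp++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return gaveUp
}

// demo runs a small batch of waiters with a short timeout.
func demo() int {
	return spawnWaiters(1000, 50*time.Millisecond)
}

func main() {
	before := runtime.NumGoroutine()
	gaveUp := demo()
	fmt.Printf("gave up: %d, goroutines before: %d, after: %d\n",
		gaveUp, before, runtime.NumGoroutine())
}
```

Without the timeout, each of these goroutines would block forever on the unsynced cache, which matches the steadily growing goroutine count seen in the load test.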
This may be related to #414.
Working on a fix now. Will submit a PR.