googleforgames / open-match

Flexible, extensible, and scalable video game matchmaking.
http://open-match.dev
Apache License 2.0

Suspected memory leak on frontend. #1762

Open ahmeteser89 opened 3 months ago

ahmeteser89 commented 3 months ago

What happened: Although running Open Match without a game frontend is not recommended, we first wanted to test Open Match for our project without one. Our clients call the Open Match frontend directly, using a simple flow:
-> CreateTicket
-> GetTicket until there is an assignment (once every X seconds)
-> DeleteTicket after we get an assignment

We were able to stabilize the number of open file descriptors (used for connections to Redis) and goroutines by using the production values here, but we could not understand why the frontend's memory usage keeps increasing until the pod is killed by Kubernetes. As far as we can tell, the only thing the frontend does is talk to Redis and return ticket information. Every frontend metric other than memory usage looks normal.

Appending some graphs below that illustrate the issue better than words.

Memory usage of processes (the frontend's usage keeps increasing until the pod is killed): [image]

Open FDs and goroutines: [image]

Client and server request rates (they are at pretty much the same rate): [image]

What you expected to happen:

We expect the frontend's memory usage to stabilize at some point, because none of the other metrics change and the query rate stays the same.

How to reproduce it (as minimally and precisely as possible): Send requests to the Open Match frontend from many clients, using the flow below:
-> CreateTicket
-> GetTicket until there is an assignment (once every X seconds)
-> DeleteTicket after we get an assignment
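
For readers who want to see the shape of that client loop, here is a minimal sketch in Python. It does not call the real Open Match API: `FakeFrontend`, `Ticket`, `run_client`, the `assign_after_polls` parameter, and the `game-server:7777` connection string are all hypothetical stand-ins for illustration, and a real client would issue the CreateTicket, GetTicket, and DeleteTicket RPCs against the frontend service instead.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Ticket:
    id: str
    assignment: Optional[str] = None  # set once the ticket has been matched


class FakeFrontend:
    """Hypothetical in-memory stand-in for the frontend, for illustration only."""

    def __init__(self, assign_after_polls: int = 3) -> None:
        self._tickets: Dict[str, Ticket] = {}
        self._polls: Dict[str, int] = {}
        self._assign_after = assign_after_polls
        self._next_id = 0

    def create_ticket(self) -> Ticket:
        self._next_id += 1
        ticket = Ticket(id=f"ticket-{self._next_id}")
        self._tickets[ticket.id] = ticket
        self._polls[ticket.id] = 0
        return ticket

    def get_ticket(self, ticket_id: str) -> Ticket:
        # Pretend a match is found after a fixed number of polls.
        self._polls[ticket_id] += 1
        ticket = self._tickets[ticket_id]
        if self._polls[ticket_id] >= self._assign_after:
            ticket.assignment = "game-server:7777"
        return ticket

    def delete_ticket(self, ticket_id: str) -> None:
        del self._tickets[ticket_id]
        del self._polls[ticket_id]

    def open_tickets(self) -> int:
        return len(self._tickets)


def run_client(
    frontend: FakeFrontend,
    poll_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 100,
) -> str:
    """CreateTicket, poll GetTicket until assigned, then DeleteTicket."""
    ticket = frontend.create_ticket()
    try:
        for _ in range(max_polls):
            current = frontend.get_ticket(ticket.id)
            if current.assignment is not None:
                return current.assignment
            sleep(poll_interval_s)  # the "once every X seconds" wait
        raise TimeoutError(f"no assignment for {ticket.id} after {max_polls} polls")
    finally:
        # Assumption of this sketch: the ticket is also deleted on timeout or
        # error, not only after an assignment as in the flow above.
        frontend.delete_ticket(ticket.id)
```

Calling `run_client(FakeFrontend(), poll_interval_s=5.0)` from many concurrent clients would approximate the load pattern described above, with each client holding one ticket at a time.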

Output of kubectl version:

Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3-eks-adc7111

Cloud Provider/Platform (AKS, GKE, Minikube etc.): EKS

Open Match Release Version: 1.8.1

Install Method(yaml/helm): helm

ahmeteser89 commented 3 months ago

We have found the cause of the frontend leak: telemetry metrics. After we disabled telemetry.prometheus.enable for the frontend, the memory increase stopped. It might be related to a specific frontend metric, since telemetry doesn't cause any problems for the other components, but we did not have time to identify which one. So if anyone encounters this problem, disabling metrics by setting telemetry.prometheus.enable to false is worth a shot. Closing this issue...

kt81 commented 3 weeks ago

I have also encountered the same problem, and it seems that the underlying issue has not been resolved. Despite the issue being closed, the concern mentioned remains relevant and requires further attention.

Could you please consider reopening this issue?

ahmeteser89 commented 3 weeks ago

You are right. I closed the issue because I thought it was mostly related to the third-party tool used for metrics, but it can still be considered relevant, so I am reopening it.

kt81 commented 3 weeks ago

Thank you! ❤️