Shikugawa closed this pull request.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Shikugawa
To complete the pull request process, please assign incfly after the PR has been reviewed.
You can assign the PR to them by writing `/assign @incfly` in a comment when ready.
I read the gRPC documentation for the async C++ API: https://grpc.io/docs/languages/cpp/async/
Here is the sample server and its `Proceed()` implementation:
https://github.com/grpc/grpc/blob/v1.41.0/examples/cpp/helloworld/greeter_async_server.cc#L86
You can see that the official example handles the event as CREATE, PROCESS, and FINISH, and only creates a new instance of the processing state with `new` in the PROCESS branch. That is because, at that point, the current instance of the state has been assigned to handle the current request.
However, in our code, `processingstate::process` always invokes `parent_.create()` to allocate a new instance. This means that whether the state is CREATE or PROCESS, we create a new processing instance.
Compared with the official example, we create more instances.
Therefore, I think that's the root cause of the OOM.
I don't think so, because our code and the official example have exactly the same state-processing semantics. It may look like we differ in the CREATE state, but we already do the CREATE handling in the constructor of ProcessingState.
In our code, the problem you describe won't occur. Let's walk through the allocation sequence for processing two requests. The graph attached below shows the tree relationship between the processing and completion states in our implementation.
First, ProcessingState (1) is created. Its constructor invokes `CheckRequest`, which corresponds to CREATE in the hello greeter example.
Second, a request is received. This creates ProcessingState (2), which will handle the next request, and CompletionState (1).
Third, CompletionState (1) releases ProcessingState (1), ProcessingState (V2), and itself. ProcessingState (2) may look dangling, but it will be used for the next request.
This is why our code shouldn't leak.
Sorry, the graph above has one mistake: (1') ProcessingState V2, which is created in the initial phase, is not released after `completion_state_->Proceed()` is called. (Of course, this is not related to the OOM.)
While it's still unclear why this PR (initializing the v2 and v3 pointers and deleting all of them) solves the problem, I was able to verify that it does fix the OOM issue. Merging for now.
Even after understanding the memory allocation and deletion flow, my main question remains: for either v2 or v3, CompletionState does initialize the pointers correctly, and the `if (v2/v3)` check in `CompletionState::Proceed` should ensure that the corresponding v2 or v3 object is deleted.
The cause of the OOM was that the ProcessingState pointer passed to CompletionState was not released properly. The memory pressure before and after the fix is shown below; the load test was run against bookinfo deployed on GKE.
Before the fix: (memory pressure graph)
After the fix: (memory pressure graph)