/assign @LoveEachDay Can you help investigate this issue?
@Punit-Solanki You should add more memory to the Pulsar recovery pod.
Try changing the Pulsar recovery ConfigMap to use the following configuration:
BOOKIE_MEM: |
  -Xms64m -Xmx64m -XX:MaxDirectMemorySize=1024m
Then restart the Pulsar recovery pod.
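For reference, a minimal sketch of applying that change with kubectl, assuming the ConfigMap and StatefulSet follow the default naming from the Milvus Helm chart (milvus-pulsar-recovery); adjust the names and namespace to your deployment:
  # Edit the recovery ConfigMap and set BOOKIE_MEM as shown above
  kubectl -n <namespace> edit configmap milvus-pulsar-recovery
  # Restart the recovery pod so the new JVM options take effect
  kubectl -n <namespace> rollout restart statefulset milvus-pulsar-recovery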
Hi @LoveEachDay Thank you for your response!
I have added the value to our configmap. Please keep the issue open for the next couple of days, as our cluster updates take place over the weekend.
I'll notify here if it works! Thanks again!
Thank you so much @LoveEachDay
This worked for me. I no longer see the Pulsar recovery pod producing unnecessary logs.
Just one request: can you let me know the exact reason for this log ingestion? Also, what did adding MaxDirectMemorySize to the configmap do here?
@Punit-Solanki Pulsar recovery monitors the ledger replication status periodically. If it detects an under-replicated ledger, it triggers a replication from one bookie to another. During this process, Pulsar recovery uses direct memory to read ledger data from the source bookie and write it to the target bookie.
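If you want to see what the auto-recovery process is acting on, a quick way (sketched here on the assumption that the bookkeeper CLI is available inside the bookie or recovery pod) is to list the ledgers currently marked as under-replicated:
  # Lists ledger IDs flagged as under-replicated in the metadata store
  bin/bookkeeper shell listunderreplicated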
You may proceed to close this. Thank you so much @LoveEachDay
Is there an existing issue for this?
Environment
Current Behavior
When a GKE cluster update takes place, all the pods are recreated. This is expected. However, we have observed some strange behavior.
If the milvus-pulsar-recovery-0 pod reaches the "Running" state before the zookeeper or bookie pods do, the Pulsar recovery pod starts generating a flood of logs with the error:
"15:16:01.153 [bookkeeper-io-3-4] ERROR org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl - Unable to allocate memory"
During this time frame, there are no problems with the application itself; everything works fine. The log ingestion, however, is a big problem: we are getting heavily billed for the Log Analytics workspace.
We have implemented a temporary fix: when we manually delete the milvus-pulsar-recovery-0 pod and it is recreated, the error is resolved and the log ingestion stops immediately.
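For clarity, the workaround amounts to something like the following; the namespace is a placeholder, and the pod name is the one from our deployment:
  # Delete the recovery pod; the StatefulSet controller recreates it and the errors stop
  kubectl -n <namespace> delete pod milvus-pulsar-recovery-0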
What we have tried so far:
We would also like to point out that the direct memory utilization does not exceed the allocated size. Despite that, we doubled MaxDirectMemorySize, and we still see the same error:
"[bookkeeper-io-3-4] ERROR org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl - Unable to allocate memory"
Expected Behavior
Logs should not be ingested indefinitely when a cluster update takes place. Additionally, if this were an error in our configuration, the errors and log ingestion should not stop simply because we recreate the Pulsar recovery pod.
Steps To Reproduce
Milvus Log
No response
Anything else?
No response