Observed on the following branches: 2024.1.3.0, 2024.1.3.1, 2024.1.4.0
High memory consumption has been observed in the tserver's transaction locking and conflict resolution functions, creating a risk of OutOfMemory (OOM) errors. Heap snapshots show significant allocations in these functions.
Example:
On n2 (172.151.22.115), memory usage spiked quickly and the node became unreachable.
On n1 (172.151.16.194), a VM restart, which ended at 07:55:02, preceded the spike.
Memory usage details of node n2, which went into OOM:
7:55:00 AM: 632 MB
7:56:30 AM: 3.31 GB
This is a jump of about 2.68 GB in 1.5 minutes, all of it untracked memory usage on the tserver.
Compared with the heap snapshot taken before the increase, the difference comes from the call stacks below.
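For reference, a minimal sketch of the arithmetic behind the jump, using only the two n2 readings above. It assumes decimal units (1 GB = 10^9 bytes); the variable names are illustrative and not taken from any tool output.

```python
# n2 readings from this report; decimal units are an assumption.
before_bytes = 632 * 10**6      # 7:55:00 AM, 632 MB
after_bytes = 3310 * 10**6      # 7:56:30 AM, 3.31 GB
elapsed_seconds = 90            # 7:55:00 AM to 7:56:30 AM

growth_bytes = after_bytes - before_bytes
growth_gb = growth_bytes / 10**9
rate_mb_per_s = growth_bytes / 10**6 / elapsed_seconds

print(f"growth: {growth_gb:.2f} GB")        # growth: 2.68 GB
print(f"rate:   {rate_mb_per_s:.1f} MB/s")  # rate:   29.8 MB/s
```

At roughly 30 MB/s of untracked growth, the node leaves very little time to react before it runs out of memory.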
Jira Link: DB-14046
Description
The Slack thread is linked in the Jira description.
Estimated allocations per call stack:
Estimated 1,137,550,736 bytes = ~1.14 GB
Estimated 1,021,223,616 bytes = ~1.02 GB
Estimated 360,168,816 bytes = ~360 MB
Together these sum to ~2.52 GB, which roughly matches the observed memory increase.
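As a quick check of the totals, here is a minimal sketch that sums the three per-stack estimates and compares them with the growth seen on n2 (632 MB to 3.31 GB). Decimal units are assumed, and the variable names are illustrative only.

```python
# Per-call-stack estimates from the heap snapshot diff, in bytes.
stack_bytes = [1_137_550_736, 1_021_223_616, 360_168_816]
total_bytes = sum(stack_bytes)

# Observed growth on n2, assuming decimal units (1 GB = 10**9 bytes).
observed_growth_bytes = 3310 * 10**6 - 632 * 10**6

share = total_bytes / observed_growth_bytes

print(f"total: {total_bytes:,} bytes = {total_bytes / 10**9:.2f} GB")
print(f"share of observed growth: {share:.0%}")
# total: 2,518,943,168 bytes = 2.52 GB
# share of observed growth: 94%
```

Under these assumptions, the three call stacks account for about 94% of the increase, which supports treating them as the main source of the spike.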
The allocations seem to come from wait-on-conflict. More investigation is needed to determine where they originate and why.
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information