litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

ChaosExperiment: Memory Leak #2948

Open ojaswa1942 opened 3 years ago

ojaswa1942 commented 3 years ago

The idea is to imitate memory leaks in applications.

How is it different from pod-memory-hog? In pod-memory-hog, the stressors produce a sudden spike in memory consumption for the specified duration; pod-memory-leak would instead increase memory consumption gradually (in steps) over the specified period of time until it reaches the upper limit.

Can it be an enhancement to pod-memory-hog?
Well, maybe. Since stress-ng does not directly support a gradual stressor (AFAIK), the implementation may differ, so it might be a better idea to introduce this as an independent experiment.

Since natural memory leaks tend to build up slowly, the default duration may also need to be longer than for other experiments.

Would love to take this one myself!

ksatchit commented 3 years ago

@ojaswa1942 , this makes sense. Would love to see what options we have here!

ojaswa1942 commented 3 years ago

Jotting down some pointers regarding stress-ng for everyone's reference:

A program/script/binary to do that seems like the way to go.

ksatchit commented 3 years ago

Thanks for sharing @ojaswa1942 - have assigned the issue to you!

ojaswa1942 commented 3 years ago

Here's what I'm planning to do: a small C++ program that mallocs the target memory gradually over the total duration. For example, with total memory = 600 MB and total time = 60 s, the program will allocate and fill memory at a rate of 10 MB/s for 60 s, reach the total target memory, and then exit.

This can be bundled as a binary and be a part of the helper pod itself.
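
For reference, a minimal sketch of the kind of gradual allocator described above (the file name, argument handling, and numbers are illustrative only and not part of the actual Litmus helper):

```cpp
// memory_leak.cc -- illustrative sketch, not the actual helper implementation.
// Assumption: target size and duration are passed as arguments,
// e.g. ./memory_leak 600 60  (600 MB over 60 s => 10 MB per second).
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <thread>
#include <vector>

int main(int argc, char** argv) {
    if (argc != 3) {
        std::cerr << "usage: " << argv[0] << " <total_mb> <total_seconds>\n";
        return 1;
    }
    const long total_mb = std::strtol(argv[1], nullptr, 10);
    const long total_seconds = std::strtol(argv[2], nullptr, 10);
    if (total_mb <= 0 || total_seconds <= 0) {
        std::cerr << "total_mb and total_seconds must be positive\n";
        return 1;
    }
    const long step_bytes = (total_mb / total_seconds) * 1024 * 1024;  // e.g. 10 MB/step

    std::vector<char*> blocks;  // hold on to every allocation so nothing is freed
    for (long s = 0; s < total_seconds; ++s) {
        char* block = static_cast<char*>(std::malloc(step_bytes));
        if (block == nullptr) {
            std::cerr << "allocation failed at step " << s << "\n";
            return 1;
        }
        // Touch every page so the memory actually becomes resident (RSS grows),
        // rather than remaining reserved-but-unused virtual address space.
        std::memset(block, 0xA5, step_bytes);
        blocks.push_back(block);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    // Memory is intentionally never freed before exit, imitating a leak.
    return 0;
}
```

Compiled statically, something like this could ship as a single binary inside the helper image, as mentioned above.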

Suggestions appreciated!

neelanjan00 commented 1 year ago

Hello @ojaswa1942, do you have any update on this?

ojaswa1942 commented 1 year ago

I had several discussions with multiple members on Slack at the time. The implementation was blocked by an underlying issue that was causing chaos pods to fail. I don't remember the exact cause, but it had something to do with namespaces: the helper was picking up the incorrect namespace and eventually failing.

It has been a while and the issue might have been fixed by now; I did notice an overhaul of some components since then. I can give it another try to see whether the problem still persists.