mozilla / participation-metrics-org

Participation metrics planning repository

Data analysis software (named Mordred) is eating 59GB of memory #180

Closed by canasdiaz 5 years ago

canasdiaz commented 6 years ago

The software, running version bitergia/mordred:18.05-02, was eating 59GB of resident memory. This error affects the whole site, as the web server is hosted on the same node.

$ free -m
             total       used       free     shared    buffers     cached
Mem:         61566      61241        324         17          0         21
-/+ buffers/cache:      61219        346
Swap:            0          0          0

As soon as I killed the process (the docker command was not usable due to the memory issues), the memory was released.

$ ps aux|grep mordred
admin     5125  0.0  0.0  14448   164 pts/0    S+   05:44   0:00 grep mordred
admin    31293 10.7 97.5 62564444 61518344 ?   Sl   Jun11  86:00 /usr/bin/python3 /usr/local/bin/mordred -c /home/bitergia/conf/setup.cfg

$ kill -9 31293

$ ps aux|grep mordred
admin     5161  0.0  0.0  14448  2236 pts/0    S+   05:45   0:00 grep mordred

$ free -m
             total       used       free     shared    buffers     cached
Mem:         61566       1148      60417         17          9        124
-/+ buffers/cache:       1015      60550
Swap:            0          0          0
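
As a cross-check of the figures above, here is a minimal sketch that converts the RSS column of the `ps aux` line into GiB and compares it with the total reported by `free -m`. The helper names (`parse_ps_aux_line`, `kib_to_gib`) are made up for illustration and are not part of Mordred; the sketch assumes the standard procps column order.

```python
# Sample line copied from the `ps aux` output above.
PS_LINE = (
    "admin    31293 10.7 97.5 62564444 61518344 ?   Sl   Jun11  86:00 "
    "/usr/bin/python3 /usr/local/bin/mordred -c /home/bitergia/conf/setup.cfg"
)


def parse_ps_aux_line(line):
    """Split one `ps aux` row into the fields we care about.

    Assumed column order: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    VSZ and RSS are reported in KiB.
    """
    fields = line.split(None, 10)  # at most 11 fields, so the command stays whole
    return {
        "pid": int(fields[1]),
        "mem_percent": float(fields[3]),
        "rss_kib": int(fields[5]),
        "command": fields[10],
    }


def kib_to_gib(kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


proc = parse_ps_aux_line(PS_LINE)
rss_gib = kib_to_gib(proc["rss_kib"])

total_mib = 61566  # "total" column of the `free -m` output above
share = 100 * (proc["rss_kib"] / 1024) / total_mib

print(f"PID {proc['pid']}: RSS {rss_gib:.2f} GiB, {share:.1f}% of RAM")
```

The resident set works out to roughly 58.7 GiB, which is where the "59GB" figure comes from, and its share of total RAM is in line with the 97.5 in the %MEM column.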

canasdiaz commented 6 years ago

I've disabled the updates until we identify what is happening. We won't take any risks during the all-hands week.

canasdiaz commented 6 years ago

The error is here again. It is currently executing the following phases:

No memory errors so far.

canasdiaz commented 6 years ago

Since this error has not happened again, we are closing the issue.

canasdiaz commented 6 years ago

Bitergia's engineering team is working on this. During the past 24 hours:

This is a WIP.

canasdiaz commented 6 years ago

WIP. PRs to fix this issue in the latest release:

hmitsch commented 6 years ago

The PRs require us to upgrade Kibiter (Kibana 4) to Kibiter 6 (Kibana upgrade, etc.).

hmitsch commented 6 years ago

Blocked by #156

canasdiaz commented 5 years ago

During our latest tests we've identified an error pattern which appears when one of the backends is refreshing the identities which were updated during the past two days (this phase is executed when data is published). Our dev team is working on this.

canasdiaz commented 5 years ago

The latest release, 18.11-01, did not fix this issue. We'll have to wait until 18.11-02. In the meantime, the refresh of the identities is disabled, which means that any modification made in Hatstall won't be visible on the dashboard.

hmitsch commented 5 years ago

Targeting deployment early next week (week of Nov 26, 2018).

canasdiaz commented 5 years ago

Latest version "grimoirelab-0.2.0" deployed (with the menu upgrade disabled). Waiting for results.

canasdiaz commented 5 years ago

Changes look good.

Let's wait a couple of days before closing this ticket. It seems we are quite close :nerd_face:

canasdiaz commented 5 years ago

We are done :)