Closed - Luks24 closed this issue 10 months ago
Hi,
could you provide some more insight by sharing some log messages here, please?
We had trouble in the past with a crashing DefectDojo django pod that was caused by database issues - can you confirm that your DefectDojo database is running without any problems?
I didn't see anything in the other pods.
Some of the logs from nginx:
10.102.33.247 - - [17/Jan/2024:08:06:24 +0000] "POST /api/v2/reimport-scan/ HTTP/1.1" 502 497 "-" "python-requests/2.31.0" "10.102.1.37"
2024/01/17 08:06:24 [warn] 10#10: *156260 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000001372, cli
2024/01/17 08:06:24 [error] 10#10: *156260 connect() to unix:///run/defectdojo/uwsgi.sock failed (111: Connection refused) while connecting
10.102.33.247 - - [17/Jan/2024:08:06:24 +0000] "POST /api/v2/reimport-scan/ HTTP/1.1" 502 497 "-" "python-requests/2.31.0" "10.102.23.208"
2024/01/17 08:06:24 [warn] 10#10: *156425 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000001373, cli
2024/01/17 08:06:24 [error] 10#10: *156425 connect() to unix:///run/defectdojo/uwsgi.sock failed (111: Connection refused) while connecting
10.102.24.170 - - [17/Jan/2024:08:06:24 +0000] "POST /api/v2/reimport-scan/ HTTP/1.1" 502 497 "-" "python-requests/2.31.0" "10.102.25.207"
2024/01/17 08:06:31 [error] 10#10: *156430 connect() to unix:///run/defectdojo/uwsgi.sock failed (111: Connection refused) while connecting
10.102.44.254 - - [17/Jan/2024:08:06:31 +0000] "GET /uwsgi_health HTTP/1.1" 502 497 "-" "kube-probe/1.27+" "-"
2024/01/17 08:06:41 [error] 10#10: *156434 connect() to unix:///run/defectdojo/uwsgi.sock failed (111: Connection refused) while connecting
10.102.44.254 - - [17/Jan/2024:08:06:41 +0000] "GET /uwsgi_health HTTP/1.1" 502 497 "-" "kube-probe/1.27+" "-"
From the django container:
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:8081 fd 3
spawned uWSGI http 1 (pid: 14)
uwsgi socket 0 bound to UNIX address /run/defectdojo/uwsgi.sock fd 6
Python version: 3.11.4 (main, Aug 16 2023, 05:31:52) [GCC 10.2.1 20210110]
Python main interpreter initialized at 0x7fd74d0b3558
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 1431040 bytes (1397 KB) for 64 cores
*** Operational MODE: preforking+threaded ***
[17/Jan/2024 08:05:24] INFO [dojo.models:4299] enabling audit logging
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x7fd74d0b3558 pid: 1 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (pid: 1, cores: 8)
spawned uWSGI worker 2 (pid: 15, cores: 8)
spawned uWSGI worker 3 (pid: 18, cores: 8)
spawned uWSGI worker 4 (pid: 26, cores: 8)
spawned uWSGI worker 5 (pid: 32, cores: 8)
spawned uWSGI worker 6 (pid: 41, cores: 8)
spawned uWSGI worker 7 (pid: 51, cores: 8)
spawned uWSGI worker 8 (pid: 59, cores: 8)
Stream closed EOF for defect-dojo/defect-dojo-defectdojo-django-6c8977f89-p25zl (uwsgi)
Could it be that your DefectDojo instance is out of memory/CPU? From the logs I can see several imports arriving within the same second, so maybe DefectDojo gets OOM-killed?
If that's the case, then maybe we need rate limiting.
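One quick way to confirm that suspicion is to look at the container's last termination state; an OOM kill is recorded as OOMKilled. A rough sketch, using the pod name from the log above (adjust the namespace and pod name for your cluster):

# Show the last termination state of the pod's containers
kubectl -n defect-dojo describe pod defect-dojo-defectdojo-django-6c8977f89-p25zl | grep -A 5 "Last State"

# Or read the recorded exit reason of the uwsgi container directly
kubectl -n defect-dojo get pod defect-dojo-defectdojo-django-6c8977f89-p25zl \
  -o jsonpath='{.status.containerStatuses[?(@.name=="uwsgi")].lastState.terminated.reason}'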
Sorry for the late reply. I had increased the memory limit once before, but it seems that was not enough. Bumping the uwsgi container to 4Gi of memory did the trick; it is now steadily consuming about 3Gi of memory.
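For reference, the change was essentially raising the uwsgi container's memory limit in the Helm values. This is only a sketch: the exact value paths and chart reference depend on the defectdojo chart version you deployed, so verify them against your chart's values.yaml before applying.

# Bump the uwsgi container's memory (value paths assumed from the chart layout;
# the 3Gi request is only a guess based on the observed steady-state usage)
helm upgrade defect-dojo defectdojo/defectdojo \
  --reuse-values \
  --set django.uwsgi.resources.requests.memory=3Gi \
  --set django.uwsgi.resources.limits.memory=4Gi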
Thank you both for the help.
Question
I deployed the trivy-dojo-report-operator and I can see the reports flowing into DefectDojo. The issue is that the DefectDojo django pod keeps getting killed. My suspicion is that the request body sent by the operator is too big and that's why nginx is having issues. The only other thing I have seen in the DefectDojo logs is a uwsgi error for the healthcheck and a 502 error.
Did anybody else encounter this issue and, if so, how did you fix it?
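For anyone triaging the same symptoms: the 502s in the nginx log above come with "connect() to unix:///run/defectdojo/uwsgi.sock failed (111: Connection refused)", which points at the uwsgi container being down (for example OOM-killed or restarting) rather than at an oversized request body; a body that is too large would typically show up as a 413 from nginx instead. A rough way to compare the two in the access log (the deployment name is inferred from the pod name above and the "nginx" container name is assumed from the chart's django pod layout):

# Count 413 (request body too large) vs 502 (upstream unreachable) responses
kubectl -n defect-dojo logs deploy/defect-dojo-defectdojo-django -c nginx | grep -c '" 413 '
kubectl -n defect-dojo logs deploy/defect-dojo-defectdojo-django -c nginx | grep -c '" 502 '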