webserver-llc / angie

Angie - drop-in replacement for Nginx
https://en.angie.software
BSD 2-Clause "Simplified" License
1.24k stars, 65 forks

core dumped error after upgrading 1.5.2 => 1.6.0 #90

Closed. revengel closed this issue 3 months ago

revengel commented 4 months ago

Please help me figure out the problem.

VBart commented 4 months ago

Could you please provide the full configuration (angie -T)?

revengel commented 4 months ago

Yes, it's here: https://gist.github.com/revengel/584c109556c3448f9cdb542f46c03524

VBart commented 4 months ago

Thanks. I see nothing suspicious... Could you also enable debug logging and collect a log that includes the error?

revengel commented 4 months ago

The debug logs are here:

2024/07/01 23:29:04 [alert] 1#1: worker process 10 exited on signal 11 (core dumped)
2024/07/01 23:29:04 [notice] 1#1: start worker process 11
2024/07/01 23:29:04 [alert] 11#11: setpriority(-5) failed (13: Permission denied)
2024/07/01 23:29:04 [debug] 11#11: epoll add event: fd:43 op:1 ev:00002001
2024/07/01 23:29:04 [debug] 11#11: epoll add event: fd:44 op:1 ev:00002001
2024/07/01 23:29:04 [debug] 11#11: acme status wgui_example_com: certificate scheduled for renewal on Wed Jul  3 00:16:48 2024
2024/07/01 23:29:04 [debug] 11#11: accept on 0.0.0.0:443, ready: 0
2024/07/01 23:29:04 [debug] 11#11: posix_memalign: 00007F76038E3800:512 @16
2024/07/01 23:29:04 [debug] 11#11: *7 accept: 172.18.0.1:49498 fd:3
2024/07/01 23:29:04 [debug] 11#11: *7 event timer add: 3: 60000:17078525
2024/07/01 23:29:04 [debug] 11#11: *7 reusable connection: 1
2024/07/01 23:29:04 [debug] 11#11: *7 epoll add event: fd:3 op:1 ev:80002001
2024/07/01 23:29:04 [debug] 11#11: *7 http check ssl handshake
2024/07/01 23:29:04 [debug] 11#11: *7 http recv(): 1
2024/07/01 23:29:04 [debug] 11#11: *7 https ssl handshake: 0x16
2024/07/01 23:29:04 [debug] 11#11: *7 tcp_nodelay
2024/07/01 23:29:04 [debug] 11#11: *7 reusable connection: 0
2024/07/01 23:29:04 [debug] 11#11: *7 SSL server name: "dav.example.com"
2024/07/01 23:29:04 [debug] 11#11: *7 posix_memalign: 00007F7603866320:4096 @16
2024/07/01 23:29:04 [debug] 11#11: *7 posix_memalign: 00007F7603867560:4096 @16
2024/07/01 23:29:04 [notice] 1#1: signal 17 (SIGCHLD) received from 11
2024/07/01 23:29:04 [alert] 1#1: worker process 11 exited on signal 11 (core dumped)
2024/07/01 23:29:04 [notice] 1#1: start worker process 12
2024/07/01 23:29:04 [alert] 12#12: setpriority(-5) failed (13: Permission denied)
2024/07/01 23:29:04 [debug] 12#12: epoll add event: fd:43 op:1 ev:00002001
2024/07/01 23:29:04 [debug] 12#12: epoll add event: fd:44 op:1 ev:00002001
2024/07/01 23:29:04 [debug] 12#12: acme status wgui_example_com: certificate scheduled for renewal on Wed Jul  3 00:16:48 2024
2024/07/01 23:29:04 [debug] 12#12: accept on 0.0.0.0:443, ready: 0
2024/07/01 23:29:04 [debug] 12#12: posix_memalign: 00007F76038E3800:512 @16
2024/07/01 23:29:04 [debug] 12#12: *8 accept: 172.18.0.1:49502 fd:3
2024/07/01 23:29:04 [debug] 12#12: *8 event timer add: 3: 60000:17078677
2024/07/01 23:29:04 [debug] 12#12: *8 reusable connection: 1
2024/07/01 23:29:04 [debug] 12#12: *8 epoll add event: fd:3 op:1 ev:80002001
2024/07/01 23:29:04 [debug] 12#12: *8 http check ssl handshake
2024/07/01 23:29:04 [debug] 12#12: *8 http recv(): 1
2024/07/01 23:29:04 [debug] 12#12: *8 https ssl handshake: 0x16
2024/07/01 23:29:04 [debug] 12#12: *8 tcp_nodelay
2024/07/01 23:29:04 [debug] 12#12: *8 reusable connection: 0
2024/07/01 23:29:04 [debug] 12#12: *8 SSL server name: "caldav.example.com"
2024/07/01 23:29:04 [debug] 12#12: *8 posix_memalign: 00007F7603866320:4096 @16
2024/07/01 23:29:04 [debug] 12#12: *8 posix_memalign: 00007F7603867560:4096 @16
2024/07/01 23:29:04 [notice] 1#1: signal 17 (SIGCHLD) received from 12

VBart commented 4 months ago

Is it possible to get a core dump from that container?

VBart commented 4 months ago

Do you need any assistance extracting a core dump? You can write to me directly on Telegram (@VBart) or by email at vbart@wbsrv.ru. We are very interested in debugging this issue, but unfortunately neither the config nor the debug log gives any hints here.

revengel commented 4 months ago

Yes, please help me collect the core dump. I run Angie in Docker via Docker Compose. Could you please provide simple instructions?

a-sor commented 4 months ago

I've tried to reproduce the error using your configuration on our Angie 1.6.0 docker image, but to no avail.

I don't know how you are using docker compose, so I can't give you precise instructions on extracting the core file. The idea is to mount a local directory into your container (e.g. using -v $(pwd):/shared) and run docker in interactive mode with a terminal (using -it). Then you should start Angie manually (e.g. # angie -g 'daemon off;'), and when you get a core dump, you can just copy it to your local directory (e.g. # cp core.9 /shared). With our Alpine docker image, core dumps are just created in the current directory and I didn't even have to configure that.

gun4A commented 3 months ago

_usr_sbin_angie-nodebug.33.crash[1].gz

I had the same problem and downgraded back to 1.5.2.
On v1.6, every worker process exited with "core dumped" :(

Ubuntu 22.04.4 LTS, i9-12900K

a-sor commented 3 months ago

Hi @gun4A, thanks a lot for your input. The source of the bug has been identified; we will fix it in the next release.

a-sor commented 3 months ago

@gun4A, here's a patch that fixes this, in case you wish to test. fix_acme_crashes.patch.gz

VBart commented 3 months ago

The issue fixed by the patch above manifests itself only when more than 4 acme_client directives are configured. But the config provided earlier (https://gist.github.com/revengel/584c109556c3448f9cdb542f46c03524) contains only one acme_client directive.
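
The "more than 4" threshold is consistent with a common pitfall of growable arrays: once the array outgrows its initial capacity, its elements move to a new block, and any pointer kept from an earlier push no longer points into the array. The rejected hunk posted later in this thread replaces ngx_array_push() with ngx_pcalloc(), which fits that reading, but the thread does not confirm the exact mechanism, so treat the following as a sketch under that assumption. The toy_array_t type, the client_t record, and the initial capacity of 4 are hypothetical stand-ins, not Angie's actual ngx_array_t or ngx_acme_client_t code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical client record; NOT Angie's ngx_acme_client_t. */
typedef struct { int id; } client_t;

#define MAX_RETIRED 32

/* Toy growable array; illustrative only, NOT the ngx_array_t API.
 * Old blocks are kept until destroy, loosely mimicking pool allocation. */
typedef struct {
    client_t *elts;
    size_t    nelts;
    size_t    nalloc;
    client_t *retired[MAX_RETIRED];
    size_t    nretired;
} toy_array_t;

static int toy_array_init(toy_array_t *a, size_t n)
{
    assert(n > 0);
    memset(a, 0, sizeof(*a));
    a->elts = malloc(n * sizeof(client_t));
    if (a->elts == NULL) {
        return -1;
    }
    a->nalloc = n;
    return 0;
}

/* Returns a pointer to a new slot. When the array is full, all elements
 * are copied to a block twice as large, so earlier slot pointers go stale. */
static client_t *toy_array_push(toy_array_t *a)
{
    if (a->nelts == a->nalloc) {
        client_t *p;

        if (a->nretired == MAX_RETIRED) {
            return NULL;
        }
        p = malloc(2 * a->nalloc * sizeof(client_t));
        if (p == NULL) {
            return NULL;
        }
        memcpy(p, a->elts, a->nelts * sizeof(client_t));
        a->retired[a->nretired++] = a->elts;
        a->elts = p;
        a->nalloc *= 2;
    }
    return &a->elts[a->nelts++];
}

static void toy_array_destroy(toy_array_t *a)
{
    size_t i;

    for (i = 0; i < a->nretired; i++) {
        free(a->retired[i]);
    }
    free(a->elts);
}

/* Pushes n clients into an array with initial capacity 4, keeping the
 * pointer returned by each push, then counts how many kept pointers no
 * longer point at the array's current storage.
 * Returns (size_t) -1 on allocation failure. */
static size_t count_stale_pointers(size_t n)
{
    toy_array_t   a;
    client_t    **kept;
    size_t        i, stale = 0;

    kept = malloc((n ? n : 1) * sizeof(*kept));
    if (kept == NULL) {
        return (size_t) -1;
    }
    if (toy_array_init(&a, 4) != 0) {
        free(kept);
        return (size_t) -1;
    }
    for (i = 0; i < n; i++) {
        kept[i] = toy_array_push(&a);
        if (kept[i] == NULL) {
            toy_array_destroy(&a);
            free(kept);
            return (size_t) -1;
        }
        kept[i]->id = (int) i;
    }
    for (i = 0; i < n; i++) {
        if (kept[i] != &a.elts[i]) {
            stale++;
        }
    }
    toy_array_destroy(&a);
    free(kept);
    return stale;
}
```

With up to 4 pushes, count_stale_pointers() reports no stale pointers; the fifth push triggers the move and leaves the first four kept pointers referring to the old block. Allocating each record separately, as the patch does with ngx_pcalloc(), gives every record a stable address regardless of how many are created.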

@revengel are you sure that you've provided the full configuration previously?

revengel commented 3 months ago

@VBart There are more than 4 acme_client directives in my config. The config in the gist is indeed incomplete: I cut out the excess for the sake of compactness.

gun4A commented 3 months ago

@gun4A, here's a patch that fixes this, in case you wish to test. fix_acme_crashes.patch.gz

patching file ngx_http_acme_module.c
Hunk #4 succeeded at 4253 (offset -4 lines).
Hunk #5 succeeded at 4508 (offset -4 lines).
Hunk #6 succeeded at 5157 (offset -8 lines).
Hunk #7 succeeded at 5169 (offset -8 lines).
Hunk #8 FAILED at 5188.
Hunk #9 succeeded at 5212 (offset -9 lines).
1 out of 9 hunks FAILED -- saving rejects to file ngx_http_acme_module.c.rej

Contents of ngx_http_acme_module.c.rej:

--- ngx_http_acme_module.c Mon Jul 29 11:53:51 2024 +0300
+++ ngx_http_acme_module.c Mon Jul 29 14:29:23 2024 +0300
@@ -5188,13 +5188,11 @@
         return cli;
     }
 
-    cli = ngx_array_push(&amcf->clients);
+    cli = ngx_pcalloc(cf->pool, sizeof(ngx_acme_client_t));
     if (cli == NULL) {
         return NULL;
     }
 
-    ngx_memzero(cli, sizeof(ngx_acme_client_t));
-
     cli->log = cf->log;
     cli->name = *name;
     cli->enabled = NGX_CONF_UNSET_UINT;

VBart commented 3 months ago

The patch was against the latest revision at the moment. Here's a version of the patch against 1.6.0 release: fix_acme_crashes_v1.6.0.patch.gz

a-sor commented 3 months ago

Arrgh, sorry for the wrong patch. I forgot that people aren't using our latest version :smile:

VBart commented 3 months ago

Fixed by https://github.com/webserver-llc/angie/commit/cfd01492f3db4a349cff4d703bdc7439d15bc2df (Angie 1.6.1).