nginxinc / docker-nginx

Official NGINX Dockerfiles
BSD 2-Clause "Simplified" License
3.25k stars 1.73k forks source link

Segfault during requests with 1.25.2 - LUA issue #845

Closed EugenMayer closed 7 months ago

EugenMayer commented 12 months ago

Environment

Bug

When building our image including ndk/lua/brotli with 1.25.2 as baseline, most of our XHR calls return an empty response with an crash / coredump.

Details

To be more of a help, i tried to get the core dump. I have it, but i miss the symbols (starting with nginx-debug). I could not find any hints/docs how it is intended to get the symbols for gdb backtrace so i could share any more useful information here.

Can someone hint me to the docs or explain how to get the symbols (without recompiling nginx myself)?

EugenMayer commented 12 months ago

Confirmed that using nginx:1.25.1-bookworm as the baseline image works, no segfaults happen.

Using nginx:mainline-bookworm causes worker crashes on most xhr calls.

Crawling through the we i found that 'non matching nginx modules' could cause this. We indeed use ndk/lua/brotli in our stack which we add like described in the docs using this dockerfile

ARG FROM_IMAGE=nginx:mainline-bookworm

# this is what we use to build the modules
# this has been copied from https://github.com/nginxinc/docker-nginx/blob/master/modules/Dockerfile#L5
# also see https://github.com/nginxinc/docker-nginx/blob/master/modules/README.md
# and is using ENABLED_MODULES as ENV (changed from ARG)
FROM $FROM_IMAGE as builder

# This is how we select the modules to select. Be sure to symlink them later (way below in our HACK)
# we want ndk/lua for lua support, brotli for brotli compression
ENV ENABLED_MODULES="ndk lua brotli"

#### PURE COPY FROM GIT ############
RUN set -ex \
    && if [ "$ENABLED_MODULES" = "" ]; then \
        echo "No additional modules enabled, exiting"; \
        exit 1; \
    fi

COPY ./ /modules/

RUN set -ex \
    && apt update \
    && apt install -y --no-install-suggests --no-install-recommends \
                patch make wget mercurial devscripts debhelper dpkg-dev \
                quilt lsb-release build-essential libxml2-utils xsltproc \
                equivs git g++ libparse-recdescent-perl \
    && XSLSCRIPT_SHA512="f7194c5198daeab9b3b0c3aebf006922c7df1d345d454bd8474489ff2eb6b4bf8e2ffe442489a45d1aab80da6ecebe0097759a1e12cc26b5f0613d05b7c09ffa *stdin" \
    && wget -O /tmp/xslscript.pl https://hg.nginx.org/xslscript/raw-file/01dc9ba12e1b/xslscript.pl \
    && if [ "$(cat /tmp/xslscript.pl | openssl sha512 -r)" = "$XSLSCRIPT_SHA512" ]; then \
        echo "XSLScript checksum verification succeeded!"; \
        chmod +x /tmp/xslscript.pl; \
        mv /tmp/xslscript.pl /usr/local/bin/; \
    else \
        echo "XSLScript checksum verification failed!"; \
        exit 1; \
    fi \
    && hg clone -r ${NGINX_VERSION}-${PKG_RELEASE%%~*} https://hg.nginx.org/pkg-oss/ \
    && cd pkg-oss \
    && mkdir /tmp/packages \
    && for module in $ENABLED_MODULES; do \
        echo "Building $module for nginx-$NGINX_VERSION"; \
        if [ -d /modules/$module ]; then \
            echo "Building $module from user-supplied sources"; \
            # check if module sources file is there and not empty
            if [ ! -s /modules/$module/source ]; then \
                echo "No source file for $module in modules/$module/source, exiting"; \
                exit 1; \
            fi; \
            # some modules require build dependencies
            if [ -f /modules/$module/build-deps ]; then \
                echo "Installing $module build dependencies"; \
                apt update && apt install -y --no-install-suggests --no-install-recommends $(cat /modules/$module/build-deps | xargs); \
            fi; \
            # if a module has a build dependency that is not in a distro, provide a
            # shell script to fetch/build/install those
            # note that shared libraries produced as a result of this script will
            # not be copied from the builder image to the main one so build static
            if [ -x /modules/$module/prebuild ]; then \
                echo "Running prebuild script for $module"; \
                /modules/$module/prebuild; \
            fi; \
            /pkg-oss/build_module.sh -v $NGINX_VERSION -f -y -o /tmp/packages -n $module $(cat /modules/$module/source); \
            BUILT_MODULES="$BUILT_MODULES $(echo $module | tr '[A-Z]' '[a-z]' | tr -d '[/_\-\.\t ]')"; \
        elif make -C /pkg-oss/debian list | grep -P "^$module\s+\d" > /dev/null; then \
            echo "Building $module from pkg-oss sources"; \
            cd /pkg-oss/debian; \
            make rules-module-$module BASE_VERSION=$NGINX_VERSION NGINX_VERSION=$NGINX_VERSION; \
            mk-build-deps --install --tool="apt-get -o Debug::pkgProblemResolver=yes --no-install-recommends --yes" debuild-module-$module/nginx-$NGINX_VERSION/debian/control; \
            make module-$module BASE_VERSION=$NGINX_VERSION NGINX_VERSION=$NGINX_VERSION; \
            find ../../ -maxdepth 1 -mindepth 1 -type f -name "*.deb" -exec mv -v {} /tmp/packages/ \;; \
            BUILT_MODULES="$BUILT_MODULES $module"; \
        else \
            echo "Don't know how to build $module module, exiting"; \
            exit 1; \
        fi; \
    done \
    && echo "BUILT_MODULES=\"$BUILT_MODULES\"" > /tmp/packages/modules.env

FROM $FROM_IMAGE

COPY --from=builder /tmp/packages /tmp/packages
RUN set -ex \
    && apt update \
    && . /tmp/packages/modules.env \
    && for module in $BUILT_MODULES; do \
           apt install --no-install-suggests --no-install-recommends -y /tmp/packages/nginx-module-${module}_${NGINX_VERSION}*.deb; \
       done \
    && rm -rf /tmp/packages \
    && rm -rf /var/lib/apt/lists/
#### PURE COPY FROM GIT ############

################# HACK ###########
# align the docker-hub image with our debian based expectations
RUN rm -f /etc/nginx/conf.d/default.conf \
   && mkdir -p /etc/nginx/sites-enabled /etc/nginx/modules-enabled \
   # Load all the modules you want / compiled
   # the order for LUA is important, NDK first!
   && echo "load_module modules/ndk_http_module.so;" > /etc/nginx/modules-enabled/00_ndk_http_module.conf \
   && echo "load_module modules/ngx_http_lua_module.so;" > /etc/nginx/modules-enabled/01_ngx_http_lua_module.conf \
   && echo "load_module modules/ngx_http_image_filter_module.so;" > /etc/nginx/modules-enabled/ngx_http_image_filter_module.conf \
   && echo "load_module modules/ngx_http_xslt_filter_module.so;" > /etc/nginx/modules-enabled/ngx_http_xslt_filter_module.conf \
   && echo "load_module modules/ngx_http_js_module.so;" > /etc/nginx/modules-enabled/ngx_http_js_module.conf \
   && echo "load_module modules/ngx_http_brotli_filter_module.so;" > /etc/nginx/modules-enabled/ngx_http_brotli_filter_module.conf \
   && echo "load_module modules/ngx_http_brotli_static_module.so;" > /etc/nginx/modules-enabled/ngx_http_brotli_static_module.conf

So we use the script provided under https://github.com/nginxinc/docker-nginx/blob/master/modules/Dockerfile#L5 - and we used it before for about 6 months without an issue or changes. It broke with the recent 1.25.2 upgrade on mainline. Maybe some sources are no longer matching?

I would love to provide a proper core-dump but yet fail to get the header for the symbols

thresheek commented 12 months ago

Hi @EugenMayer !

Regarding the debug symbols, you need to add proper sources list entry to a Dockerfile, https://github.com/nginxinc/docker-nginx/blob/master/mainline/debian/Dockerfile#L47, and install nginx-dbg package. This will provide debug symbols for nginx.

thresheek commented 12 months ago

You also probably want to edit the apt install --no-install-suggests --no-install-recommends -y /tmp/packages/nginx-module-${module}_${NGINX_VERSION}*.deb; \ line to include dbg-versions of the modules built, too.

EugenMayer commented 11 months ago

Setting up my debug env like this. On the host running docker

echo "/tmp/cores/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern
sudo sysctl -w fs.suid_dumpable=2
ulimit -c unlimited

in the container

vi /etc/nginx/nginx.conf
# add this at the top
worker_rlimit_core  1000M;
working_directory /tmp/cores;

# install headers and gdb
echo "deb [signed-by=/usr/share/keyrings/nginx-archive-keyring.gpg] https://nginx.org/packages/mainline/debian/ bookworm nginx" >> /etc/apt/sources.list.d/nginx.list
apt update && apt install -y nginx-dbg gbd

# create core dump storage
mkdir -p /tmp/cores
chown root:root /tmp/cores
chmod 1777 /tmp/cores

nginx-debug

Now when i open the core-dump with gdb

New LWP 2395]
Core was generated by `nginx: worke'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056089d4535e8 in ?? ()
(gdb) backtrace
#0  0x000056089d4535e8 in ?? ()
#1  0x000056089f514710 in ?? ()
#2  0x000056089f18e560 in ?? ()
#3  0x0000000000000040 in ?? ()
#4  0x000056089f25f0c8 in ?? ()
#5  0x000056089f51bae8 in ?? ()
#6  0x000056089d454fe4 in ?? ()
#7  0x000056089f18eb38 in ?? ()
#8  0x000056089f18e560 in ?? ()
#9  0x000056089f1d71c0 in ?? ()
#10 0x000056089d4587c5 in ?? ()
#11 0x000056089f517960 in ?? ()
#12 0x000056089f18eb38 in ?? ()
#13 0x000056089f18e560 in ?? ()
#14 0x000056089d49d705 in ?? ()
#15 0x000056089f18e560 in ?? ()
#16 0x000056089f517870 in ?? ()
#17 0x000056089d4f6820 in ?? ()
#18 0x000056089d4416c3 in ?? ()
#19 0x000056089f517858 in ?? ()
#20 0x000056089f18e560 in ?? ()
#21 0x000056089f517858 in ?? ()
#22 0x000056089d43d0fd in ?? ()
#23 0x000056089f1d6b38 in ?? ()
#24 0x000056089f18e560 in ?? ()
#25 0x000056089f51bae8 in ?? ()
#26 0x000056089d446486 in ?? ()
#27 0x000056089f5146c0 in ?? ()
#28 0x000056089f514710 in ?? ()
#29 0x000056089f54a130 in ?? ()

So still no symbols at all - the entire stacktrace.

dpkg -l | grep nginx
ii  nginx                        1.25.2-1~bookworm              amd64        high performance web server
ii  nginx-dbg                    1.25.2-1~bookworm              amd64        nginx debug symbols
ii  nginx-module-brotli          1.25.2+1.0.0-1~bookworm        amd64        nginx 3rd-party brotli compression dynamic modules
ii  nginx-module-geoip           1.25.2-1~bookworm              amd64        nginx GeoIP dynamic modules
ii  nginx-module-image-filter    1.25.2-1~bookworm              amd64        nginx image filter dynamic module
ii  nginx-module-lua             1.25.2+0.10.25-1~bookworm      amd64        nginx 3rd-party Lua dynamic modules
ii  nginx-module-ndk             1.25.2+0.3.2-1~bookworm        amd64        nginx 3rd-party NDK dynamic module
ii  nginx-module-njs             1.25.2+0.8.0-1~bookworm        amd64        nginx njs dynamic modules
ii  nginx-module-xslt            1.25.2-1~bookworm              amd64        nginx xslt dynamic module
EugenMayer commented 11 months ago

You also probably want to edit the apt install --no-install-suggests --no-install-recommends -y /tmp/packages/nginx-module-${module}_${NGINX_VERSION}*.deb; \ line to include dbg-versions of the modules built, too.

did that by adding

           apt install --no-install-suggests --no-install-recommends -y /tmp/packages/nginx-module-${module}-dbg_${NGINX_VERSION}*.deb; \

(honestly i think it should be a lot easier to set this up for anybody that wants to provide a core dump. Setting this up without any docs at all seems to be very inefficient)

thresheek commented 11 months ago

We never encountered a need to debug coredumps produced inside a container in a container yet, so that would be a first.

It looks like you might also need to install more debug packages, like libc6-dbg.

How do you launch gdb? Do you provide a path to proper binary, e.g. gdb --core=/tmp/cores/core /usr/sbin/nginx-debug ?

EugenMayer commented 11 months ago

Was launching using

gdb /usr/bin/nginx-debug /tmp/cores/core.nginx-debug.5546

And here we go .. not /usr/sbin but /usr/bin .. sorry, must have overlooked it when whiching it. Fixing that now and installing libc6-dbg and report back.

Thank you for being so responsive and helpful!

EugenMayer commented 11 months ago

Here we go, using gdb --core=/tmp/cores/core.nginx-debug.5546 /usr/sbin/nginx-debug

i'am finally gettting somewhere

#0  ngx_http_variable_headers_internal (r=0x5645607db210, v=0x56456086ee98, data=<optimized out>, sep=44 ',') at src/http/ngx_http_variables.c:840
840 src/http/ngx_http_variables.c: No such file or directory.
(gdb) backtrace
#0  ngx_http_variable_headers_internal (r=0x5645607db210, v=0x56456086ee98, data=<optimized out>, sep=44 ',') at src/http/ngx_http_variables.c:840
#1  0x000056455f9cafe4 in ngx_http_get_indexed_variable (r=0x5645607db210, index=<optimized out>) at src/http/ngx_http_variables.c:637
#2  0x000056455f9cb0d7 in ngx_http_get_flushed_variable (r=<optimized out>, index=<optimized out>) at src/http/ngx_http_variables.c:674
#3  0x000056455f9ce7c5 in ngx_http_script_var_code (e=0x5645607db7e8) at src/http/ngx_http_script.c:1879
#4  0x000056455fa13705 in ngx_http_rewrite_handler (r=0x5645607db210) at src/http/modules/ngx_http_rewrite_module.c:180
#5  0x000056455f9b76c3 in ngx_http_core_rewrite_phase (r=0x5645607db210, ph=0x564560b64620) at src/http/ngx_http_core_module.c:929
#6  0x000056455f9b30fd in ngx_http_core_run_phases (r=0x5645607db210) at src/http/ngx_http_core_module.c:875
#7  0x000056455f9bc486 in ngx_http_run_posted_requests (c=0x564560b68898) at src/http/ngx_http_request.c:2457
#8  0x000056455f9bfc6f in ngx_http_process_request_line (rev=0x564560b96ee0) at src/http/ngx_http_request.c:1196
#9  0x000056455f98a77b in ngx_epoll_process_events (cycle=0x564560796e90, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#10 0x000056455f97dfe6 in ngx_process_events_and_timers (cycle=cycle@entry=0x564560796e90) at src/event/ngx_event.c:248
#11 0x000056455f98779d in ngx_worker_process_cycle (cycle=cycle@entry=0x564560796e90, data=data@entry=0x2) at src/os/unix/ngx_process_cycle.c:721
#12 0x000056455f985e63 in ngx_spawn_process (cycle=cycle@entry=0x564560796e90, proc=proc@entry=0x56455f987730 <ngx_worker_process_cycle>, data=data@entry=0x2, name=name@entry=0x56455fa659f3 "worker process", 
    respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#13 0x000056455f987008 in ngx_start_worker_processes (cycle=cycle@entry=0x564560796e90, n=16, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#14 0x000056455f9884f9 in ngx_master_process_cycle (cycle=0x564560796e90) at src/os/unix/ngx_process_cycle.c:130
#15 0x000056455f95a8b7 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:384

Btw, using gdb /usr/sbin/nginx-debug /tmp/cores/core.nginx-debug.5546 is entirely wrong --core is mandatory or the binary path is ignore. So there were 2 mistakes.

Anything you can see here?

EugenMayer commented 11 months ago

840 src/http/ngx_http_variables.c: No such file or directory. seems off to me?

For reference, i know have (hopefully) all the headers

dpkg --list | grep nginx
ii  nginx                        1.25.2-1~bookworm              amd64        high performance web server
ii  nginx-dbg                    1.25.2-1~bookworm              amd64        nginx debug symbols
ii  nginx-module-brotli          1.25.2+1.0.0-1~bookworm        amd64        nginx 3rd-party brotli compression dynamic modules
ii  nginx-module-brotli-dbg      1.25.2+1.0.0-1~bookworm        amd64        debug symbols for the nginx-module-brotli
ii  nginx-module-geoip           1.25.2-1~bookworm              amd64        nginx GeoIP dynamic modules
ii  nginx-module-image-filter    1.25.2-1~bookworm              amd64        nginx image filter dynamic module
ii  nginx-module-lua             1.25.2+0.10.25-1~bookworm      amd64        nginx 3rd-party Lua dynamic modules
ii  nginx-module-lua-dbg         1.25.2+0.10.25-1~bookworm      amd64        debug symbols for the nginx-module-lua
ii  nginx-module-ndk             1.25.2+0.3.2-1~bookworm        amd64        nginx 3rd-party NDK dynamic module
ii  nginx-module-ndk-dbg         1.25.2+0.3.2-1~bookworm        amd64        debug symbols for the nginx-module-ndk
ii  nginx-module-njs             1.25.2+0.8.1-1~bookworm        amd64        nginx njs dynamic modules
ii  nginx-module-njs-dbg         1.25.2+0.8.1-1~bookworm        amd64        debug symbols for the nginx-module-njs
ii  nginx-module-xslt            1.25.2-1~bookworm              amd64        nginx xslt dynamic module
ii  nginx-module-xslt-dbg        1.25.2-1~bookworm              amd64        debug symbols for the nginx-module-xslt
thresheek commented 11 months ago

Thanks, it's actually ok to show the "No such file or directory" error since we don't provide source codes for gdb to look at.

How lua-heavy is your configuration? Would you be able to turn lua module (and subsequently lua code in your nginx configuration) off and re-test?

EugenMayer commented 11 months ago

Lua is providing auth-request support, means, i should be able to run the curl against login, which already triggers the issue.

I'am happy to share what we have here anyway, i think it might be interesting how why we re-implemented the auth-request using LUA.

Details ```nginx ####### GENERAL WAT ################ # The reason we are not just using auth_request is one single limitation, which is the ability to forward multiple # cookies after retrieving the auth_request. Even with nginx 1.25 one can only pick the first cookie and forward this # one to the response, but not all. This is particullary problematic when a remember-me flow is triggered. In this case, # the /auth_request endpoint is access with remember-me=A and if the token is correct, the auth_request will have the # x-user/x-role header AND the SESSION and the new remember-me=B cookie. The latter 2 are our final issue, since we need # them both on the client, otherwise the response will only include remember-me=B, which will cause the browser to trigger # several XHR requests using this remember-me token, (since SESSION is not present), while only the first one will work, # all others will use the outdated/no longer value remember-me=B token and actually cause a cookie-theft auto-logout. # TLR: we need the session cookie and the remember-me cookie to be added to the response, and this is only possible # using LUA. #################################### # simulates an auth_request, but after it`s return, gets all cookie headers and saves them in a hashmap # ngx.ctx.auth_cookies - this will not aumagically become a proxy pass header - but thus is what we use the # header_filter_by_lua_block block for. # PHASE: access tail # see https://github.com/openresty/lua-nginx-module#access_by_lua_block access_by_lua_block { -- ensure we never forward any X-User / Role header to the authentication gateway ngx.req.clear_header("X-User") ngx.req.clear_header("X-Roles") ngx.req.set_header("X-Forwarded", "true") ngx.req.set_header("X-Forwarded-URL", ngx.var.request_uri) -- make a auth-request alike request to our endpoint, retrieving the X-User and X-Roles header -- and if needed the SESSION and remember me cookies local authReqResponse = ngx.location.capture(" /service/authgw/auth_request", {body = ""}) if authReqResponse.status == ngx.HTTP_NO_CONTENT then -- save it so we can use it in the outbound / response phase to add the cookies -- we do not intend to add the cookies to the upstream request ngx.ctx.auth_cookies = authReqResponse.header["Set-Cookie"] -- set the headers for our proxy_pass derective. In lua, this also sets the response headers though ngx.req.set_header("X-User", authReqResponse.header["X-User"]) ngx.req.set_header("X-Roles", authReqResponse.header["X-Roles"]) -- Do not forward the authorization header to our microservices upstreams - they should not need or have it ngx.req.clear_header("Authorization") return end if authReqResponse.status == ngx.HTTP_FORBIDDEN then ngx.exit(res.status) end ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR) } # this is always run after the acces phase, no matter the position. This means this is how we ensure that # ngx.ctx.auth_cookies is populated from the access_by_lua_block above. With this block we use the arbitrary hashmap # created above and feed it as headers for our "continuing" pass proxy. The strategy we use here is merge, so we ensure # we respect the cookies that have already been available in the initial request. # The merge strategy is: # - if either the request or the auth_request cookies are empty, use the one that is not empty (or return emppty if both) # - if neither is empty, let c be the request cookies and try to merge into c. Before that, ensure that c is a table, even # if the request had only one cookie (string) # - if there is only 1 cookie from auth_request (string/single value), just add the cookie at the end of the incoming cookies # - if there are multiple values from auth_request, add it to the resulting cookie table one by one # PHASE: output-header-filter header_filter_by_lua_block { local function merge_cookies(request, auth_request) -- pick request for result, if not empty, otherwise use auth_request local result = request or auth_request -- if either "request" or "auth_request" is empty, "result" already has the required result if (request and auth_request) == nil then return result end -- neither "request" nor "auth_request" are empty, result will be a table, "result" now equals to "request" -- if "result" is a string, lets made it a table instead if type(result) == "string" then result = {result} end -- append "auth_request" to "result" if type(auth_request) == "string" then table.insert(result, auth_request) else for _, v in ipairs(auth_request) do table.insert(result, v) end end return result end -- this sets the response header only! it will not be visible on the upstream proxy ngx.header["Set-Cookie"] = merge_cookies(ngx.header["Set-Cookie"], ngx.ctx.auth_cookies) ngx.header["X-User"] = nil ngx.header["X-Roles"] = nil } ```

This is all we have in LUA.

EugenMayer commented 11 months ago

Turing LUA off does indeed fixes the crash. This code LUA code was working in 1.25.1 and before. Do you think it is related to an LUA upgrade or 1.25.2?

EugenMayer commented 11 months ago

Removing

Details ```nginx # this is always run after the acces phase, no matter the position. This means this is how we ensure that # ngx.ctx.auth_cookies is populated from the access_by_lua_block above. With this block we use the arbitrary hashmap # created above and feed it as headers for our "continuing" pass proxy. The strategy we use here is merge, so we ensure # we respect the cookies that have already been available in the initial request. # The merge strategy is: # - if either the request or the auth_request cookies are empty, use the one that is not empty (or return emppty if both) # - if neither is empty, let c be the request cookies and try to merge into c. Before that, ensure that c is a table, even # if the request had only one cookie (string) # - if there is only 1 cookie from auth_request (string/single value), just add the cookie at the end of the incoming cookies # - if there are multiple values from auth_request, add it to the resulting cookie table one by one # PHASE: output-header-filter header_filter_by_lua_block { local function merge_cookies(request, auth_request) -- pick request for result, if not empty, otherwise use auth_request local result = request or auth_request -- if either "request" or "auth_request" is empty, "result" already has the required result if (request and auth_request) == nil then return result end -- neither "request" nor "auth_request" are empty, result will be a table, "result" now equals to "request" -- if "result" is a string, lets made it a table instead if type(result) == "string" then result = {result} end -- append "auth_request" to "result" if type(auth_request) == "string" then table.insert(result, auth_request) else for _, v in ipairs(auth_request) do table.insert(result, v) end end return result end -- this sets the response header only! it will not be visible on the upstream proxy ngx.header["Set-Cookie"] = merge_cookies(ngx.header["Set-Cookie"], ngx.ctx.auth_cookies) ngx.header["X-User"] = nil ngx.header["X-Roles"] = nil } ```

Does still crash the worker. So the issue is with

Details ```nginx access_by_lua_block { -- ensure we never forward any X-User / Role header to the authentication gateway ngx.req.clear_header("X-User") ngx.req.clear_header("X-Roles") ngx.req.set_header("X-Forwarded", "true") ngx.req.set_header("X-Forwarded-URL", ngx.var.request_uri) -- make a auth-request alike request to our endpoint, retrieving the X-User and X-Roles header -- and if needed the SESSION and remember me cookies local authReqResponse = ngx.location.capture(" /service/authgw/auth_request", {body = ""}) if authReqResponse.status == ngx.HTTP_NO_CONTENT then -- save it so we can use it in the outbound / response phase to add the cookies -- we do not intend to add the cookies to the upstream request ngx.ctx.auth_cookies = authReqResponse.header["Set-Cookie"] -- set the headers for our proxy_pass derective. In lua, this also sets the response headers though ngx.req.set_header("X-User", authReqResponse.header["X-User"]) ngx.req.set_header("X-Roles", authReqResponse.header["X-Roles"]) -- Do not forward the authorization header to our microservices upstreams - they should not need or have it ngx.req.clear_header("Authorization") return end if authReqResponse.status == ngx.HTTP_FORBIDDEN then ngx.exit(res.status) end ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR) } ```
EugenMayer commented 11 months ago

Removing anything else did not fix the issue, it comes down to the auth request itself:

    local authReqResponse = ngx.location.capture(" /service/authgw/auth_request", {body = ""})

Removing this will stop the worker from crashing. Cannot see anything special to this line - and fairly mandatory for us :)

thresheek commented 11 months ago

There was a patch we wrote and shipped with Lua package prior to nginx 1.25.2 / lua 0.10.25, which was apparently merged to Lua tree but surprisingly not as a part of 0.10.25 release: https://github.com/openresty/lua-nginx-module/commit/16de4bea620b6281f948b0d65341e31b2c90d597 - at least I cant see it in https://github.com/openresty/lua-nginx-module/commits/v0.10.25. This is admittedly an error on our part for not checking if it was.

The plan is to re-add it again to the lua package some time this week.

Unfortunately it will not be available for 1.25.2 release as is - as we check out explicit tag during the builds - so you'd need to edit the following like in your Dockerfile once we put the patch back:

&& hg clone -r ${NGINX_VERSION}-${PKG_RELEASE%%~*} https://hg.nginx.org/pkg-oss/ \ to && hg clone https://hg.nginx.org/pkg-oss/ \

This will use the "current" sources of packaging, instead of 1.25.2-1 tag.

Also, we plan to release nginx 1.25.3 next Tuesday so it might be worth waiting for it, too.

EugenMayer commented 11 months ago

Thanks for looking into that!

I can just wait for the 1.25.3 release next Tuesday. If i understand you correct, with 1.25.3 i will not need to change anything on the build side, just rebuild it and that's it right?

thresheek commented 11 months ago

Correct. Well, provided I'm right about the patch and if it will fix the issue for you (but if it works with 1.25.1 this should be true).

EugenMayer commented 11 months ago

If you are in mood and want to add something to the docs on how to get core dumps up and running, those are the complete docs - happy to share those.


Setting up for core-dump stacktrace support

On you host running docker

# enable coredumps
echo "/tmp/cores/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern
sudo sysctl -w fs.suid_dumpable=2
ulimit -c unlimited

On the container

vi /etc/nginx/nginx.conf

# add this at the very top
worker_rlimit_core  1000M;
working_directory /tmp/cores;

# install headers and gdb
export NGINX_GPGKEY_PATH=/usr/share/keyrings/nginx-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/nginx-archive-keyring.gpg] https://nginx.org/packages/mainline/debian/ bookworm nginx" >> /etc/apt/sources.list.d/nginx.list
apt update && apt install -y nginx-dbg nginx-module-njs-dbg nginx-module-njs-dbg libc6-dbg gdb

mkdir -p /tmp/cores
chown root:root /tmp/cores
chmod 1777 /tmp/cores

Now on the container, start nginx-debug

nginx-debug

Reproduce your crash and then open the core-dump with (replace the XXXX)

gdb --core=/tmp/cores/core.nginx-debug.XXXX /usr/sbin/nginx-debug

Specials / Optional for custom modules

If you are using addition/custom modules, you need to adjust the installation of those to include the symbols too, so if those symbols show up in your stackrace, they can be resolved too. For this add this below this line https://github.com/nginxinc/docker-nginx/blob/master/modules/Dockerfile#L78

apt install --no-install-suggests --no-install-recommends -y /tmp/packages/nginx-module-${module}-dbg_${NGINX_VERSION}*.deb; \
EugenMayer commented 11 months ago

@thresheek we tried 1.25.3 today and the segfauls / crashes are still present - so in fact this issue is yet not closed.

Still, using 1.25.1 does work rock-solid for us

EugenMayer commented 11 months ago

@thresheek if possible and you find time, i would love to work on this issue and nail it down. is there anything else i could contribute?

thresheek commented 11 months ago

Hi @EugenMayer unfortunately we did not get to the root cause of the bug in our testing prior to the release, so we shipped the same version.

I can try and build a package with a proposed fix for you to try - let me know what OS/architecture you're on. If that works, I'll push a patch to pkg-oss tree.

EugenMayer commented 11 months ago

@thresheek thank you!

we use the offical docker image, thus ( linux debian bookworm / AMD64 ) (mainline-bookworm)

thresheek commented 11 months ago

Can you try the following packages please? Should be installable on top of nginx:mainline.

https://pp.nginx.com/thresh/gh-docker-nginx-issue845/nginx-module-ndk_1.25.3+0.3.2-1~bookworm_amd64.deb https://pp.nginx.com/thresh/gh-docker-nginx-issue845/nginx-module-lua_1.25.3+0.10.25-1~bookworm_amd64.deb

They're built using nginx:mainline and the following patch on top of pkg-oss: https://pp.nginx.com/thresh/gh-docker-nginx-issue845/1-lua__bring_back_patch_removed_in_846_2467ec4029e_.patch

EugenMayer commented 11 months ago

@thresheek so basically i normal 1.25.3 baseline with the lua/ndk modules replaced by the ones you gave me?

EugenMayer commented 11 months ago

So what i have now is

nginx -version
nginx version: nginx/1.25.3

# installed your .deb packages for ndk / lua

dpkg --list | grep nginx
ii  nginx                       1.25.3-1~bookworm              amd64        high performance web server
ii  nginx-module-brotli         1.25.3+1.0.0-1~bookworm        amd64        nginx 3rd-party brotli compression dynamic modules
ii  nginx-module-brotli-dbg     1.25.3+1.0.0-1~bookworm        amd64        debug symbols for the nginx-module-brotli
ii  nginx-module-geoip          1.25.3-1~bookworm              amd64        nginx GeoIP dynamic modules
ii  nginx-module-image-filter   1.25.3-1~bookworm              amd64        nginx image filter dynamic module
ii  nginx-module-lua            1.25.3+0.10.25-1~bookworm      amd64        nginx 3rd-party Lua dynamic modules
ii  nginx-module-ndk            1.25.3+0.3.2-1~bookworm        amd64        nginx 3rd-party NDK dynamic module
ii  nginx-module-njs            1.25.3+0.8.2-1~bookworm        amd64        nginx njs dynamic modules
ii  nginx-module-xslt           1.25.3-1~bookworm              amd64        nginx xslt dynamic module

and it works! Using the original lua/ndk modules (not your patched ones), i get empty responses with the segfault all over the place.

So indeed, your build fixes it!

thresheek commented 11 months ago

Thanks for the tests!

EugenMayer commented 11 months ago

Thank you for the effort high-five :)

thresheek commented 11 months ago

The patch is now live on https://hg.nginx.org/pkg-oss/rev/a1a3d24b4fb5 - this will be included in the next release (1.25.4).

EugenMayer commented 11 months ago

Awesome, thanks! Will test it as soon as it will be released. What is the ETA?

thresheek commented 11 months ago

There is no ETA as of now...

EugenMayer commented 10 months ago

Any hints on when a release would happen. Is there a chance in 2023 or are you seeing this in 2024/Q1?

thresheek commented 10 months ago

Hey @EugenMayer - thanks for checking in. The current ETA is mid-January I think, but it can slip a couple weeks further too.

EugenMayer commented 10 months ago

Thanks for the heads up, i wait till end of Jan. 2024 and bump it again for a refresher :)

EugenMayer commented 8 months ago

@thresheek well here i'am :) Any updates on the expected timing with the next release? Thanks for your time

EugenMayer commented 8 months ago

@thresheek sorry to be that guy, but if you find any time to give me an idea on how the release schedule looks right now, it would be awesome! Thanks

thresheek commented 8 months ago

Hi @EugenMayer sorry for being silent on this one - we just had a 1.25.4 release out with some security fixes which is why I couldnt be more public about the release date.

Please report if you still have the issue. Thanks!

EugenMayer commented 8 months ago

No worries, I understand. Will test ASAP and report back. Thank you for asking!

EugenMayer commented 7 months ago

Tested and it is fixed! Thank you very much for going all the way!

IMHO this issue can be closed

thresheek commented 7 months ago

That's great to hear. Thanks for testing and reporting!