nginxinc / nginx-go-crossplane

A library for working with NGINX configs in Go
Apache License 2.0
52 stars 14 forks source link

Bug: header_filter_by_lua_block parser doesn't detect directive end #32

Open asharpe opened 1 year ago

asharpe commented 1 year ago

Is your feature request related to a problem? Please describe

This is mostly a feature request because it's related to openresty which adds additional directives to the nginx configuration, and I'm not sure that's a core concern for this organisation.

When using *_by_lua_bock directives the parser seems to get confused and adds configuration outside the block into the lua block.

Example config

    header_filter_by_lua_block {
      cors_allow_for([[some\.domain\.name$]])
    }
    include includes/common;

This generates JSON which looks like

...
                {
                  "directive": "header_filter_by_lua_block",
                  "line": 32,
                  "args": [],
                  "block": [
                    {
                      "directive": "cors_allow_for([[some\\.domain\\.name$]])",
                      "line": 33,
                      "args": []
                    },
                    {
                      "directive": "error_page",
                      "line": 2,
                      "args": [
                        "404",
                        "=",
                        "@error"
                      ]
                    },
                    {
...

where the error_page directive and subsequent directives come from the included file.

Describe the solution you'd like

To get a valid representation of the configuration. The approach suggested here is to identify specific directives to exclude them from further parsing. This means the offending directive will still appear in the output, but its block will be empty. If these directives could be identified by command line options then the solution would work for as-yet unforeseen directives.

Describe alternatives you've considered

Fixing the parser to correctly process the *_by_lua_block directives (probably by turning them into strings). I don't think this is a great option because any nginx addon could add a directive which could fall afoul of the same issue.

Additional context

Add any other context or screenshots about the feature request here.

asharpe commented 1 year ago

It looks like the python version handles this already - https://github.com/nginxinc/crossplane/blob/master/crossplane/ext/lua.py

asharpe commented 1 year ago

Apologies, this feature already exists - https://github.com/nginxinc/nginx-go-crossplane/blob/main/parse.go#L52

asharpe commented 1 year ago

On reflection, this is a bug in the parser which is not solved by ignoring the directive. When the directive is ignored, the parser will still eat the config and then exclude those other directives/blocks from the payload!

After looking in the code the bug is worked around by ignoring the directive, and the other blocks are NOT excluded from the payload. The issue ATM appears to be the parser trying to handle the *_by_lua_block contents as nginx directives while they're not.

One path forward would be to detect these directives and consume their blocks, but instead of discarding them, turn them into strings. This could probably be done within the existing parser, but I've only been looking at it briefly and it's hurting my head.

asharpe commented 1 year ago

So I haven't actually addressed the bug here, it still stands. A config file which illustrates the issue I have is

http {
   server {
      header_filter_by_lua_block {
         cors_allow_for([[some\.domain\.name$]])
      }
      error_page 404 /handler;
   }
}

Fixing the bug would only partially solve my problem, which is that I want to keep the lua code, but not have it parsed into the Directive structure.

I've implemented another option for directives that should be handled similarly to the ignored directives. The difference is that we don't discard their content, instead we collect it and put it in the comment field for the directive in question.

This appears to resolve the immediate issue I'm having, but it's not MR worthy - https://github.com/nginxinc/nginx-go-crossplane/compare/main...asharpe:nginx-go-crossplane:skip-block-contents?expand=1

Edit: Collecting the content is easy enough, but we lose all the structure which makes it impossible to reconstruct. I've tested on some moderately difficult lua and it got hard pretty quick (eg. single line comments). This is my first dip into lex/parse type stuff and I'm not sure it will be possible to preserve the lua without having the lexer understand it as well

asharpe commented 1 year ago

My current thinking is that *_by_lua are already handled as strings, *_by_lua_file are just directives with args, it's just *_by_lua_block that's causing an issue.

This change moves the logic into the lexer and allows it to identify *_by_lua_block directives and capture the contents verbatim, thus turning them into an arg for the relevant directive. The output of this suits me perfectly, it would be great to get some feedback from the maintainers on the approach here.

https://github.com/nginxinc/nginx-go-crossplane/compare/main...asharpe:nginx-go-crossplane:by-lua-block?expand=1

Edit: There are no tests on this yet, and I'm aware this doesn't yet handle set_by_lua_block. Edit: when writing tests, take https://github.com/nginxinc/crossplane/issues/85#issuecomment-657434966 into account

rikatz commented 2 months ago

The problem apparently is because the parser cannot detect end of lines, as we don't add the ";" after each directive.

Example that works:

        content_by_lua_block {
            tcp_udp_configuration.call();
        }

if you remove the ";" (which is right on by_lua blocks btw) it breaks

rikatz commented 2 months ago

I am not sure if this is the same issue, but I have figured out that Lua parsers are not used by default, so I created some example here: https://gist.github.com/rikatz/2928938c6d354366e7b30e8cffc69bf1