openresty / replace-filter-nginx-module

Streaming regular expression replacement in response bodies

data lost in multi-echo #3

Closed LazyZhu closed 11 years ago

LazyZhu commented 11 years ago

test conf:

    location / {
        charset utf-8;
        default_type text/html;
        echo "ABCabcABC";
        echo "ABCabcABC";
        #replace_filter_types text/plain;
        replace_filter "(a.+?c){2}" "X" "ig";
    }

This outputs XXABC\n instead of XABC\nXABC\n; the replace filter lost the non-matched data from the first echo.

agentzh commented 11 years ago

Hello!

On Thu, Jan 3, 2013 at 3:21 AM, LazyZhu notifications@github.com wrote:

    echo "ABCabcABC";
    echo "ABCabcABC";
    #replace_filter_types text/plain;
    replace_filter "(a.+?c){2}" "X" "ig";
    }

This output XXABC instead of XABC\nXABC, the replace filter lost the nonmatched data from first echo.

Well, this is not a bug; it is the expected behaviour.

perl 5.16.2, for example, demonstrates exactly the same behaviour:

    $ perl -e '$_="ABCabcABC\nABCabcABC\n";s/(a.+?c){2}/X/igsm;print'
    XXABC

We can see how the match works by running the following command:

    $ perl -e '$_="ABCabcABC\nABCabcABC\n";s/(a.+?c){2}/[$&]/igsm;print'
    [ABCabc][ABC
    ABCabc]ABC

That is, the second match is "ABC\nABCabc", which starts on the first input line and spans the newline. It takes priority over the "ABCabc" on the second input line because the "left-most wins" rule applies here. So everything works as expected here :)
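For readers without Perl at hand, the same left-most behaviour can be reproduced with a minimal Python sketch. This is only an illustration: it uses Python's `re` module, not sregex or ngx_replace_filter, and it assumes that `re.IGNORECASE | re.DOTALL` is a close enough stand-in for the "i" and "s" flags in this particular pattern (the "m" flag only affects `^` and `$`, which the pattern does not use). The variable names are made up for the example.

```python
import re

# Same bytes the two echo directives produce: each echo appends a newline.
body = "ABCabcABC\nABCabcABC\n"

# IGNORECASE stands in for "i"; DOTALL lets "." match "\n", like Perl's /s.
pattern = re.compile(r"(a.+?c){2}", re.IGNORECASE | re.DOTALL)

# Collect what each match actually consumed, then do the real substitution.
matches = [m.group(0) for m in pattern.finditer(body)]
replaced = pattern.sub("X", body)

print(matches)         # the second match spans the newline
print(repr(replaced))
```

The second element of `matches` begins at the trailing "ABC" of the first line, so the unmatched tail of the first echo is consumed by that match rather than lost.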

We'll eventually be able to do the same thing in ngx_replace_filter once we have support for the $& capturing variables in the replacement text :)

Thanks for trying out this module! I'm really looking forward to more bug reports from you ;)

Best regards, -agentzh

LazyZhu commented 11 years ago

Thanks for the details. So the . can match any char, including a newline (dotall mode or multiline mode)?

agentzh commented 11 years ago

Hello!

On Thu, Jan 3, 2013 at 7:50 PM, LazyZhu notifications@github.com wrote:

Thanks for the details. So the . can match any char, including a newline?

Yes. Perl 5's /sm regex flags are assumed here. See the documentation for details:

https://github.com/agentzh/sregex#syntax-supported

So if you want to match any char but a newline, use [^\n] instead of a dot (.) :)
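As a rough check of the `[^\n]` suggestion, here is a small Python sketch run against the same two-line input. It again relies on Python's `re` module as a stand-in, so it only illustrates the regex semantics and says nothing definitive about how ngx_replace_filter handles the pattern; the names `body`, `per_line` and `result` are hypothetical.

```python
import re

# Same bytes the two echo directives produce.
body = "ABCabcABC\nABCabcABC\n"

# "[^\n]" replaces "." so that a single match can no longer cross a line break.
per_line = re.compile(r"(a[^\n]+?c){2}", re.IGNORECASE | re.DOTALL)

result = per_line.sub("X", body)
print(repr(result))
```

With the negated class, each line is matched on its own, which gives the XABC\nXABC\n output the original report expected. The corresponding change to the test conf would be to swap the dot for `[^\n]` in the `replace_filter` pattern; that variant is not verified here.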

Best regards, -agentzh

LazyZhu commented 11 years ago

Thanks for the tips. :)