openresty / replace-filter-nginx-module

Streaming regular expression replacement in response bodies

replace_filter removes some Caching headers. #8

Open bhargavtrivedi opened 11 years ago

bhargavtrivedi commented 11 years ago

Hello,

I am using Nginx as a web server, and I tried the replace_filter module to remove part of the response body. It works fine for me, but it has caused one issue: it also removes some important response headers.

I looked into the replace filter code and found that it removes the "Last-Modified" header, so the response cannot be cached.

Would you explain why it is important to remove the "Last-Modified" response header? And could you suggest a way to restore it after replace_filter has processed the response?

Thanks, Bhargav

agentzh commented 11 years ago

@bhargavtrivedi The ngx_replace_filter module clears the "Last-Modified" response header in its header filter because its body filter may change the response body, so the original "Last-Modified" response header may become logically invalid.

It would also be wrong for ngx_replace_filter to insert a new "Last-Modified" response header set to the current timestamp. The body filter may leave the response body unchanged, and the header filter cannot know that in advance because the body filter has not run yet.

Do you want to forcibly retain the original Last-Modified response header, or overwrite it with the current timestamp?

bhargavtrivedi commented 11 years ago

Hi,

Thanks for your explanation.

I would like to know about both options: 1) How can we forcibly retain the original Last-Modified response header? 2) How can we overwrite it with the current timestamp?

If the original file is not modified, I just want to send 304 response code to the client.

Thanks Bhargav

bhargavtrivedi commented 11 years ago

Hello agentzh,

Is it possible to retain Last-Modified response header?

If it is possible, can you suggest solutions for both of the options I mentioned in my previous comment?

Thanks, Bhargav

agentzh commented 11 years ago

@bhargavtrivedi For the 1st option, I just implemented the replace_filter_last_modified directive in this module. See

https://github.com/agentzh/replace-filter-nginx-module/#replace_filter_last_modified

for the documentation.
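
A minimal sketch of how the new directive might be used, based on the module's README. The `proxy_pass` target and the `replace_filter` rule are placeholders for illustration and do not come from this thread:

```nginx
location / {
    # Hypothetical upstream; substitute your own backend.
    proxy_pass http://127.0.0.1:8080;

    # Hypothetical substitution rule, for illustration only.
    replace_filter 'foo' 'bar' g;

    # Keep the upstream Last-Modified header instead of clearing it
    # (the module's default behaviour is "clear").
    replace_filter_last_modified keep;
}
```

With `keep`, the header still describes the original, unmodified body, so clients may go on using a cached copy after the replacement rules themselves change.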

For the 2nd option, you can use the ngx_lua module's header_filter_by_lua directive to create a new Last-Modified response header from the current time:

header_filter_by_lua '
     ngx.header["Last-Modified"] = ngx.http_time(ngx.time())
';

See ngx_lua's documentation for more details: https://github.com/chaoslawful/lua-nginx-module#readme

You should ensure that ngx_lua's header filter runs after ngx_replace_filter's. To do that, you need to add these two modules in the reversed order while building Nginx, that is,

./configure --add-module=/path/to/lua-nginx-module \
            --add-module=/path/to/replace-filter-nginx-module

bhargavtrivedi commented 11 years ago

Hi,

Thanks for your quick updates on this.

For the first option, I will have to build an rpm with the new version of this module, so I will build it and check.

I am using Lua to gunzip the backend response, so I cannot run it after replace_filter. I am not sure whether there is a better way to gunzip the backend response.

One more thing I would like to know: which of the two options below is better to use? 1) Buffer the backend response and modify it with body_filter_by_lua* 2) Use replace_filter to modify the backend response.

Which of the two is faster and uses less CPU?

There is also HttpSubsModule for Nginx. Does it have any limitations compared to the replace_filter module? And which one is faster, if you ran any benchmarks after developing replace_filter?

I really appreciate your help on this new feature to retain original "Last-Modified" header.

Thanks, Bhargav

bhargavtrivedi commented 11 years ago

Hello agentzh,

I built a new rpm and tested the new version of replace_filter. Now it can preserve the "Last-Modified" header.

Thanks for your help on this.

If you get a bit of time, would you check my previous post? I have a few questions and would like to have your suggestions on them.

Thanks, Bhargav

agentzh commented 10 years ago

Hello!

On Tue, Nov 19, 2013 at 11:09 PM, bhargavtrivedi wrote:

> One more thing I would like to know, which of the below two option is better to use? 1) Buffer the backend response and modify it with body_filter_by_lua* 2) use replace_filter to modify backend response.

It'll be much harder to do text replacement in body_filter_by_lua because it receives the response body stream in data chunks, and the chunks can be split anywhere in the original data stream. ngx_replace_filter handles the various corner cases around the data chunk boundaries automatically.
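
To make the chunk-boundary problem concrete: a per-chunk substitution in body_filter_by_lua would miss a match such as `foo` when one chunk ends in `f` and the next begins with `oo`. The sketch below shows one way around that, which is to hold back every chunk and substitute only once the whole body has arrived. It assumes ngx_lua is installed; the literal pattern `foo` and replacement `bar` are placeholders. It gives up streaming and keeps the full body in memory, which is the cost ngx_replace_filter is designed to avoid.

```nginx
location / {
    # Hypothetical upstream; substitute your own backend.
    proxy_pass http://127.0.0.1:8080;

    # The body length may change, so drop any Content-Length header.
    header_filter_by_lua 'ngx.header.content_length = nil';

    body_filter_by_lua '
        local chunk, eof = ngx.arg[1], ngx.arg[2]

        -- Collect every chunk in a per-request table.
        local parts = ngx.ctx.parts
        if not parts then
            parts = {}
            ngx.ctx.parts = parts
        end
        if chunk ~= "" then
            parts[#parts + 1] = chunk
        end

        if not eof then
            -- Hold the data back until the last chunk arrives.
            ngx.arg[1] = nil
            return
        end

        -- Last chunk: the whole body is available, so no match can
        -- straddle a chunk boundary.
        local body = table.concat(parts)
        ngx.arg[1] = (string.gsub(body, "foo", "bar"))
    ';
}
```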

> There is also HttpSubsModule in Nginx, is there any limitation of that compare to replace_filter module ? And which one is faster if you may have some benchmarking after development of replace filter?

The ngx_subs module uses a backtracking regex engine that can only work on data lines. So if your pattern would span multiple lines, then it won't work as expected. Also, ngx_subs would buffer more data than necessary (because it works on lines) due to the nature of backtracking regex matching algorithms.

But you should also keep in mind that the sregex engine used by ngx_replace_filter is still very young and I owe it a lot of important optimizations so the CPU efficiency of sregex is not so great at the moment. But it will get better and better over time :)

Regards, -agentzh

bhargavtrivedi commented 10 years ago

Hello agentzh,

Thanks for the explanation.

If I use multiple replace_filter regex rules to modify the response body, do you think it will increase the response time of the request (that is, delay the response)?

Regards, Bhargav

agentzh commented 10 years ago

Hello!

On Sat, Nov 30, 2013 at 12:24 AM, bhargavtrivedi wrote:

> If I use multiple replace_filter regex rules to modify response body, do you think it will increase response time?

When you add more regex rules, you add more complexity to your pattern (because all the regexes will be merged into a single one and match against the stream at once). So it will certainly add more evaluation time. Whether the additional time matters depends on a lot of factors and you have to benchmark your use case yourself.
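
For reference, a sketch of what several rules in one location might look like, following the multiple-directive usage described in the module's README. The upstream address, patterns, and replacements are placeholders; per the explanation above, all the rules are matched against the stream together, not in separate passes.

```nginx
location / {
    # Hypothetical upstream; substitute your own backend.
    proxy_pass http://127.0.0.1:8080;

    # Hypothetical rules; "g" replaces every match, not just the first.
    replace_filter 'hello' 'hiya' g;
    replace_filter 'world' 'globe' g;
    replace_filter '<!--.*?-->' '' g;
}
```

Benchmarking this location with one rule and then with all three, against representative responses, is the most direct way to see whether the extra rules matter for a given workload.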

-agentzh