sjtug / caddy2-filter

Replace response body in caddy2
Apache License 2.0
46 stars 9 forks source link

Bug: Filter doesn't work on complicated HTTP responses? #1

Closed supercom32 closed 4 years ago

supercom32 commented 4 years ago

Hi,

So I was testing this plugin out with a simple Caddyfile example:

www.MyRandomDomain.net {
    reverse_proxy {
        to https://www.lipsum.com
        header_up Host "www.lipsum.com"
    }
    filter {
        content_type .*
        search_pattern "Lorem Ipsum"
        replacement "New Name"
    }
}

In this, I expected all instances of "Lorem Ipsum" on the page to be turned into "New Name". However, nothing seems to happen. Is this a bug, or is there some usage of this plugin that is incorrect?

francislavoie commented 4 years ago

So I'm not certain cause you didn't really give enough info to know for sure (logs, what system, how you ran Caddy, etc) but I think it probably has to do with the directive ordering stuff I just wrote about in this issue: https://github.com/sjtug/caddy2-filter/issues/2

I'm guessing you tried to caddy reload after adding the filter plugin but it failed? In which case Caddy would've kept running with the old config and you would've seen no difference in the output. I think you'll need to set the order global option for Caddy not to complain about the filter directive not being ordered.

supercom32 commented 4 years ago

Hi,

Thanks so much for the kind reply. In regards to your questions, indeed I was fully restarting Caddy each time so my changes were being reflected accordingly.

In regards to directive ordering, I tried a bunch of them at the time. I tried ordering before response, I tried doing it at the start, etc. It seems nothing I did would really generate the kind of results I was expecting from the filter. I'm thinking I might try your example and see if doing it after the encode step works better. Maybe in your environment your not experiencing the same kind of issues I'm having when hitting the same URL?

I was poking around the unit tests hoping to gleam if I was missing anything, but it doesn't seem like it. If I setup something simple like the unit test does, it works just fine. However, whenever I try to hit a proper website to do a replace on some web element, It doesn't work.

francislavoie commented 4 years ago

I haven't actually tried this plugin yet, I'm just coming at this from a relatively deep understanding of the Caddy internals after having written plenty of PRs for the project in the last few months. You might be right that the plugin is just broken, but we'll need to wait until @htfy96 clarifies for us :smile:

supercom32 commented 4 years ago

Ah, ok. Thanks for the heads up! I was worried that maybe this plugin only works against requests being served locally and not via proxy. It would be a shame if so, as this seems like a pretty powerful feature.

htfy96 commented 4 years ago

Yes. @francislavoie is correct. The problem here is the order of plugins. The recommended way as per @francislavoie is to add this to your Caddyfile:

{
    order filter after encode
}

The following Caddyfile works:

$  cat /tmp/Caddyfile.test 
{
    order filter first
}
http://localhost:2015 {
    reverse_proxy {
        to https://www.lipsum.com
        header_up Host "www.lipsum.com"
    }
    filter {
        content_type .*
        search_pattern "Lorem Ipsum"
        replacement "New Name"
    }
}
$ xcaddy run -config /tmp/Caddyfile.test
# On another terminal:
$ curl http://localhost:2015/ | grep 'New Name'
100 1755<title>New Name - All the facts - Lipsum generator</title>
8     <meta name="keywords" content="New Name, Lipsum, Lorem, Ipsum, Text, Generate, Generator, Facts, Information, What, Why, Where, Dummy Text, Typesetting, Printing, de Finibus, Bonorum et Malorum, de Finibus Bonorum et Malorum, Extremes of Good and Evil, Cicero, Latin, Garbled, Scrambled, Lorem ipsum dolor sit amet, dolor, sit amet, consectetur, adipiscing, elit, sed, eiusmod, tempor, incididunt" />
      <meta name="description" content="Reference site about New Name, giving information on its origins, as well as a random Lipsum generator." />
 0 1<h1>New Name</h1>
[clipped]
htfy96 commented 4 years ago

I have updated the doc to include the order boilerplate plus a test case for reverse_proxy + filter.

Closing this issue now. Feel free to reopen it if you have any question!

supercom32 commented 4 years ago

Thanks for the kind reply. I am able to reproduce what your seeing, in that if I curl the url, I get returned a source that indicates my changes have taken place.

However, if I direct my web browser to the same URL, it appears that all my changes are gone. Would you happen to know why curl returns the right changes, while navigating to the URL in a browser reverts the page back to it's original form? I tried copying the user agent and other header information in my curl request, but in each time it performs fine as expected. It's just under the browser where I can't see any changes?

htfy96 commented 4 years ago

I can reproduce this issue locally. The root cause still seems to be directive orders. This plugin doesn't work when curl --compressed is used. Investigating

supercom32 commented 4 years ago

Thanks for looking into this. I think this is why initially I thought the plugin wasn't working. Because I never bothered to check curl and just assumed since my browser wasn't returning anything, it was somehow broken.

I look forward to hearing about what you found!

htfy96 commented 4 years ago

After some efforts I believe this is a consequence of the design of Caddy's reverse_proxy module. In short, when we request a compressed response with Accept-Encoding: gzip header, reverse_proxy module will returns a gzipped response body to this module. This module then looks for Lorem Ipsum in the gzipped data and found nothing. The client finally receives the same gzipped data from upstream.

Technical details:

reverse proxy internally uses HTTPTransport in Go. However, it would only transparently decompress the response when the client (Caddy) doesn't add Accept-Encoding: gzip explicitly. So, when curl requests an uncompressed version to Caddy, the HTTPTransport in Caddy transistently decodes the response from upstream, and pass the uncompressed version to this plugin. When we add an Accept-Encoding: gzip header to curl's request, HTTPTransport no longer automatically decompress the result.

I think it would be best for Caddy to add an option like always_decompress_gzip to reverse_proxy.transport. Let's loop in @francislavoie to discuss this.

francislavoie commented 4 years ago

I think you can just set something like header_up -Accept-Encoding to remove the header when sending the request upstream. Does that work? If so I think it's reasonable to add a note in the README about that.

Another possible enhancement, this module could look at the Content-Encoding header to determine what to do. Maybe you could decode, modify, then re-encode?

htfy96 commented 4 years ago

header_up -Accept-Encoding does work and I'll add it to the README. However personally I don't like to plug workaround into this module, especially when there's already encode module. A more elegant way would be adding a decode module right before respond.

supercom32 commented 4 years ago

I would agree with what has been said. It would be better to find a proper solution to for this edge-case rather than adding specific workarounds to this module. That way, everyone benifits from getting the change and workerounds don't need to duplicated in projects that need it.