tempesta-tech / tempesta-test

Test suite for Tempesta FW

Test precedence of processing `Content-Encoding` and `Transfer-Encoding` by various clients #298

Closed const-t closed 1 year ago

const-t commented 2 years ago

Need to develop a test that will show how curl and nginx in proxy mode process Content-Encoding and Transfer-Encoding in same request. For this purposes we can use any backend(e.g nginx, apache or custom) which can send responses in required way. All encodings must be applied to response in order specified by test case. The test must include following cases:

  1. Response contains Transfer-Encoding: chunked, gzip. Encoding order: chunked -> gzip.
  2. Response contains Transfer-Encoding: gzip, chunked, br. Encoding order: gzip -> chunked -> br.
  3. Response contains Transfer-Encoding: gzip, chunked and Content-Encoding: br. Encoding order: gzip -> chunked -> br.
  4. Response contains Transfer-Encoding: gzip, chunked and Content-Encoding: br. Encoding order: br -> gzip -> chunked.
  5. Response contains Transfer-Encoding: gzip, chunked, deflate and Content-Encoding: br. Encoding order: br -> gzip -> chunked -> deflate.

Encoding order implies the order in which our backend must apply encodings to the response. Any case where chunked is not the final encoding might be tested both with and without closing the connection.
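A minimal sketch of how such a backend could apply codings to a response body in the order a test case specifies (the names `chunkify` and `encode_body` are hypothetical; `br` would require the third-party brotli package and is omitted here):

```python
import gzip
import zlib

def chunkify(body: bytes, chunk_size: int = 1024) -> bytes:
    """Apply the HTTP/1.1 chunked transfer coding to a payload."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-length chunk
    return bytes(out)

# "br" needs the third-party brotli package, so only stdlib codings appear.
CODINGS = {
    "gzip": gzip.compress,
    "deflate": zlib.compress,
    "chunked": chunkify,
}

def encode_body(body: bytes, order: list[str]) -> bytes:
    """Apply codings in the given order, e.g. ["chunked", "gzip"]
    for test case 1 (chunked -> gzip)."""
    for coding in order:
        body = CODINGS[coding](body)
    return body
```

A client receiving the case-1 response would then have to undo the codings in reverse: gunzip first, then de-chunk.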
krizhanovsky commented 2 years ago

The issue blocks https://github.com/tempesta-tech/tempesta/pull/1418, so it's crucial

nickzaev commented 2 years ago

This took quite a bit of research, and here's what I got. First of all, as a side note: during the tests I was using Nginx, and apart from chunked there's effectively no way to configure it to apply several encodings at once; I also haven't managed to make it place chunked in front of any other encoding. Also, deflate is not supported by Nginx at all.

As a baseline I had this as a backend:

from fastapi import FastAPI, Response
from fastapi.middleware.gzip import GZipMiddleware

app_gzipped = FastAPI()

app_gzipped.add_middleware(GZipMiddleware)

@app_gzipped.get("/gzipped")
async def gzipped(response: Response):
    response.headers["Via"] = "Via: HTTP/1.1 FastAPI"
    response.headers["Transfer-Encoding"] = "chunked"
    return open("index.html").read()

Note that: 1) h11, a basic building block of FastAPI, doesn't support transfer encodings other than "chunked"; 2) according to the docs, I added the Via header so Nginx could know the request is proxied (even though it already uses the proxy_pass directive in the config). The base Nginx config was:

worker_processes auto;
events {
    worker_connections 1024;
    use epoll;
}
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    open_file_cache max=1000;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors off;    
    access_log off;
    server {
        gzip on;
        gzip_types      text/plain application/xml application/json;
        gzip_proxied    no-cache no-store private expired auth;
        listen 127.0.0.1:8080;
        location /gzipped {
            proxy_pass http://127.0.0.1:8000/gzipped;
        }
        location /nginx_status {
            stub_status on;
        }
    }
}

Then I made two requests, the first one straight to the backend server and the second one through Nginx:

tempesta-nick encoding (nz-te-ce-precedence?) # curl http://127.0.0.1:8000/gzipped -H "Accept-Encoding: gzip, chunked" -v > 1
*   Trying 127.0.0.1:8000...
* TCP_NODELAY set
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> GET /gzipped HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/7.68.0
> Accept: */*
> Accept-Encoding: gzip, chunked
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Mon, 19 Sep 2022 12:02:51 GMT
< server: uvicorn
< content-type: application/json
< via: Via: HTTP/1.1 FastAPI
< content-encoding: gzip
< vary: Accept-Encoding
< Transfer-Encoding: chunked
<
{ [8728 bytes data]
100  8715    0  8715    0     0  2127k      0 --:--:-- --:--:-- --:--:-- 2127k
* Connection #0 to host 127.0.0.1 left intact
tempesta-nick encoding (nz-te-ce-precedence?) # curl http://127.0.0.1:8080/gzipped -H "Accept-Encoding: gzip, chunked" -v > 2
*   Trying 127.0.0.1:8080...
* TCP_NODELAY set
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /gzipped HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.68.0
> Accept: */*
> Accept-Encoding: gzip, chunked
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.18.0 (Ubuntu)
< Date: Mon, 19 Sep 2022 12:02:58 GMT
< Content-Type: application/json
< Transfer-Encoding: chunked
< Connection: keep-alive
< via: Via: HTTP/1.1 FastAPI
< content-encoding: gzip
< vary: Accept-Encoding
<
{ [8001 bytes data]
100  8715    0  8715    0     0  1702k      0 --:--:-- --:--:-- --:--:-- 1702k
* Connection #0 to host 127.0.0.1 left intact
tempesta-nick encoding (nz-te-ce-precedence?) # diff 1 2
Binary files 1 and 2 differ

So it seems like Nginx respects neither the Via header nor the fact that it's being used in proxy mode, and applies gzip to an already compressed response. The same behavior is observed when using the br encoding module instead of gzip. Even though this is basically an example of Nginx misconfiguration, it is still weird.

Regarding having several compression algorithms applied to the response: besides being effectively impossible to configure with Nginx, this just seems like nonsense to me. If a response is already compressed, any further compression won't have a significant effect on its size (and if it actually does, that means the algorithm used in the first place is the issue, not the proxy server). Such a setup would only increase the time for the client to get to the actual response data, since it also has to be decoded several times.

As a conclusion, I would say that in production having gzip/br along with chunked is more than enough. This is the most common setup one would use for almost any task, and we perhaps should not invest much time into further investigation of this topic or try to come up with sophisticated corner cases, unless it is required directly by customers.

In the scope of this issue I will add 3 test cases for https://github.com/tempesta-tech/tempesta/pull/1418.

const-t commented 2 years ago

You have an error in the Via header (via: Via: HTTP/1.1 FastAPI); maybe that's why nginx ignores Via? Also, I can't see where nginx applies a second encoding: the second response has a single content-encoding: gzip, same as the first.
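For reference, a sketch of the fix (per RFC 7230 §5.7.1 the Via field value is received-protocol plus received-by, without repeating the header name; the plain dict stands in for the FastAPI `response.headers` object):

```python
headers = {}

# malformed: the header name leaked into the field value
# headers["Via"] = "Via: HTTP/1.1 FastAPI"

# well-formed: protocol version plus a pseudonym
headers["Via"] = "1.1 FastAPI"
```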

const-t commented 1 year ago

The tests addressed by the current issue were implemented as part of the encodings test suite.