varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.7k stars 377 forks source link

On cache Miss of request containing "Range: bytes=0-" response status is 200 (should be 206) #4081

Open imwhocodes opened 8 months ago

imwhocodes commented 8 months ago

Expected Behavior

Given this condition:

While streaming from backend the response to the client should be:

Current Behavior

Given previous precondition:

While streaming from backend the response is:

But when retrieving from cache (correctly):

Possible Solution

I tried:

sub vcl_deliver {
    if( 
        resp.is_streaming
            &&
        ( std.tolower(req.http.range) == "bytes=0-" )
            &&
        ( req.http.if-none-match != resp.http.etag )
    ){
        set resp.status = 206;
    }
}

But this seem to break the response (at least on chrome)

Steps to Reproduce (for bugs)

No response

Context

I'm using Varnish as a reverse-proxy/cache in front of an aiohttp (python) server that encode on-demand some videos using ffmpeg, I want to minimize time form request to first frame out so ffmpeg output is piped out while it being processed: ffmpeg ... -f mp4 -movflags frag_keyframe+empty_moov+default_base_moof pipe:

This is why the response from the origin/backend server is Transfer-Encoding: chunked The response is for a <video> html5 tag, 200 would be ok if not that Chrome misbehave (palyback seeking is broken) when not receiving 206 on a Range: bytes=0- request

Log

*   << BeReq    >> 3
-   Begin          bereq 2 fetch
-   VCL_use        boot
-   Timestamp      Start: 1710255589.400947 0.000000 0.000000
-   BereqMethod    GET
-   BereqURL       /<folder>/IMG_7269.MOV?h=320&w=320
-   BereqProtocol  HTTP/1.1
-   BereqHeader    Host: <ip>
-   BereqHeader    User-Agent: PostmanRuntime/7.37.0
-   BereqHeader    Accept: */*
-   BereqHeader    Postman-Token: <token>
-   BereqHeader    X-Forwarded-Host: <host>
-   BereqHeader    X-Forwarded-Server: <host>
-   BereqHeader    X-Forwarded-For: <ip>, <ip>
-   BereqHeader    Via: 1.1 48b10ba18c41 (Varnish/7.4)
-   BereqHeader    Accept-Encoding: gzip
-   BereqHeader    X-Varnish: 3
-   VCL_call       BACKEND_FETCH
-   VCL_return     fetch
-   Timestamp      Fetch: 1710255589.400959 0.000012 0.000012
-   Timestamp      Connected: 1710255589.401053 0.000105 0.000093
-   BackendOpen    31 default <ip> <ip> connect
-   Timestamp      Bereq: 1710255589.401069 0.000121 0.000015
-   BerespProtocol HTTP/1.1
-   BerespStatus   200
-   BerespReason   OK
-   BerespHeader   Etag: "FAEB-2OFl-ldkvZQ==-mp4-2832867"
-   BerespHeader   Last-Modified: Wed, 18 Oct 2023 13:11:49 GMT
-   BerespHeader   Cache-Control: public, must-revalidate, stale-while-revalidate, stale-if-error
-   BerespHeader   Content-Type: video/mp4
-   BerespHeader   Transfer-Encoding: chunked 
-   BerespHeader   Date: Tue, 12 Mar 2024 14:59:49 GMT
-   BerespHeader   Server: Python/3.10 aiohttp/3.9.3
-   Timestamp      Beresp: 1710255589.496585 0.095638 0.095516
-   TTL            RFC 120 10 0 1710255589 1710255589 1710255589 0 0 cacheable
-   VCL_call       BACKEND_RESPONSE
-   TTL            VCL 120 60 0 1710255589 cacheable
-   TTL            VCL 120 60 31536000 1710255589 cacheable
-   VCL_return     deliver
-   Timestamp      Process: 1710255589.496608 0.095660 0.000022
-   Filters
-   Storage        malloc s0
-   Fetch_Body     2 chunked stream
-   BackendClose   31 default recycle
-   Timestamp      BerespBody: 1710255589.742165 0.341217 0.245557
-   Length         795115
-   BereqAcct      392 0 392 310 795115 795425
-   End

*   << Request  >> 2
-   Begin          req 1 rxreq
-   Timestamp      Start: 1710255589.400859 0.000000 0.000000
-   Timestamp      Req: 1710255589.400859 0.000000 0.000000
-   VCL_use        boot
-   ReqStart       <ip> http
-   ReqMethod      GET
-   ReqURL         /<folder>/IMG_7269.MOV?w=320&h=320
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: <ip>
-   ReqHeader      Range: bytes=0-
-   ReqHeader      User-Agent: PostmanRuntime/7.37.0
-   ReqHeader      Accept: */*
-   ReqHeader      Postman-Token: <token>
-   ReqHeader      Accept-Encoding: gzip, deflate, br
-   ReqHeader      X-Forwarded-For: <ip>
-   ReqHeader      X-Forwarded-Host: <ip>
-   ReqHeader      X-Forwarded-Server: <ip>
-   ReqHeader      Connection: close
-   ReqUnset       X-Forwarded-For: <ip>
-   ReqHeader      X-Forwarded-For: <ip>, <ip>
-   ReqHeader      Via: 1.1 48b10ba18c41 (Varnish/7.4)
-   VCL_call       RECV
-   ReqURL         /<folder>/IMG_7269.MOV?h=320&w=320
-   VCL_return     hash
-   ReqUnset       Accept-Encoding: gzip, deflate, br
-   ReqHeader      Accept-Encoding: gzip
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 3 fetch
-   Timestamp      Fetch: 1710255589.496667 0.095807 0.095807
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Etag: "FAEB-2OFl-ldkvZQ==-mp4-2832867"
-   RespHeader     Last-Modified: Wed, 18 Oct 2023 13:11:49 GMT
-   RespHeader     Cache-Control: public, must-revalidate, stale-while-revalidate, stale-if-error
-   RespHeader     Content-Type: video/mp4
-   RespHeader     Date: Tue, 12 Mar 2024 14:59:49 GMT
-   RespHeader     Server: Python/3.10 aiohttp/3.9.3
-   RespHeader     X-Varnish: 2
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 48b10ba18c41 (Varnish/7.4)
-   RespHeader     Accept-Ranges: bytes
-   VCL_call       DELIVER
-   VCL_return     deliver
-   Timestamp      Process: 1710255589.496678 0.095818 0.000011
-   Filters         range
-   RespHeader     Connection: close
-   RespHeader     Transfer-Encoding: chunked
-   Timestamp      Resp: 1710255589.742303 0.341444 0.245625
-   ReqAcct        378 0 378 410 795115 795525
-   End

*   << Session  >> 1
-   Begin          sess 0 HTTP/1
-   SessOpen       <ip> http <ip> 1710255589.400836 28
-   Link           req 2 rxreq
-   SessClose      REQ_CLOSE 0.342
-   End

Varnish Cache version

varnish:latest (varnish-7.4.2 revision cd1d10ab53a6f6115b2b4f3b2a1da94c1f749f80)

Operating system

No response

Source of binary packages used (if any)

https://hub.docker.com/_/varnish

nigoroll commented 8 months ago

This is known, and it is unclear how this scenario could be improved at all. One of the questions is what would be a good value to send as last-pos of the Content-Range header. Also, we would need to send * instead of complete-length while streaming, as we just do not know the overall length. That said, I would suspect that browsers would not show a seek widget either or, in other words, I think that the problem would not go away with a 206 response, but rather browsers require a known length to support seeking (forward). From personal experience, I know that at least ff supports backwards seeking for streaming responses of audio (200 with chunked encoding).

dridi commented 8 months ago

I think you broke user agents because you tried to only enforce a 206, your snippet is lacking a a content-range header.

varnishtest "Emulate 206 on unbound bytes range"

server s1 {
        rxreq
        txresp -hdr "Transfer-Encoding: chunked" -hdr {etag: "123"}
        chunked hello
        delay 0.1
        chunked world
        chunkedlen 0
} -start

varnish v1 -vcl+backend {
        import std;

        sub vcl_deliver {
                if (resp.is_streaming &&
                    std.tolower(req.http.range) == "bytes=0-" &&
                    req.http.if-none-match != resp.http.etag ) {
                        set resp.status = 206;
                        set resp.http.content-range = "bytes */*";
                }
        }
} -start

client c1 {
        txreq -hdr "range: bytes=0-" -hdr {if-none-match: "abc"}
        rxresp
        expect resp.status == 206
        expect resp.http.content-range == "bytes */*"

        txreq -hdr "range: bytes=0-" -hdr {if-none-match: "def"}
        rxresp
        expect resp.status == 206
        expect resp.http.content-range == "bytes 0-9/10"
} -run

If you really really require a 206, maybe try the VCL snippet from this test case.

dridi commented 8 months ago

For consistency with the non-streaming cases, we could consider replying with a 206 in this scenario.

nigoroll commented 8 months ago

For consistency with the non-streaming cases, we could consider replying with a 206 in this scenario.

Does it help? If yes, 👍🏽

dridi commented 8 months ago

@imwhocodes did you get a chance to try the VCL workaround I suggested?

sub vcl_deliver {
        if (resp.is_streaming &&
            std.tolower(req.http.range) == "bytes=0-" &&
            req.http.if-none-match != resp.http.etag ) {
                set resp.status = 206;
                set resp.http.content-range = "bytes */*";
        }
}
imwhocodes commented 8 months ago

@imwhocodes did you get a chance to try the VCL workaround I suggested?

sub vcl_deliver {
        if (resp.is_streaming &&
            std.tolower(req.http.range) == "bytes=0-" &&
            req.http.if-none-match != resp.http.etag ) {
                set resp.status = 206;
                set resp.http.content-range = "bytes */*";
        }
}

Yes I did right after you proposed it, really appreciated!

I was waiting to answer because it didn't solve the problem but I wanted to do more in-depth testing (that I didn't had time yet to do)

As far from what I'm understanding/tested set content-range = "bytes */*"; is not valid, particularly the incl-range part can not be * as per spec: https://www.rfc-editor.org/rfc/rfc9110#field.content-range

Chrome was not happy about, it actually like it more if I set something fake like set content-range = "bytes 0-1024/*"; even if I'm sending more data than "advertise one"

Thanks

nigoroll commented 8 months ago

@imwhocodes thank you for the additional feedback. We have discussed this ticket today during bugwash and the short summary is "we could do better, but are not sure if it's worth it".

For a streaming object, the range filter could wait for at least first-pos + x bytes and then send Content-Range: bytes .../* See also #3872 for prior work in the same direction