howerj / httpc

HTTP client for embedded use - supports redirects and resume.
The Unlicense
15 stars 5 forks source link

HTTP 1.0 range support? #18

Open MikolajMarkiel opened 4 months ago

MikolajMarkiel commented 4 months ago

The code provides a Range header only to HTTP 1.0 and not for HTTP 1.1. Is that for sure correct? Shouldn't be that reversed?

    if (op == HTTPC_GET && h->os->flags & HTTPC_OPT_HTTP_1_0 && h->position && h->accept_ranges) {
        char range[64 + 1] = { 0, };
        if (httpc_buffer_add_string(h, b0, "Range: bytes=") < 0)
            goto fail;
        httpc_num_to_str(range, h->position, 10);
        if (httpc_buffer_add_string(h, b0, range) < 0)
            goto fail;
        if (httpc_buffer_add_string(h, b0, "-\r\n") < 0)
            goto fail;
    }
        case FLD_ACCEPT_RANGES:
            if (httpc_case_insensitive_search(line, "bytes")) {
                h->accept_ranges = !!(h->os->flags & HTTPC_OPT_HTTP_1_0);
                return info(h, "Accept-Ranges: bytes");
            }

Also using modified at run time Range header during retrying previous transfer after network failure cause obtaining reduced Content-Length and because of that GET request ends somewhere in middle of file. The first assign of h->length should persist from the first Content-Length response header, and shouldn't be modified during retrying. Do you agree?

howerj commented 4 months ago

I've fixed the flag, I'm not sure what you have written at the end though.

MikolajMarkiel commented 4 months ago

There are 2 flags related to HTTPC_OPT_HTTP_1_0, so one more still needs to be changed

case FLD_ACCEPT_RANGES:
    if (httpc_case_insensitive_search(line, "bytes")) {
        h->accept_ranges = !!(h->os->flags & HTTPC_OPT_HTTP_1_0);  // it should be negated once
        return info(h, "Accept-Ranges: bytes");
    }

With that change if error occurs during HTTP 1.1 GET request, httpc.c will try to retry transfer from last successful position. It will add Range header in subsequent request. Because of range header, the server will respond with lower value of Content-Length That lowered value will be assigned to h->length variable. But it should persist with previous value, because it points to last byte and closing transfer function depends on that variable. So if for example we tried to download 500kB at the end we will end with ~300kB file or something.

I reproduced the issue. Below are logs from debug session that visualize that problem:

info:1143 Program: Embeddable HTTP 1.0/1.1 client
info:1144 Version: 0.0.0
info:1145 Repo:    https://github.com/howerj/httpc
info:1146 Author:  Richard James Howe
info:1147 Email:   howe.r.j.89@gmail.com
info:1148 Options: stk=1024 tst=1 grw=1 log=1 cons=10 redirs=3 hmax=8192 sz=2176
info:1152 License: The Unlicense (public domain)
info:534 domain:    ...
info:535 port:      443
info:536 SSL:       true
...
debug:1363 state -- initial   -> open     
debug:1363 state -- open      -> send-head
debug:674 custom header 'User-Agent: http client' added
info:682 GET  request complete
debug:1363 state -- send-head -> send-body
info:1084 no callback - nothing to do
debug:1363 state -- send-body -> recv-head
info:952 HEADER: HTTP/1.1 200 OK/15
info:788 Content Length: 527360
info:754 Accept-Ranges: bytes
info:777 connection close mandatory
info:968 header done
debug:1363 state -- recv-head -> recv-body
App: Downloaded bytes: 1024
App: Downloaded bytes: 2048
App: Downloaded bytes: 3072
App: Downloaded bytes: 4096
...
App: Downloaded bytes: 210337
App: Downloaded bytes: 211361
App: Downloaded bytes: 212385
---------------------------------------------------------- network error  ---------------------------------------------------------- 
error:261 network read error -1
error:1002 read error
debug:1363 state -- recv-body -> back-off 
debug:1363 state -- back-off  -> sleeps   
info:703 backing off for 500 ms, retried 0
debug:1363 state -- sleeps    -> open     
debug:1363 state -- open      -> back-off 
debug:1363 state -- back-off  -> sleeps   
info:703 backing off for 1000 ms, retried 1
debug:1363 state -- sleeps    -> open     
debug:1363 state -- open      -> back-off 
debug:1363 state -- back-off  -> sleeps   
info:703 backing off for 2000 ms, retried 2
debug:1363 state -- sleeps    -> open     
debug:1363 state -- open      -> send-head
debug:674 custom header 'User-Agent: http client' added
info:682 GET  request complete
debug:1363 state -- send-head -> send-body
info:1084 no callback - nothing to do
debug:1363 state -- send-body -> recv-head
info:952 HEADER: HTTP/1.1 206 Partial Content/28
info:788 Content Length: 314975                                         <-- new Content Length
info:810 unknown field: Content-Range: bytes 212385-527359/527360       <-- successful applied Range
info:754 Accept-Ranges: bytes
info:777 connection close mandatory
info:968 header done
debug:1363 state -- recv-head -> recv-body
App: Downloaded bytes: 213409
App: Downloaded bytes: 214433
App: Downloaded bytes: 215457
...
App: Downloaded bytes: 313088
App: Downloaded bytes: 314112
App: Downloaded bytes: 315136             <-- last packet should be at 527360 byte. It ended there because of new Content Length value
debug:1363 state -- recv-body -> done

I tried to do something like that and it worked fine for me. But I don't know if it could be useful also for other cases.

case FLD_CONTENT_LENGTH:
    if (h->length_set == 0){
        if (httpc_scan_number(&line[fld->length], &h->length, 10) < 0)
            return error(h, "invalid content length: %s", line);
        h->length_set = 1;
        return info(h, "Content Length: %lu", (unsigned long)h->length);
    }
    return info(h, "Content Length remains from previous request: %lu", (unsigned long)h->length);

and also needed to remove these (they probably could be in initial step before open):

static int httpc_parse_response_header(httpc_t *h, httpc_buffer_t *b0) {
    h->v1 = 0;
    h->v2 = 0;
    os->response = 0;
-   h->length = 0;         <-- remove
    h->identity = 1;
-   h->length_set = 0;   <-- remove
    h->keep_alive = !(os->flags & HTTPC_OPT_HTTP_1_0);
    h->accept_ranges = !(os->flags & HTTPC_OPT_HTTP_1_0);
    b0->used = 0;

and lastly... if server respond with Transfer-encoding: identity then h->position will be reset. The solution above doesn't handle that, so there could be some issues. I would suggest to reset the position at fault only with server that does not accept ranges or with HTTP 1.0. Then we are downloading from 0 byte but already downloaded packets are discarded (as you provided in response body parser) and position is updated like it should to. Isn't that correct?

howerj commented 4 months ago

Right you are about the second flag, I've pushed a fix for that. Thanks for writing up a detailed report, I think you are right, but it will take more time for me to analyze this which I do not have at the moment. I'll get around to looking at this properly sometime, I'd prefer not to make changes that could potentially break anything else.

I'd keep a local/personal of the other changes you've made. I'm not planning on making massive changes to this library any time soon so any personal branch would not go stale.