warmcat / libwebsockets

canonical libwebsockets.org networking library
https://libwebsockets.org
Other
4.77k stars 1.49k forks source link

Need guidance regarding using HTTP client in slow networks to download files #2279

Closed p-ranav closed 3 years ago

p-ranav commented 3 years ago

Hello,

Thank you for this library. I am currently working on a HTTPS file downloader using libwebsockets and have hit a sort of hurdle.

Here is a summary table of my observations

Attempt Description Idle Timeout Connection Interrupted? Network Throttled?NetLimiterDownload Speed Download Successful? Notes
1 Download the file in gigabit network 5 mins     YES Best case scenario - Fast, reliable network
2 Start download and disable network 5 mins YES   NO Timeout correctly at 5 mins
4 Limit the network download speed to 900 KB/s 5 mins   YES - 900 KB/s YES Download took 2.391𝑚𝑖𝑛 (143.46 secs)
5 Limit the network download speed to 700KB/s 5 mins   YES - 700 KB/s YES Download took 3.757𝑚𝑖𝑛 (225.43 secs)
6 Limit the network download speed to 500 KB/s 5 mins   YES - 500 KB/s NO ah excessive hold: wsi 0x1884df0
7 Limit the network download speed to 150 KB/s 25 mins   YES - 150 KB/s YES Download took 14.876𝑚𝑖𝑛 (892.56 secs)
8 Limit the network download speed to 100 KB/s 25 mins   YES - 100 KB/s NO ah excessive hold: wsi 0xa1adf0Timeout at 25.994𝑚𝑖𝑛

So, I need some guidance here. I am okay with setting this timeout to some large value (to correctly download even in very slow networks). That said, I would also like to be able to detect connection problems early without having to wait for the timeout to expire. Can you guide me regarding how I can achieve this with the library?

Thank you.

lws-team commented 3 years ago

There are a few different things here.

However, if the connection failed for some reason, the process waits for the 1 hour timeout to end before I am notified with LWS_CALLBACK_CLOSED_CLIENT_HTTP

wsi also have their own individual timeout. If you are serving a file, once it gets past the other steps to set up the transaction, lws should reset the timeout to what you set in info->timeout_secs every time it sends a chunk to the client, ie, it lets it live another 15s by default each time it sends something. If the client stops accepting things, it should time it out after info->timeout_secs for that reason, nothing to do with ah monitoring.

soderstrom-rikard commented 3 years ago

Hi @lws-team,

@p-ranav is a collegue of mine that is working on this issue from our side. But I am quite confused about what this answer here.

Isn't the allocated_headers (https://github.com/warmcat/libwebsockets/blob/v4.2-stable/lib/roles/http/private-lib-roles-http.h#L104 ) a library "concept"? As a user of the library, why should the user have to know about this depending on the network speed?

From what I can see, the reason that it becomes a "total lifetime" is that the ah->assigned is assigned during first connect and from that point on compared with the current time. https://github.com/warmcat/libwebsockets/blob/v4.2-stable/lib/roles/http/header.c#L592

    ah = pt->http.ah_list;
    while (ah) {
        int len;
        char buf[256];
        const unsigned char *c;

        if (!ah->in_use || !ah->wsi || !ah->assigned ||
            (ah->wsi->a.vhost &&
             (now - ah->assigned) <
             ah->wsi->a.vhost->timeout_secs_ah_idle + 360)) {
            ah = ah->next;
            continue;
        }

Finally ending up here https://github.com/warmcat/libwebsockets/blob/v4.2-stable/lib/roles/http/header.c#L619

        lwsl_notice("%s: ah excessive hold: wsi %p\n"
                "  peer address: %s\n"
                "  ah pos %lu\n", __func__, lws_wsi_tag(wsi),
                buf, (unsigned long)ah->pos);

When we have sucessfully recieved a chunk of data from the server, would you consider it bad practice or error-phrone to reset the ah->assigned to the current time?

From my perspective it would seem like the correct thing to do, since we know at that point that the connection is still valid and we do not want it to be interrupted. In fact, what is the rationale for libwebsockets not to do so?

Looking forward to hearing why I am wrong on this :)

Best regards, Rikard

lws-team commented 3 years ago

As a user of the library, why should the user have to know about this depending on the network speed?

... because stuff times out? Is that really such a novel concept?

When we have sucessfully recieved a chunk of data from the server, would you consider it bad practice or error-phrone to reset the ah->assigned to the current time? Looking forward to hearing why I am wrong on this :)

It's something you would think about if you did not read what I wrote in my reply properly, especially

If in your case an ah being allocated for (ie, an http transaction living for) 1h is reasonable, you can set the ah timeout accordingly wsi also have their own individual timeout.

Also you are using a version of lws that is several years old. If you want support please use a recent version.

soderstrom-rikard commented 3 years ago

@lws-team

For the sake of argument I've updated the links to the latest stable version libwebsockets in my previous comment (even thou relevant parts of the code is mostly identical). And yes, we will also update to the latest version, as you say it is way passed its bedtime ^^-

.. because stuff times out? Is that really such a novel concept?

Yes things timeout, timeouts itself is definitely not a novel concept. But I think we are misinterpreting each other here. We can probably at least agree that this is not clear as to why it exists and how users (such as myself) are supposed to use it. It definitely is not clear to me.

Lets make a hypothetical example of an infinite data stream that you receive over https. For example the video stream of a camera. Something that by definition have no timeout, except maybe infinity. Could libwebsockets handle that using the https protocol today?

I think sadly due to an implementation detail, the answer is no. So why is a total lifetime timeout of the allocated_headers enforced in all cases?

The opposite is of course also true, there are definitely situations where you would want such a timeout, but certainly not always.

It's something you would think about if you did not read what I wrote in my reply properly, especially

If in your case an ah being allocated for (ie, an http transaction living for) 1h is reasonable, you can set the ah timeout accordingly wsi also have their own individual timeout.

I assume that what you are saying is that we should set both these timeouts?

Best regards, Rikard

lws-team commented 3 years ago

In the case of ah timeout, it is not a fixed number but user-settable.

How long is reasonable for an http connection to hold an ah depends on your use-case. If it's different from the default, set it to what suits your use-case.

OP is saying "I set it to 25m, when my connections took longer than 25m, trouble!" Yes... so don't set it to 25m which is inside what you evidently consider a reasonable transaction time, figure out what is a reasonable limit to your use case and set it a bit beyond that, so it only closes connections that you also think are unreasonably long-lived. You are running a server, so you must defend it against immortal connections and resource exhaustion.

It is not that complicated it needs a tag team and the answer repeating.

I assume that what you are saying is that we should set both these timeouts?

What I said is in response to

However, if the connection failed for some reason, the process waits for the 1 hour timeout to end before I am notified with

There is another timeout dedicated to the wsi. If it is files you are serving, as I explained the approach is to keep setting the timeout a few seconds (you can also control this duration) ahead each time we send something. So stalled connections should not act as described, they should give up after 15s of not being able to send any more by default on main / v4.2 branch. It's broken on current stuff? Show me how to reproduce it and I will study it.

soderstrom-rikard commented 3 years ago

For a server I would agree.

In this use case we are the client not the server, we are not hosting the files we are retrieving the files. Therefor it seems very odd that we would ever want our download aborted due to it taking "too long".

lws-team commented 3 years ago

Well, I wrongly assumed it was serving.

But it's still the case you don't want to tie up heap beyond what is reasonable for what it is trying to do, for client, lws is used on resource-constrained targets like ESP32 where the tls tunnel costs 30%-40% of available heap.

Advice is the same, set it to a number in line with what is reasonable for your use-case, and update lws to see about the other wsi timeout problem.

soderstrom-rikard commented 3 years ago

That would depend on the use-case even for an ESP32.

The camera stream is again the perfect counter-example for this. It would cost more to allocate/de-allocate every X timeout than simply allocating once at startup and then keep going.

Now that I understand your point of view better, I could form a patch that adds a hash-define for enabling the behavior I've been referring to, while keeping the library's current behavior as its default.

For example something that starts like this (I would also remove variables and members that would become unused from this hash-define).


         if (!ah->in_use || !ah->wsi || !ah->assigned ||
            (ah->wsi->a.vhost &&
#if !defined(LWS_WITHOUT_TIMEOUT_AH_SECS)
             (now - ah->assigned) <
             ah->wsi->a.vhost->timeout_secs_ah_idle + 360)
#endif
             ) {
            ah = ah->next;
            continue;
        }

Whether or not you want that feature in your library is up to you.

Best regards, Rikard