There are a few different things here.
ah are relatively expensive to allocate; on a linux box they're something over 4KB by default. So we don't want them to be treated as immortal and get all our resources used up, or have them leak. So the immediate thing about "excessive" ah allocation time is a sanity check to detect and protect against that. If in your case an ah being allocated for (ie, an http transaction living for) 1h is reasonable, you can set the ah timeout accordingly, and end of that problem; all of the fails on "excessive hold" in your table will just work then.
ah used to be guarded more jealously before h2, held in a pool and connections made to wait to acquire a free one, etc. With h2, one connection may demand many ah immediately and we either have to come up with them right away, or fail the stream. So with h2, no choice but to be more relaxed about it and allocate them on demand. They are still protected from leaking.
this is not the connection timeout, this is a separate sanity check about protecting ah over and above the wsi timeout. You should set it to be bigger than the longest time any http transaction should ever reasonably be allowed to live on your server under any conditions, 10 seconds, 10 hours, depends on your policy about it. Normally, you should never see it fire, but eg, slowloris type attacks will trigger it.
However, if the connection failed for some reason, the process waits for the 1 hour timeout to end before I am notified with LWS_CALLBACK_CLOSED_CLIENT_HTTP
wsi also have their own individual timeout. If you are serving a file, once it gets past the other steps to set up the transaction, lws should reset the timeout to what you set in info->timeout_secs
every time it sends a chunk to the client, ie, it lets it live another 15s by default each time it sends something. If the client stops accepting things, it should time it out after info->timeout_secs
for that reason, nothing to do with ah monitoring.
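Roughly, both knobs are set at context creation time; something like this (this is only an illustrative sketch, the function name and the numbers are placeholders, pick values to suit your policy):

    #include <string.h>
    #include <libwebsockets.h>

    static struct lws_context *
    create_download_context(void)
    {
        struct lws_context_creation_info info;

        memset(&info, 0, sizeof info);
        info.port = CONTEXT_PORT_NO_LISTEN;   /* client-only context */
        /* info.protocols = protocols; ... as usual */

        /* sanity limit on how long any single http transaction may hold
         * an ah; set it beyond your longest reasonable transfer */
        info.timeout_secs_ah_idle = 3 * 3600;

        /* per-wsi timeout, pushed out each time the connection makes
         * progress (15s is the default mentioned above) */
        info.timeout_secs = 15;

        return lws_create_context(&info);
    }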
Hi @lws-team,
@p-ranav is a colleague of mine who is working on this issue from our side. But I am quite confused about this answer here.
Isn't the allocated_headers (https://github.com/warmcat/libwebsockets/blob/v4.2-stable/lib/roles/http/private-lib-roles-http.h#L104 ) a library "concept"? As a user of the library, why should the user have to know about this depending on the network speed?
From what I can see, the reason that it becomes a "total lifetime" is that ah->assigned is set during the first connect and from that point on compared with the current time.
https://github.com/warmcat/libwebsockets/blob/v4.2-stable/lib/roles/http/header.c#L592
    ah = pt->http.ah_list;
    while (ah) {
        int len;
        char buf[256];
        const unsigned char *c;

        if (!ah->in_use || !ah->wsi || !ah->assigned ||
            (ah->wsi->a.vhost &&
             (now - ah->assigned) <
              ah->wsi->a.vhost->timeout_secs_ah_idle + 360)) {
            ah = ah->next;
            continue;
        }
Finally ending up here https://github.com/warmcat/libwebsockets/blob/v4.2-stable/lib/roles/http/header.c#L619
lwsl_notice("%s: ah excessive hold: wsi %p\n"
" peer address: %s\n"
" ah pos %lu\n", __func__, lws_wsi_tag(wsi),
buf, (unsigned long)ah->pos);
When we have successfully received a chunk of data from the server, would you consider it bad practice or error-prone to reset the ah->assigned to the current time?
From my perspective it would seem like the correct thing to do, since we know at that point that the connection is still valid and we do not want it to be interrupted. In fact, what is the rationale for libwebsockets not doing so?
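For concreteness, what we have in mind is roughly the following, somewhere in the http rx path where body bytes have just been consumed for the wsi (only a rough sketch, not a tested patch; the field names are taken from the v4.2-stable sources linked above):

    /* sketch: refresh the hold timer whenever body data arrives for
     * this wsi, so the "excessive hold" check measures idle time
     * rather than total transaction lifetime */
    if (wsi->http.ah)
        wsi->http.ah->assigned = (time_t)lws_now_secs();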
Looking forward to hearing why I am wrong on this :)
Best regards, Rikard
As a user of the library, why should the user have to know about this depending on the network speed?
... because stuff times out? Is that really such a novel concept?
When we have successfully received a chunk of data from the server, would you consider it bad practice or error-prone to reset the ah->assigned to the current time? Looking forward to hearing why I am wrong on this :)
It's something you would think about if you did not read what I wrote in my reply properly, especially
If in your case an ah being allocated for (ie, an http transaction living for) 1h is reasonable, you can set the ah timeout accordingly

wsi also have their own individual timeout.
Also you are using a version of lws that is several years old. If you want support please use a recent version.
@lws-team
For the sake of argument I've updated the links in my previous comment to the latest stable version of libwebsockets (even though the relevant parts of the code are mostly identical). And yes, we will also update to the latest version; as you say, it is way past its bedtime ^^
.. because stuff times out? Is that really such a novel concept?
Yes, things time out; timeouts themselves are definitely not a novel concept. But I think we are misinterpreting each other here. We can probably at least agree that it is not clear why this timeout exists and how users (such as myself) are supposed to use it. It definitely is not clear to me.
Let's take a hypothetical example of an infinite data stream that you receive over https, for example the video stream of a camera: something that by definition has no timeout, except maybe infinity. Could libwebsockets handle that using the https protocol today?
I think that, sadly, due to an implementation detail the answer is no. So why is a total lifetime timeout of the allocated_headers enforced in all cases?
The opposite is of course also true: there are definitely situations where you would want such a timeout, but certainly not always.
It's something you would think about if you did not read what I wrote in my reply properly, especially
If in your case an ah being allocated for (ie, an http transaction living for) 1h is reasonable, you can set the ah timeout accordingly

wsi also have their own individual timeout.
I assume that what you are saying is that we should set both these timeouts?
Best regards, Rikard
In the case of ah timeout, it is not a fixed number but user-settable.
How long is reasonable for an http connection to hold an ah depends on your use-case. If it's different from the default, set it to what suits your use-case.
OP is saying "I set it to 25m, and when my connections took longer than 25m, trouble!" Yes... so don't set it to 25m, which is inside what you evidently consider a reasonable transaction time; figure out what is a reasonable limit for your use case and set it a bit beyond that, so it only closes connections that you also think are unreasonably long-lived. You are running a server, so you must defend it against immortal connections and resource exhaustion.
It is not so complicated that it needs a tag team and the answer repeating.
I assume that what you are saying is that we should set both these timeouts?
What I said is in response to
However, if the connection failed for some reason, the process waits for the 1 hour timeout to end before I am notified with LWS_CALLBACK_CLOSED_CLIENT_HTTP
There is another timeout dedicated to the wsi. If it is files you are serving, as I explained, the approach is to keep setting the timeout a few seconds (you can also control this duration) ahead each time we send something. So stalled connections should not act as described; they should give up after 15s of not being able to send any more, by default, on the main / v4.2 branch. It's broken on current stuff? Show me how to reproduce it and I will study it.
For a server I would agree.
In this use case we are the client, not the server; we are not hosting the files, we are retrieving them. Therefore it seems very odd that we would ever want our download aborted because it is taking "too long".
Well, I wrongly assumed it was serving.
But it's still the case that you don't want to tie up heap beyond what is reasonable for what it is trying to do; for a client, lws is used on resource-constrained targets like ESP32, where the tls tunnel costs 30%-40% of available heap.
Advice is the same, set it to a number in line with what is reasonable for your use-case, and update lws to see about the other wsi timeout problem.
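If you also want to detect a stalled download quickly from your own code while keeping the ah limit large, the public lws_set_timeout() api can be used to push the wsi timeout out each time data arrives; a rough sketch inside your protocol callback (the 30s figure is only an example, not taken from your code):

    /* sketch: each time body data arrives, push the wsi timeout out
     * again, so a stalled connection is dropped after ~30s of no
     * progress instead of living until the ah sanity limit fires */
    case LWS_CALLBACK_RECEIVE_CLIENT_HTTP_READ:
        lws_set_timeout(wsi, PENDING_TIMEOUT_HTTP_CONTENT, 30);
        /* ... consume the chunk in `in` / `len` as usual ... */
        break;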
That would depend on the use-case, even for an ESP32.
The camera stream is again the perfect counter-example for this: it would cost more to allocate/de-allocate on every timeout than to simply allocate once at startup and keep going.
Now that I understand your point of view better, I could form a patch that adds a hash-define for enabling the behavior I've been referring to, while keeping the library's current behavior as its default.
For example, something that starts like this (I would also remove variables and members that become unused under this hash-define):
    if (!ah->in_use || !ah->wsi || !ah->assigned ||
    #if !defined(LWS_WITHOUT_TIMEOUT_AH_SECS)
        (ah->wsi->a.vhost &&
         (now - ah->assigned) <
          ah->wsi->a.vhost->timeout_secs_ah_idle + 360)
    #else
        1 /* with the hash-define, never flag an assigned ah as held too long */
    #endif
        ) {
        ah = ah->next;
        continue;
    }
Whether or not you want that feature in your library is up to you.
Best regards, Rikard
Hello,
Thank you for this library. I am currently working on an HTTPS file downloader using libwebsockets and have hit a bit of a hurdle.
I have set up my lws_protocols and the connection is https. To cope with slow networks I set lws_context_creation_info.timeout_secs_ah_idle to a large value such as 1 hour, otherwise long downloads get closed with the notice

    ah excessive hold: wsi 0xa1adf0

However, if the connection failed for some reason, the process waits for that 1 hour timeout to end before lws_callback_reasons = LWS_CALLBACK_CLOSED_CLIENT_HTTP is handled.
Here is a summary table of my observations
So, I need some guidance here. I am okay with setting this timeout to some large value (to correctly download even in very slow networks). That said, I would also like to be able to detect connection problems early without having to wait for the timeout to expire. Can you guide me regarding how I can achieve this with the library?
Thank you.