plack / Plack

PSGI toolkit and server adapters
http://plackperl.org/

Experiment with a HTTP::Body replacement, or optimizing HTTP::Body #497

Open avar opened 9 years ago

avar commented 9 years ago

Just filing a quick braindump related to issue #496:

This is a percentile graph (can't show data on the X & Y axis, unfortunately) of a live system I maintain which takes URL-encoded POST requests of a couple of parameters, between 100 bytes and 50K or so in size.

The timing going down is when I switched from using Plack's body_parameters to my own URI::Escape::XS-powered replacement:

use URI::Escape::XS qw(uri_unescape);

my %post_body = map {
    my ($type, $escaped_value) = $_ =~ /^([^=]+)=(.*)$/s;
    # Note that there's a potential bug here, we don't
    # uri_unescape() the $type, but for the data we're using this
    # for the keys don't need escaping.
    +($type => uri_unescape($escaped_value));
} split /&/, $post_body;

Most of that time was spent in HTTP::Body::UrlEncoded; here's a flame graph of where Plack was spending its time.

Some things to look at:

miyagawa commented 9 years ago

> When Plack reads content it does so 8K at a time. I'm reading all of it at once. Is there a good reason for this internal buffering before it calls HTTP::Body ->add()? We could be more aggressive about buffering here and get rid of more object overhead.

Haven't looked at the details, but if psgix.input.buffered is true, there's no reason to do a chunked read - it's already buffered by the web server.

IIRC you are using uWSGI as a web server, and that server does not pre-read the content though?
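A minimal sketch of the distinction being discussed (not Plack's actual code; `read_body` is a made-up helper name, and the `$env` keys follow the PSGI convention): when `psgix.input.buffered` is true, a single read() of CONTENT_LENGTH bytes is enough; otherwise you loop in 8K chunks and tolerate short reads.

```perl
use strict;
use warnings;
use IO::Handle;    # read() method on filehandles

# Hypothetical helper, not Plack's implementation.
sub read_body {
    my ($env) = @_;
    my $fh   = $env->{'psgi.input'};
    my $len  = $env->{CONTENT_LENGTH} || 0;
    my $body = '';
    if ($env->{'psgix.input.buffered'}) {
        # Server already buffered the whole body: one read suffices.
        $fh->read($body, $len);
    }
    else {
        # Unbuffered: read 8K at a time, stopping on EOF.
        while ($len > 0) {
            my $chunk;
            my $n = $fh->read($chunk, $len < 8192 ? $len : 8192);
            last unless $n;    # 0/undef: EOF (EAGAIN retry logic omitted)
            $body .= $chunk;
            $len  -= $n;
        }
    }
    return $body;
}

# Simulated buffered request via an in-memory filehandle:
open my $fh, '<', \'foo=bar&x=1' or die $!;
my $env = {
    'psgi.input'           => $fh,
    'psgix.input.buffered' => 1,
    CONTENT_LENGTH         => 11,
};
my $body = read_body($env);
```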

doy commented 9 years ago

Pretty sure a properly timed signal could cause read to return 0 bytes even if there are more things to read.

miyagawa commented 9 years ago

Yeah, it's basically a guard against EAGAIN on non-buffered reads on some web servers.

avar commented 9 years ago

uWSGI has an option to pre-buffer everything. I'm not using that in that particular application though.

Regardless of what the webserver is doing it would be interesting to play with doing this on the Perl level, pseudocode:

my $content = $buffer->read_all;
my $body = HTTP::Body->new($content_type, length $content);
$body->add($content);

Instead of the current:

my $body = HTTP::Body->new($content_type, $content_length);
while (my $chunk = read_n($buffer, 8192)) {
    $body->add($chunk);
}

I.e. when we're parsing, say, 8MB, we're making a lot of calls to HTTP::Body, which needs to incrementally parse the body, as opposed to doing it all at once. Maybe deferring that work at the cost of some memory is better, I don't know, but it's something worth playing with.
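As a core-only illustration of the one-shot idea (`parse_urlencoded` is a hypothetical helper, not HTTP::Body's API), parsing a fully slurped buffer in a single pass avoids the per-chunk bookkeeping of an incremental parser:

```perl
use strict;
use warnings;

# Hypothetical one-pass parser over a fully buffered
# url-encoded body (no incremental state needed).
sub parse_urlencoded {
    my ($body) = @_;
    my %params;
    for my $pair (split /&/, $body) {
        my ($k, $v) = split /=/, $pair, 2;
        for ($k, $v) {
            next unless defined;
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # decode %XX escapes
        }
        $params{$k} = $v;
    }
    return \%params;
}

my $body   = 'type=note&text=hello%20world';
my $params = parse_urlencoded($body);
# $params->{text} is now "hello world"
```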

I forgot about that EAGAIN case, good point.

miyagawa commented 8 years ago

I'm fine with increasing the buffer size to something like 1MB, but what is your usual buffer size for the url-encoded HTTP POST body? 8K sounds fairly big for regular posts, but I might be wrong.

avar commented 8 years ago

Just a quick reply because I don't have time to look up all the relevant config variables.

There's a uWSGI-level config variable that slurps up the entire POST buffer regardless of size (you do size limiting via nginx in front of it).

So what happens is that if you have a 10MB POST you have a ~10MB SV in the perl interpreter.

What I was lamenting in a previous comment was that all this streaming HTTP parsing interface is accomplishing is effectively doing a misguided substr() loop through this 10MB buffer.

So in certain cases, however big the substr() buffer is, the whole operation is pointless: we're trying to "stream" something we already have entirely in memory. So for the purposes of Plack we should check whatever the psgix.* variable was for "we're doing eager slurping" and just construct an object with the entire SV we already have.

Aside from that even when we're streaming I wouldn't be surprised if this entire thing is a misguided premature optimization. If you're accepting X MB POST requests you're going to have X MB in some perl process anyway..., but maybe there are compelling use-cases for this feature.

miyagawa commented 8 years ago

Oh well, I'm sure it makes sense to increase the buffer (or to not chunk at all) for a few-MB POST that has already been buffered at the perl level. I was talking about the regular HTTP POST use-case, because you posted an image with a UrlEncoded body frame, and I imagine the body size there should be much smaller.

avar commented 8 years ago

Ah, those are in the 20-200K range usually. It's a bit of an odd use-case. I'm manually constructing the request body with Hijk and manually splitting it on the other end. It's just URI-encoded because that was a quick thing to do and relatively fast.

miyagawa commented 8 years ago

Do you have your diff in handy? I'd like to compare that with my patch to see if I've missed anything.

avar commented 8 years ago

I never patched Plack directly to hack how it does this stuff, I just used ->input to get the raw input buffer, then used the %post_body snippet above.
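An end-to-end, core-only approximation of that approach (the in-memory filehandle stands in for the PSGI input stream, and a core regex stands in for uri_unescape): slurp the raw body from `psgi.input`, then split it with the same `%post_body` pattern as the first comment.

```perl
use strict;
use warnings;
use IO::Handle;    # read() method on filehandles

# Simulated request body; in a real app $env comes from the PSGI server.
open my $input, '<', \'type=note&text=hello%20world' or die $!;
my $env = { 'psgi.input' => $input, CONTENT_LENGTH => 28 };

# Slurp the entire raw input buffer in one read.
my $post_body = '';
$env->{'psgi.input'}->read($post_body, $env->{CONTENT_LENGTH});

# Split and unescape, as in the snippet above (keys again left unescaped).
my %post_body = map {
    my ($type, $escaped_value) = $_ =~ /^([^=]+)=(.*)$/s;
    (my $value = $escaped_value) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    ($type => $value);
} split /&/, $post_body;
```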