Poco Multipart parsing is 10x slower than its Boost/beat or restinio equivalent

datinje commented 1 year ago

This is not an issue (well .. up to you - since there is a second way to implement with poco as describe below) , but Is there a reason for Poco checking each element of a REST multi part stream , consequence is 10x performance drop vs Boost/Beast or restinio ?

details : We have found one bottleneck in the built-in Poco::Net::MultipartReader : it keeps calling a method that basically does a ‘read_new_line’ for consuming all bytes in the payload of every parts. This is slow because every character is compared to ‘\r’ ‘\n’ and ‘\0’. For a 500bytes Jason part + 13Mbytes binary image part , performance drop is 10x versus Beast.

Actual code from poco:

std::streambuf& buf = _istr.rdbuf(); int n = 0; int ch = buf.sgetc(); while (ch != eof && ch != ‘\r’ && ch != ‘\n’ && n < length) { buffer++ = (char) buf.sbumpc(); ++n; ch = buf.sgetc(); } If (ch == eof) _lastPart = true; return n;

As a note , in our test, we always provide the content-length of the part in headers after the boundary. This allows a more efficient algorithm, see the activity diagram attached. Besides clients adhere more to the standard if doing so.

Fixing this is very easy, thanks to the std::stream API that poco provides, and our test resulted in similar speed for poco and beast.

After this change perf in our test is on par with Boost/Beast implementation .

So can you explain

datinje commented 1 year ago

sorry finishing : so can you explain the reason of this element by element check ?

obiltschnig commented 1 year ago

At least in emails nobody seems to send a Content-Length in the part headers, so we have to read line-by-line in order to detect the boundary. However, it makes sense to add an optimization for the case a Content-Length is provided.

datinje commented 1 year ago

So I understand the reason of this check was to for the code to be more general. And Your improvement is going to improve perf by 10x for this specific case. Thanks a lot ! A big thank to my colleague Thibault Pierre who found that out in his tests and to @obiltschnig for the quick reaction !

aleks-f commented 1 year ago

@datinje do you intend to send this as PR? 1.13.0 is scheduled for Monday

matejk commented 2 weeks ago

@datinje, can you create a PR for this improvement, please?

Poco 1.14 is going to be released soon and the PR could get included there.

pocoproject / poco

Poco Multipart parsing is 10x slower than its Boost/beat or restinio equivalent #4118