psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

Performance: Response.content is unnecessarily slow #5503

Open bmerry opened 4 years ago

bmerry commented 4 years ago

The core of Response.content looks like this (where CONTENT_CHUNK_SIZE is 10KB):

self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''

That is suboptimal for several reasons:

  1. All the data has to be read into temporary bytes objects, then copied into the joined buffer.
  2. It's also memory-inefficient: the CPython implementation of bytes.join first converts the generator to a sequence, so if the content is 1GB, you will temporarily have 2GB of memory in use (see the sketch after this list).
  3. 10KB isn't really big enough to amortise all the overheads (increasing it significantly improves performance).
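
To make point 2 concrete, here's a minimal sketch (with arbitrary sizes) of the transient double allocation: bytes.join materialises the generator into a sequence before concatenating, so all the chunks and the result are alive at once.

import tracemalloc

def chunks(total: int, size: int):
    for _ in range(total // size):
        yield b'x' * size

tracemalloc.start()
data = b''.join(chunks(100 * 2**20, 10 * 2**10))  # 100MB joined from 10KB pieces
_, peak = tracemalloc.get_traced_memory()
print('result: {:.0f} MB, peak: {:.0f} MB'.format(len(data) / 2**20, peak / 2**20))
# Expect the peak to be roughly twice the size of the result.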

It looks like this used to be done with self.raw.read, but it was changed to the current approach 8 years ago. I've tried a quick test to switch back to self.raw.read(decode_content=True), but it's failing some unit tests, presumably because of subtleties in handling Content-Encoding. If the maintainers agree that this is worth pursuing then I can work on dealing with the corner cases to make a PR.
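
As a rough sketch of the direction (not a finished patch, and the Content-Encoding corner cases are exactly what's failing), the equivalent of the proposed read looks like this:

import requests

def read_content_direct(resp: requests.Response) -> bytes:
    # Bypass iter_content/join and let urllib3 decode Content-Encoding itself.
    # Requires stream=True so the body hasn't already been consumed.
    return resp.raw.read(decode_content=True) or b''

# Usage:
#   with requests.get(url, stream=True) as resp:
#       body = read_content_direct(resp)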

Expected Result

I expect resp.content from a non-streamed request to have similar performance to resp.raw.read() from a streamed request.

Actual Result

I've benchmarked response.content at 590 MB/s and response.raw.read() (see sample code below) at 3180 MB/s, i.e. 5.4x faster. With 10-25 Gb/s networking becoming pretty standard in the data centre, this represents a significant bottleneck.

def load_requests_naive(url: str) -> bytes:
    with requests.get(url) as resp:
        return resp.content

def load_requests_stream(url: str) -> bytes:
    with requests.get(url, stream=True) as resp:
        return resp.raw.read()

Reproduction Steps

You'll need to run an HTTP server that can deliver a large file at high bandwidth (I happen to have Minio+Varnish on my local machine, but I'm sure other servers, e.g. Apache, could be used). Then run the script below as httpbench-requests.py all http://.... Note that Python 3.8 (or possibly 3.7) improved the performance of http.client.HTTPResponse.read, so on older Python versions the difference in performance is less dramatic, but still >2x on my machine.
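
If you don't have a fast object store handy, something like the following stdlib stand-in should work, though a pure-Python server will itself cap throughput well below the figures above, so expect the naive/stream gap to narrow.

import http.server
import os

FILE_NAME = 'payload.bin'      # hypothetical test payload; any large file works
FILE_SIZE = 1 * 2**30          # 1 GiB

if not os.path.exists(FILE_NAME):
    with open(FILE_NAME, 'wb') as f:
        f.truncate(FILE_SIZE)  # a sparse file of zeros is enough for a benchmark

# Serve the current directory on port 8000, then in another shell run:
#   httpbench-requests.py all http://localhost:8000/payload.bin
http.server.HTTPServer(('', 8000), http.server.SimpleHTTPRequestHandler).serve_forever()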

#!/usr/bin/env python

import argparse
import gc
import time
from typing import Callable

import requests
import numpy as np

_Method = Callable[[str], bytes]
METHODS = {}

def method(name: str) -> Callable[[_Method], _Method]:
    """Decorator registering a download implementation under the given name."""
    def decorate(func: _Method) -> _Method:
        METHODS[name] = func
        return func

    return decorate

@method('requests-naive')
def load_requests_naive(url: str) -> bytes:
    with requests.get(url) as resp:
        return resp.content

@method('requests-stream-read')
def load_requests_stream(url: str) -> bytes:
    with requests.get(url, stream=True) as resp:
        return resp.raw.read()

def measure_method(method: str, args: argparse.Namespace) -> None:
    rates = []
    for i in range(args.passes):
        gc.collect()  # keep garbage collection from skewing individual passes
        start = time.monotonic()
        data = METHODS[method](args.url)
        stop = time.monotonic()
        elapsed = stop - start
        rates.append(len(data) / elapsed)
        del data
    # Report the mean rate and the standard error of the mean over the passes.
    mean = np.mean(rates)
    std = np.std(rates) / np.sqrt(args.passes - 1)
    print('{}: {:.1f} ± {:.1f} MB/s'.format(method, mean / 1e6, std / 1e6))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--passes', type=int, default=5)
    parser.add_argument('method')
    parser.add_argument('url')
    args = parser.parse_args()
    if args.method not in METHODS and args.method != 'all':
        parser.error('Method must be "all" or one of {}'.format(set(METHODS.keys())))

    if args.method == 'all':
        for method in METHODS:
            measure_method(method, args)
    else:
        measure_method(args.method, args)

if __name__ == '__main__':
    main()

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.9"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.8.0"
  },
  "platform": {
    "release": "5.4.0-37-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.24.0"
  },
  "system_ssl": {
    "version": "1010100f"
  },
  "urllib3": {
    "version": "1.25.9"
  },
  "using_pyopenssl": false
}
GoddessLuBoYan commented 4 years ago

So, would using BytesIO be better?

bmerry commented 4 years ago

So, would using BytesIO be better?

Are you suggesting replacing b''.join with BytesIO for joining together all the 10KB pieces? It won't avoid having two copies of all the data around at once, because BytesIO.getvalue makes a copy. I haven't measured the performance, but I'd be surprised if it's any better.
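
For reference, the BytesIO variant under discussion would look something like this; note the copy in getvalue:

import io

def join_with_bytesio(chunks) -> bytes:
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    # getvalue() returns a copy of the accumulated buffer, so for a moment
    # both the buffer and the returned bytes are alive: still ~2x memory.
    return buf.getvalue()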

GoddessLuBoYan commented 4 years ago

yeah