Mocking large file downloads

gitpushdashf commented 3 years ago

Is there a way to mock a response that's a large file download, such as a 100MB tarball?

I guess in theory the file could be loaded into memory, but it might be better to give a path that's read on the fly for the request.

jamielennox commented 3 years ago

The body= response parameter takes an io.IOBase, so you could just create a file handle and pass it back through body. I've never actually tried this.

My bigger question is why you would want to. Typically the library is used for unit testing, and you could easily prove the point without loading a 100MB file through the application.

Personally, if the only file that makes sense is this 100MB one (a file system image or whatever), I'd split the testing so the download is tested separately from the extraction.
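A sketch of testing the extraction step on its own, using only the standard library: a few-byte .tar.gz is built in memory, so the 100MB file never enters the unit test. extract_archive is a hypothetical stand-in for the application's own extraction code.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_archive(fileobj, dest):
    """Hypothetical extraction step, kept separate from the download step."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        # Use the safer "data" filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(p.name for p in Path(dest).iterdir())


def make_tiny_tarball():
    # Build a tiny .tar.gz in memory instead of using the 100MB original.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


with tempfile.TemporaryDirectory() as dest:
    names = extract_archive(make_tiny_tarball(), dest)
    text = (Path(dest) / "hello.txt").read_text()
```

The download side can then be covered with a mock that returns the same tiny archive, without either test depending on the real file.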

gitpushdashf commented 3 years ago

Interesting, thank you. In this case it's a file that's needed at runtime and we need to do some tests with it, but we also need 100% test coverage, so if we can cache the download it'll help quite a bit.

Will have to try something like:

import requests_mock

with open("somebigfile.tar.gz", "rb") as fp:
    with requests_mock.Mocker() as m:
        m.get('http://test.com/somebigfile.tar.gz', body=fp)
        # The code under test must make its request here, while fp is still open.

gitpushdashf commented 3 years ago

If I do m.get('http://test.com/somebigfile.tar.gz', body=fp), I get a zero-length file at the other end.

If I do m.get('http://test.com/somebigfile.tar.gz', content=fp.read()), it seems to work.

It seems there may be a bug with body.

jamielennox commented 3 years ago

Ah, that would make sense. It's kind of a bug, but I'm not sure what the fix is.

When you do content=fp.read(), requests knows the Content-Length to set in the header. When you just say body=fp, it doesn't know how long fp is, so it can't put that in the header.

I don't know if there's a reliable way to default the length for the body parameter, as I think most people would use it for streaming purposes. You could do something like a stat and add the length to the header yourself, but I don't think there's an automatic fix here.

gitpushdashf commented 3 years ago

Interesting. You don't have to have a Content-Length header, though, at least as far as HTTP goes. I don't know if mocking requests changes something there.

tbryan314 commented 2 years ago

@jamielennox: "so you could just create a file handle and pass it back through body. I've never actually tried this."

FYI - I just tried it with requests==2.28.1 and requests-mock==1.9.3, and I don't think that passing an open file handle as body works.

I haven't tried creating a minimal example yet, but here's my test fixture.

@pytest.fixture
def requests_mock_with_labs(requests_mock):
    # These work fine.
    requests_mock.get(TEST_URL + "/api/v0/system_information", json={"version": CURRENT_VERSION, "ready": True})
    requests_mock.get(TEST_URL + "/api/v0/authok", text=None)
    requests_mock.post(TEST_URL + "/api/v0/authenticate", json="BOGUS_TOKEN")
    # This one causes the error below when the unit-under-test makes the corresponding GET request.
    requests_mock.get(TEST_URL + "/api/v0/labs", body = open("tests/test_data/labs.json", "r"))

My other mock requests work fine, but the GET request to /api/v0/labs is raising this error:

TypeError: sequence item 0: expected a bytes-like object, str found

The stack trace is

  File "MY_VENV/lib/python3.9/site-packages/requests/models.py", line 836, in iter_content
    raise StreamConsumedError()
  File "MY_VENV/lib/python3.9/site-packages/requests/models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
  File "MY_VENV/lib/python3.9/site-packages/requests/sessions.py", line 745, in send
    r.content
  File "MY_VENV/lib/python3.9/site-packages/requests_mock/mocker.py", line 144, in _fake_send
    return _original_send(session, request, **kwargs)
  File "MY_VENV/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "MY_VENV/lib/python3.9/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
... Everything else is my test code or the pytest runner code.

Here's the debug output from pytest when my unit-under-test attempts to make that GET request:

lib/python3.9/site-packages/requests/sessions.py:600: in get
    return self.request("GET", url, **kwargs)
        kwargs     = {'allow_redirects': True}
        self       = <requests.sessions.Session object at 0x102ee40a0>
        url        = 'https://0.0.0.0/api/v0/labs'
lib/python3.9/site-packages/requests/sessions.py:587: in request
    resp = self.send(prep, **send_kwargs)
        allow_redirects = True
        auth       = None
        cert       = None
        cookies    = None
        data       = None
        files      = None
        headers    = None
        hooks      = None
        json       = None
        method     = 'GET'
        params     = None
        prep       = <PreparedRequest [GET]>
        proxies    = {}
        req        = <Request [GET]>
        self       = <requests.sessions.Session object at 0x102ee40a0>
        send_kwargs = {'allow_redirects': True, 'cert': None, 'proxies': OrderedDict(), 'stream': False, ...}
        settings   = {'cert': None, 'proxies': OrderedDict(), 'stream': False, 'verify': True}
        stream     = None
        timeout    = None
        url        = 'https://0.0.0.0/api/v0/labs'
        verify     = None
lib/python3.9/site-packages/requests_mock/mocker.py:144: in _fake_send
    return _original_send(session, request, **kwargs)
        __pydevd_ret_val_dict = {'_set_method': None}
        _fake_get_adapter = <function MockerCore.start.<locals>._fake_get_adapter at 0x10508d940>
        kwargs     = {'allow_redirects': True, 'cert': None, 'proxies': OrderedDict(), 'stream': False, ...}
        request    = <PreparedRequest [GET]>
        self       = <requests_mock.mocker.Mocker object at 0x1050bc220>
        session    = <requests.sessions.Session object at 0x105087610>
lib/python3.9/site-packages/requests/sessions.py:745: in send
    r.content
        __pydevd_ret_val_dict = {'<listcomp>': [], 'Adapter.send': <Response [200]>, 'dispatch_hook': <Response [200]>, 'extract_cookies_to_jar': None}
        adapter    = <requests_mock.adapter.Adapter object at 0x1050bc3a0>
        allow_redirects = True
        elapsed    = 117.01323080062866
        gen        = <generator object SessionRedirectMixin.resolve_redirects at 0x10509b6d0>
        history    = []
        hooks      = {'response': [<bound method TokenAuth.handle_401_unauthorized of <TokenAuth object at 0x105087760>>]}
        kwargs     = {'cert': None, 'proxies': OrderedDict(), 'stream': False, 'timeout': None, ...}
        r          = <Response [200]>
        request    = <PreparedRequest [GET]>
        self       = <requests.sessions.Session object at 0x105087610>
        start      = 1661804972.855191
        stream     = False
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Response [200]>

    @property
    def content(self):
        """Content of the response, in bytes."""

        if self._content is False:
            # Read the contents.
            if self._content_consumed:
                raise RuntimeError("The content for this response was already consumed")

            if self.status_code == 0 or self.raw is None:
                self._content = None
            else:
>               self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
E               TypeError: sequence item 0: expected a bytes-like object, str found

self       = <Response [200]>

lib/python3.9/site-packages/requests/models.py:899: TypeError

I've confirmed that requests_mock.Adapter.send(self, request, **kwargs) matches the expected mock call, so I have the right adapter in requests.sessions.Session.send(self, request, **kwargs) before we get to this r.content property accessor call on line 745 of requests/sessions.py.

        if not stream:
            r.content

I've never looked at the internals of requests before, but I guess requests_mock can simply set body to a file-like object.
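For what it's worth, the TypeError above is exactly what Python raises when b"".join() receives str items, and the fixture opens labs.json in text mode ("r"), which yields str rather than bytes. A stdlib-only illustration follows, with a hypothetical temporary file in place of labs.json; this thread doesn't confirm whether switching the fixture to "rb" is enough on its own.

```python
import os
import tempfile

# Hypothetical stand-in for tests/test_data/labs.json.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"labs": []}')
    path = tmp.name

# Text mode: read() returns str, so joining onto b"" fails the same way
# requests' Response.content does in the traceback above.
with open(path, "r") as fp:
    try:
        b"".join([fp.read()])
        text_mode_error = None
    except TypeError as exc:
        text_mode_error = str(exc)

# Binary mode: read() returns bytes, which b"".join() accepts.
with open(path, "rb") as fp:
    joined = b"".join([fp.read()])

os.remove(path)
```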

tbryan314 commented 2 years ago

And instead of trying to use body, I just used your "fileLoader" suggestion here: https://stackoverflow.com/a/65299837/5374843. That worked perfectly well.

jamielennox commented 2 years ago

Great. fileLoader isn't special at all; it just reads the whole file into memory. As above, unless you're streaming, requests will always download the entire response into memory in your application, so you'll basically end up doing the same thing.

If you wrap it in a callback, it will only be loaded on demand. This is useful if you've got mocks that are never called, since no work is wasted on them, but most people will end up executing these types of mocks.

tbryan314 commented 2 years ago

Yes, I did it as a callback, though not quite like the StackOverflow example. It's actually possible for me to use one callback for a bunch of different API paths, since the callback can pick the right file based on the information in the request. For example:

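from pathlib import Path

TEST_DATA_DIR = Path("tests/test_data")  # assumed location, matching the fixture above


def resp_body_from_file(req, context):
    # requests-mock callback: returns the response text for a matched request
    api_parts = req.path.split("/")
    filename = "not initialized"
    # Determine the file based on the request details
    if len(api_parts) == 1:
        filename = f"{api_parts[0]}.json"
    else:
        # ... more API-specific logic to determine the file
        ...
    file_path = Path(TEST_DATA_DIR, filename)
    return file_path.read_text()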

I can use that resp_body_from_file callback in my fixture, and multiple tests can use that same fixture. Since I'm testing a library that uses APIs under the hood, that makes more sense than having each test register the files it wants to use for the responses. As you said, the only files actually read during a particular test are the ones needed by the calls that test makes.

For anyone using this approach, note that it doesn't use the body parameter. Instead, set text (or one of the other content parameters) to the callback when registering the mock request.

It still seems like the requests-mock docs either need a working example of the body parameter, or should maybe drop it from the docs for now. As stated (just pass a file-like object to body), it definitely doesn't work.

jamielennox / requests-mock

Mocking large file downloads #152