kevin1024 / vcrpy

Automatically mock your HTTP interactions to simplify and speed up testing
MIT License

Handle HTTPX UTF-8 decoding errors #882

Open evantahler opened 1 week ago

evantahler commented 1 week ago

Hello! Thank you for the vcrpy library; it makes testing our multi-service application /possible/.

In one of our tests, we want to ensure that application A can upload a file to application B and get some data back. We do something like this in our code:

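import httpx

async def upload(request_payload: dict) -> httpx.Response:
    # httpx builds a multipart body when files= is given and does not send json=,
    # so the extra fields go in data= as flat form fields.
    with open("my_file.pdf", "rb") as pdf:
        files = {"file": ("my_file.pdf", pdf)}
        async with httpx.AsyncClient() as client:
            return await client.post(
                "https://my-upload-service.com/api/post",
                data=request_payload,
                files=files,
            )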

Recording this interaction with VCR fails with the error below. The request body contains the raw bytes of the PDF, which is a binary file, so it cannot be decoded as UTF-8:

httpx_request = <Request('POST', 'http://parser:changeme123@localhost:8200/parser/api/v1/parse')>, kwargs = {}

    def _make_vcr_request(httpx_request, **kwargs):
>       body = httpx_request.read().decode("utf-8")
E       UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 154: invalid continuation byte
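The same failure can be reproduced with the standard library alone. This is a minimal sketch: the multipart body is built by hand, and the boundary, field name, and "binary" bytes are made up for illustration. It stands in for what httpx produces, then decodes it strictly the way the stub does:

```python
# Hypothetical multipart body: text framing around raw binary file content.
# 0xC7 starts a two-byte UTF-8 sequence, but the next byte (0x41, "A")
# is not a continuation byte, so strict decoding fails on it.
boundary = b"--example-boundary"
body = (
    boundary + b"\r\n"
    b'Content-Disposition: form-data; name="file"; filename="my_file.pdf"\r\n'
    b"Content-Type: application/pdf\r\n\r\n"
    b"%PDF-1.7 \xc7A binary content\r\n"
    + boundary + b"--\r\n"
)

failure = None
try:
    body.decode("utf-8")  # strict decoding, as in the traceback above
except UnicodeDecodeError as exc:
    failure = exc  # keep a reference; `exc` is unbound after the block

print(failure.reason)            # invalid continuation byte
print(hex(body[failure.start]))  # 0xc7
```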

None of the existing vcrpy filters help here: they operate on the parsed request (body, headers, and so on), and this error is raised while that request is being built, before any filter can run. Assuming that most request bodies are valid UTF-8 makes perfect sense, and this is a bit of an odd edge case. So I'd like to keep the existing behavior as much as possible, but when a `UnicodeDecodeError` is raised, decode the body again and drop the bytes that cause trouble. In our case, dropping them made no meaningful difference to the recorded cassette.
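The proposed fallback can be sketched as a small standalone function. `decode_body_lenient` is a hypothetical name used only for illustration, not part of vcrpy; it tries strict UTF-8 first, so valid bodies are unchanged, and only on failure decodes again with `errors="ignore"` and emits a warning:

```python
import warnings


def decode_body_lenient(raw: bytes) -> str:
    """Decode a request body as UTF-8, dropping undecodable bytes on failure.

    Hypothetical helper illustrating the proposed fallback; not part of vcrpy.
    """
    try:
        return raw.decode("utf-8")  # common case: behavior is unchanged
    except UnicodeDecodeError as exc:
        warnings.warn(
            f"Request body is not valid UTF-8; undecodable bytes were dropped. {exc}",
            stacklevel=2,
        )
        # Only the invalid bytes are skipped; the surrounding text survives.
        return raw.decode("utf-8", errors="ignore")


text_body = b'{"name": "report"}'
binary_body = b'Content-Disposition: form-data; name="file"\r\n\r\n%PDF \xc7A\r\n'

print(decode_body_lenient(text_body))  # {"name": "report"}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    lenient = decode_body_lenient(binary_body)

print(repr(lenient))
print(len(caught))  # 1
```

The multipart framing (boundary, headers, form fields) stays intact and only the invalid bytes disappear, so the cassette no longer holds the exact file bytes. Since the same lossy decoding would run on playback, body-based matching should stay consistent.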

evantahler commented 1 week ago

For the moment, we've gone with a monkeypatching approach:

import warnings

# Importing the submodule explicitly makes sure vcr.stubs.httpx_stubs is loaded
# before we patch it; a plain `import vcr` does not guarantee that.
import vcr.stubs.httpx_stubs as httpx_stubs  # type: ignore[import-untyped]
from vcr.request import Request as VcrRequest  # type: ignore[import-untyped]


def _fixed__make_vcr_request(  # type: ignore
    httpx_request,
    **kwargs,  # noqa: ARG001
) -> VcrRequest:
    # httpx caches the body after the first read(), so read it once and reuse it.
    raw_body = httpx_request.read()
    try:
        body = raw_body.decode("utf-8")
    except UnicodeDecodeError as e:
        # Binary payloads (e.g. a multipart PDF upload) are not valid UTF-8;
        # drop the undecodable bytes instead of failing the recording.
        body = raw_body.decode("utf-8", errors="ignore")
        warnings.warn(
            f"Could not decode full request payload as UTF-8, recording may have lost bytes. {e}",
            stacklevel=2,
        )
    uri = str(httpx_request.url)
    headers = dict(httpx_request.headers)
    return VcrRequest(httpx_request.method, uri, body, headers)


# Rebind the name on the stub module; code in that module that calls
# _make_vcr_request by name picks up the replacement at call time.
httpx_stubs._make_vcr_request = _fixed__make_vcr_request