Proposal: script to pretty-print YAML cassettes

gward commented 3 years ago

I've been a happy user of VCR.py at a couple of jobs for several years. One little annoyance is that YAML cassettes written by VCR.py are not as readable as they could be. Here's an example from our unit tests for a small API client library:

  request:
    [...omitted...]
  response:
    body:
      string: '{"created_at":"2021-03-29T14:17:52.494Z","userprincipalname":null,"trusted_idp_id":null,"manager_ad_id":null,"department":null,"email":"joe.slow@example.com","locked_until":null,"username":null,"comment":null,"password_changed_at":null,"group_id":null,"invitation_sent_at":null,"state":1,"title":null,"custom_attributes":{"fn_test_field":null,"fnperms":"","customer_ids":null,"perms":null,"ns_contact_id":null},"company":null,"directory_id":null,"firstname":"Joe","lastname":"Slow","status":7,"role_ids":[],"activated_at":null,"member_of":null,"phone":null,"updated_at":"2021-03-29T14:17:52.494Z","distinguished_name":null,"external_id":null,"invalid_login_attempts":0,"last_login":null,"samaccountname":null,"preferred_locale_code":null,"manager_user_id":null,"id":128762714}'
    headers:
      cache-control:
      - no-cache
      content-length:
      - '776'
      content-type:
      - application/json; charset=utf-8
      date:
      - Mon, 29 Mar 2021 14:17:52 GMT
      status:
      - 201 Created
      [...more response headers...]

Good news: this accurately captures the request/response cycle, just as it's supposed to. VCR.py is working as advertised.

Bad news: the JSON response is a bit hard to read, and harder to modify. Sometimes it's useful to manually tweak a response to test an edge case, or because an API has added a new feature and it's too much trouble to capture new responses. Editing a compact multiline blob of JSON is annoying, and the resulting diff is useless.

This example is far from the worst. More complex/nested data structures are really hard to understand, which is a shame, because VCR cassettes are a great way to informally document APIs. ("Ohh, that's what the response to POST /user looks like!") When dealing with older/nastier APIs, it's common to see JSON wrapped in JSON, or a mix of XML and JSON responses. I've also had to deal with APIs that use multipart form requests (e.g. for file uploads), and the resulting cassettes are really hard to read.

And gzip'ed responses are really annoying: there's nothing wrong with an API that returns a compressed response body, but trying to understand that in a VCR.py cassette is impossible.
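To make the gzip case concrete, here is a minimal sketch of decompressing a recorded body after the fact. It assumes the response is a plain dict shaped like the cassette above (`body.string` holding the raw bytes, headers mapping names to lists of values); the function name `decode_gzip_response` is made up for illustration. VCR.py also offers a `decode_compressed_response=True` option that decodes bodies at record time, which may be the simpler route for new cassettes.

```python
import gzip


def decode_gzip_response(response):
    """Return a copy of a recorded response with a gzip'ed body decompressed.

    Assumption: `response` looks like
    {"body": {"string": <bytes>}, "headers": {name: [values]}}.
    """
    # Header names are lower-cased in the returned copy to simplify lookups.
    headers = {name.lower(): list(values) for name, values in response["headers"].items()}
    if "gzip" not in headers.get("content-encoding", []):
        return response  # not gzip'ed: hand back the original untouched

    raw = gzip.decompress(response["body"]["string"])

    # The stored body is no longer compressed, so drop the encoding header
    # and make content-length describe the bytes that are actually stored.
    del headers["content-encoding"]
    headers["content-length"] = [str(len(raw))]
    return {**response, "body": {"string": raw}, "headers": headers}
```

Note that the result is, by design, no longer what went over the wire: the headers are rewritten to match the decompressed body.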

So I wrote a hacky little script to pretty-print VCR.py cassettes. For example, if I run it on the above cassette, it notices that the response is JSON, and formats it accordingly:

  response:
    body:
      string: '{
      "created_at": "2021-03-29T14:17:52.494Z",
      "userprincipalname": null,
      "trusted_idp_id": null,
      "manager_ad_id": null,
      "department": null,
      "email": "joe.slow@example.com",
      [...more fields...]
    }'
    headers:
      cache-control:
      - no-cache
      content-length:
      - '776'
      content-type:
      - application/json; charset=utf-8

Good news: the JSON is much more readable and edit-friendly. This is still valid YAML that VCR.py happily accepts.

Bad news: the content-length header is a lie. More broadly, this is no longer a byte-precise capture of the response.
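For readers who want a feel for the transformation, here is a rough sketch of the JSON step only. It is not the script itself, which is not shown in this thread. It assumes the response has already been loaded into a dict shaped like the cassette above, and the name `prettify_json_body` is hypothetical.

```python
import json


def prettify_json_body(response, indent=2):
    """Re-indent a JSON response body in place; leave anything else alone.

    Assumption: response["body"]["string"] holds the body and
    response["headers"] maps lower-case names to lists of values.
    Returns True if the body was rewritten.
    """
    content_types = response.get("headers", {}).get("content-type", [])
    if not any("json" in value.lower() for value in content_types):
        return False

    body = response["body"]["string"]
    if isinstance(body, bytes):
        body = body.decode("utf-8")  # assumption: JSON bodies are UTF-8
    try:
        parsed = json.loads(body)
    except ValueError:
        return False  # header says JSON but the body does not parse

    response["body"]["string"] = json.dumps(parsed, indent=indent, ensure_ascii=False)
    return True
```

The sketch deliberately leaves `content-length` alone, so after it runs the header no longer matches the stored body, which is exactly the mismatch described above.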

Anyways: I want to open-source this script. I think the best place for it is in VCR's own repo, maybe in a contrib/ directory. If that is agreeable to you, I'll open a PR. If you're not interested, please let me know and I'll create a tiny little project just for this script.

lesmo commented 3 years ago

I had this exact need and ended up writing a custom serializer, which would "prettify" the request and response bodies and "uglify" them on deserialization (mutating the request length as necessary).

I think the best approach is to create a new serializer and add it to the vcr/serializers/ dir.
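A serializer along these lines might look roughly like the sketch below. This is not the serializer mentioned above; `PrettyBodySerializer` is a hypothetical name. It wraps any object or module that exposes `serialize(dict) -> str` and `deserialize(str) -> dict`, which is the interface VCR.py expects of a serializer, and it assumes the cassette dict carries an `interactions` list with string response bodies. Only response bodies are handled here.

```python
import copy
import json


class PrettyBodySerializer:
    """Wrap another serializer: pretty JSON bodies on disk, compact in memory."""

    def __init__(self, inner):
        # `inner` does the real work of turning the cassette dict into text.
        self.inner = inner

    @staticmethod
    def _rewrite_bodies(cassette_dict, **dumps_kwargs):
        result = copy.deepcopy(cassette_dict)  # never mutate the caller's data
        for interaction in result.get("interactions", []):
            response = interaction["response"]
            body = response["body"].get("string")
            if not isinstance(body, str):
                continue
            try:
                parsed = json.loads(body)
            except ValueError:
                continue  # not JSON: keep the body exactly as recorded
            if not isinstance(parsed, (dict, list)):
                continue
            text = json.dumps(parsed, ensure_ascii=False, **dumps_kwargs)
            response["body"]["string"] = text
            # Keep content-length in step with the rewritten body.
            headers = response.get("headers", {})
            if "content-length" in headers:
                headers["content-length"] = [str(len(text.encode("utf-8")))]
        return result

    def serialize(self, cassette_dict):
        return self.inner.serialize(self._rewrite_bodies(cassette_dict, indent=2))

    def deserialize(self, cassette_string):
        loaded = self.inner.deserialize(cassette_string)
        return self._rewrite_bodies(loaded, separators=(",", ":"))
```

Assuming VCR.py's documented `register_serializer` hook, this could be wired up with something like `my_vcr.register_serializer("pretty", PrettyBodySerializer(yamlserializer))`, with `yamlserializer` imported from `vcr.serializers`. One caveat: compacting on load only reproduces the original bytes when the API itself returned compact JSON, so the round trip is not byte-exact in general.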

pyoor commented 1 week ago

Looks like @codefromthecrypt managed to make this work in https://github.com/open-telemetry/opentelemetry-python-contrib/pull/2945. @kevin1024 is this something you'd consider including in-tree?

kevin1024 / vcrpy

Proposal: script to pretty-print YAML cassettes #580