psf / requests-html

Pythonic HTML Parsing for Humans™
http://html.python-requests.org
MIT License

Recording requests-html with VCR (for testing purposes) #580

Closed kodzonko closed 1 month ago

kodzonko commented 1 month ago

Hi, I'm trying to record requests using pytest-vcr / pytest-recording, but I've run into two problems:

  1. The recorded request seems incomplete (no path or query).
  2. When pytest tries to use the recorded cassette, the request either hangs indefinitely or fails to overwrite the cassette (depending on which match_on parameters I use).

The cassette (I trimmed the response as it's very long):

```yaml
interactions:
- request:
    body: null
    headers:
      Accept:
      - '*/*'
      Accept-Encoding:
      - gzip, deflate
      Connection:
      - keep-alive
      User-Agent:
      - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8
    method: GET
    uri: https://www.cinema-city.pl/
  response:
    body:
      string: !!binary |
        H4sIAAAAAAAAA+y9a3MbOZYo+L1/BayOnXLdFii+H2pLtbIkl1W2Hi3J5S53dTiQCZAEmQlk50N0
        cmYiehztWz9gYiPGt2Lvx424n+vT9tSntbQ/pH/JxgHygSSTFGXLLteOHDNdFAkcAAcHB+eNB/eo
        ...
        ...
        ...
    headers:
      Age:
      - '232'
      CF-Cache-Status:
      - HIT
      CF-RAY:
      - 886f8bc6aa8534ec-WAW
      Cache-Control:
      - public, max-age=300
      Connection:
      - keep-alive
      Content-Encoding:
      - gzip
      Content-Type:
      - text/html;charset=UTF-8
      Date:
      - Mon, 20 May 2024 21:51:04 GMT
      Last-Modified:
      - Mon, 20 May 2024 21:47:12 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - __cf_bm=7T66fb9eQIG7yoTnQFW_nNy9vKSyv8ZZvUOZ7joqka4-1716241864-1.0.1.1-CsXR1eX.ZfGRXvnRaH9fu1vRe_E9j9rY126apPaIn4WCIVZfDJjSYKt.v2BWb4Kf63O4ZqivMDR3NBzbRM4mpw; path=/; expires=Mon, 20-May-24 22:21:04 GMT; domain=.cinema-city.pl; HttpOnly; Secure; SameSite=None
      Transfer-Encoding:
      - chunked
      content-language:
      - pl-PL
      vary:
      - Accept-Encoding
      x-b3-spanid:
      - cd5ee136aa03a1d8
      x-b3-traceid:
      - 11eae3b3aabb43
      x-cache:
      - MISS
      x-frame-options:
      - SAMEORIGIN
    status:
      code: 200
      message: OK
- request:
    body: null
    headers:
      Connection:
      - close
      Host:
      - 127.0.0.1:52534
      User-Agent:
      - Python-urllib/3.12
    method: GET
    uri: http://127.0.0.1:52534/json/version
  response:
    body:
      string: "{\r\n \"Browser\": \"HeadlessChrome/124.0.6313.0\",\r\n \"Protocol-Version\": \"1.3\",\r\n \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/124.0.6313.0 Safari/537.36\",\r\n \ \"V8-Version\": \"12.3.219\",\r\n \"WebKit-Version\": \"537.36 (@0000000000000000000000000000000000000000)\",\r\n \ \"webSocketDebuggerUrl\": \"ws://127.0.0.1:52534/devtools/browser/e00df6db-d8b2-47b6-9db5-303d163252fb\"\r\n}\r\n"
    headers:
      Content-Length:
      - '438'
      Content-Security-Policy:
      - frame-ancestors 'none'
      Content-Type:
      - application/json; charset=UTF-8
    status:
      code: 200
      message: OK
version: 1
```
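
A possible reading of the second interaction, not confirmed anywhere in this thread: the request to `http://127.0.0.1:52534/json/version` looks like a call to a local headless Chrome DevTools endpoint rather than to the site under test. If that port changes between runs, the request would no longer match the recorded one, and a replayed `webSocketDebuggerUrl` would point at a port nothing is listening on. One thing to try is keeping loopback traffic out of the cassette. The sketch below assumes vcrpy's `before_record_request` option and the `uri` attribute of its request objects; the names `is_loopback_uri`, `drop_loopback_requests` and `VCR_CONFIG` are made up for illustration.

```python
import ipaddress
from types import SimpleNamespace
from urllib.parse import urlsplit


def is_loopback_uri(uri: str) -> bool:
    """Return True when the URI's host is localhost or a loopback IP."""
    host = urlsplit(uri).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. www.cinema-city.pl), so not loopback.
        return False


def drop_loopback_requests(request):
    """Hook in the style of vcrpy's before_record_request.

    Returning None asks VCR to skip the request; returning the request
    leaves it untouched.
    """
    return None if is_loopback_uri(request.uri) else request


# Hypothetical config dict. With pytest-recording it would be returned from
# a `vcr_config` fixture; with plain vcrpy it would be passed to vcr.VCR().
VCR_CONFIG = {"before_record_request": drop_loopback_requests}

# Stand-ins for vcrpy request objects, using the two URIs from the cassette.
devtools = SimpleNamespace(uri="http://127.0.0.1:52534/json/version")
site = SimpleNamespace(uri="https://www.cinema-city.pl/")
print(drop_loopback_requests(devtools))      # None
print(drop_loopback_requests(site) is site)  # True
```

vcrpy also has a built-in `ignore_localhost` option that may be enough on its own. Either way, the cassette would need to be re-recorded, and whether this resolves the hang has not been verified here.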

The request is made like so:

        from requests_html import HTMLResponse, HTMLSession

        session = HTMLSession()
        url = "https://www.cinema-city.pl/#/buy-tickets-by-cinema?in-cinema=1080&at=2024-05-20"
        response: HTMLResponse = session.get(url)  # plain HTTP GET
        response.html.render()  # render JS elements
        session.close()  # otherwise Chromium process will leak

The code runs perfectly fine; it's just that I cannot get the recording part with VCR right. Any tips?
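
On the "no path or query" observation: in the URL passed to `session.get`, everything after `#` is a fragment, and fragments are handled on the client side and are not sent to the server. So a recorded `uri` of `https://www.cinema-city.pl/` may simply be the complete HTTP request, not a truncated one. A small standard-library check, offered as an illustration and not as a diagnosis confirmed in this thread:

```python
from urllib.parse import urldefrag, urlsplit

url = "https://www.cinema-city.pl/#/buy-tickets-by-cinema?in-cinema=1080&at=2024-05-20"
parts = urlsplit(url)

print(repr(parts.path))      # '/'
print(repr(parts.query))     # ''
print(repr(parts.fragment))  # '/buy-tickets-by-cinema?in-cinema=1080&at=2024-05-20'

# The URL with the fragment removed, i.e. the part an HTTP client requests.
sent_url = urldefrag(url).url
print(sent_url)              # https://www.cinema-city.pl/
```

The `?in-cinema=...&at=...` part sits inside the fragment, so it would only be interpreted by the page's JavaScript during `render()`, not by the initial GET.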

kodzonko commented 1 month ago

VCR cassettes work fine with the fork: https://github.com/cboin1996/requests-html