simonw / shot-scraper

A command-line utility for taking automated screenshots of websites
https://shot-scraper.datasette.io
Apache License 2.0
1.57k stars 70 forks

Experimental feature: heap snapshots #126

Open simonw opened 8 months ago

simonw commented 8 months ago

Here's the prototype:


@cli.command()
@click.argument("url")
@click.option(
    "-a",
    "--auth",
    type=click.File("r"),
    help="Path to JSON authentication context file",
)
def snapshot(url, auth):
    "Return a heap snapshot of the specified URL"
    url = url_or_file_path(url, _check_and_absolutize)
    with sync_playwright() as p:
        context, browser_obj = _browser_context(
            p,
            auth,
            # Not wired up yet. CDP sessions only work in Chromium-based browsers.
            # browser=browser,
            # user_agent=user_agent,
            # reduced_motion=reduced_motion,
        )
        page = context.new_page()
        # Chrome DevTools Protocol session attached to this page
        client = page.context.new_cdp_session(page)

        chunks = []

        def store_chunk(event):
            # The snapshot arrives as a series of string fragments
            chunks.append(event["chunk"])

        client.on("HeapProfiler.addHeapSnapshotChunk", store_chunk)

        page.goto(url)

        # The chunks are expected to have arrived by the time this call returns
        client.send("HeapProfiler.takeHeapSnapshot", {})

        # Parse and re-serialize to pretty-print the snapshot JSON
        combined = "".join(chunks)
        data = json.loads(combined)
        click.echo(json.dumps(data, indent=2))
        browser_obj.close()
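
A rough sketch of what could be done with the JSON this command prints, assuming it follows the usual V8 heap snapshot layout (a `snapshot.meta` block with `node_fields` and `node_types`, plus a flat `nodes` array of integers). The helper names `combine_chunks` and `count_nodes_by_type` and the tiny sample data are made up for illustration; none of this is part of shot-scraper.

```python
import json
from collections import Counter


def combine_chunks(chunks):
    """Join the string fragments into one document and parse it as JSON."""
    return json.loads("".join(chunks))


def count_nodes_by_type(snapshot):
    """Count heap nodes per type, assuming the V8 flat-array layout."""
    meta = snapshot["snapshot"]["meta"]
    fields = meta["node_fields"]        # e.g. ["type", "name", "id", ...]
    type_names = meta["node_types"][0]  # names for the "type" column
    type_offset = fields.index("type")
    nodes = snapshot["nodes"]
    counts = Counter()
    # Each node occupies len(fields) consecutive integers in `nodes`
    for start in range(0, len(nodes), len(fields)):
        counts[type_names[nodes[start + type_offset]]] += 1
    return counts


# Hand-made stand-in for real snapshot data (assumption: real snapshots
# carry more fields per node, but use the same structure)
example = {
    "snapshot": {
        "meta": {
            "node_fields": ["type", "name", "id"],
            "node_types": [["hidden", "object", "string"], "string", "number"],
        }
    },
    "nodes": [1, 0, 1, 2, 1, 3, 1, 0, 5],
    "strings": ["Foo", "bar"],
}

# Simulate the snapshot arriving as two string fragments
text = json.dumps(example)
chunks = [text[:20], text[20:]]

snapshot_data = combine_chunks(chunks)
type_counts = count_nodes_by_type(snapshot_data)
print(dict(type_counts))
```

Reading the field offsets from `node_fields` rather than hard-coding them should keep a helper like this working if the set of per-node fields differs between Chrome versions.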

byt3bl33d3r commented 1 month ago

@simonw I've ported the original project to Python & Playwright: https://github.com/byt3bl33d3r/playwright-heap-snapshot

Feel free to steal