webrecorder / browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container
https://crawler.docs.browsertrix.com
GNU Affero General Public License v3.0

Support importing behaviors from the new Chrome dev tools Recorder panel JSON export format #283

Open pirate opened 1 year ago

pirate commented 1 year ago

Chrome recently (v101) added a new framework-agnostic JSON export format for user flows recorded in the DevTools Recorder panel.

Screenshot of the DevTools Recorder panel JSON export option

https://developer.chrome.com/docs/devtools/recorder/reference/#export-flows

It previously could only generate Puppeteer JS scripts. With the JSON format, recordings can be imported into a wide variety of tools without translating Puppeteer JS scripts for Playwright or other drivers.


Here's a sample of the JSON format they generate:

{
    "title": "Test Recording 4/11/2023 at 5:15:18 PM",
    "steps": [
        {
            "type": "setViewport",
            "width": 1869,
            "height": 264,
            "deviceScaleFactor": 1,
            "isMobile": false,
            "hasTouch": false,
            "isLandscape": false
        },
        {
            "type": "navigate",
            "assertedEvents": [
                {
                    "type": "navigation",
                    "url": "https://jec.fyi/demo/recorder",
                    "title": "Puppeteer recorder - test page"
                }
            ],
            "url": "https://jec.fyi/demo/recorder"
        },
        {
            "type": "click",
            "target": "main",
            "selectors": [
                [
                    "[data-testid='email']"
                ],
                [
                    "xpath///*[@data-testid=\"email\"]"
                ],
                [
                    "pierce/[data-testid='email']"
                ]
            ],
            "offsetX": 58.1776123046875,
            "offsetY": 8.579559326171875
        },
        {
            "type": "change",
            "target": "main",
            "selectors": [
                [
                    "[data-testid='email']"
                ],
                [
                    "xpath///*[@data-testid=\"email\"]"
                ],
                [
                    "pierce/[data-testid='email']"
                ]
            ],
            "value": "test@example.com"
        }
    ]
}
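For anyone wanting to consume this format, here is a minimal TypeScript sketch of loading an export and picking a plain CSS selector for a step. The type names (`UserFlow`, `FlowStep`) and both helper functions are hypothetical, and the shapes are inferred only from the sample above, not from an official schema.

```typescript
// Shapes inferred from the sample recording above; not an official schema.
interface FlowStep {
  type: string;
  url?: string;
  value?: string;
  selectors?: string[][]; // each inner array is one alternative selector
  [key: string]: unknown; // other per-step fields (offsetX, width, ...)
}

interface UserFlow {
  title: string;
  steps: FlowStep[];
}

// Parse exported JSON text and run a minimal structural check.
function parseRecording(jsonText: string): UserFlow {
  const data: unknown = JSON.parse(jsonText);
  if (typeof data !== "object" || data === null) {
    throw new Error("recording must be a JSON object");
  }
  const { title, steps } = data as { title?: unknown; steps?: unknown };
  if (typeof title !== "string" || !Array.isArray(steps)) {
    throw new Error("recording needs a string title and a steps array");
  }
  for (const step of steps) {
    const candidate = step as { type?: unknown } | null;
    if (typeof candidate !== "object" || candidate === null || typeof candidate.type !== "string") {
      throw new Error("every step needs a string type");
    }
  }
  return { title, steps: steps as FlowStep[] };
}

// Return the first single-part selector that lacks the "xpath/" or "pierce/"
// prefixes seen in the sample, i.e. a plain CSS selector.
function firstCssSelector(step: FlowStep): string | undefined {
  for (const parts of step.selectors ?? []) {
    const first = parts[0];
    if (parts.length === 1 && first !== undefined && !/^(xpath|pierce)\//.test(first)) {
      return first;
    }
  }
  return undefined;
}

const exported = JSON.stringify({
  title: "Test Recording",
  steps: [
    { type: "navigate", url: "https://jec.fyi/demo/recorder" },
    { type: "click", selectors: [['xpath///*[@data-testid="email"]'], ["[data-testid='email']"]] },
  ],
});
const recording = parseRecording(exported);
const cssSelectors = recording.steps.map(firstCssSelector);
```

Here `cssSelectors` holds one entry per step, with `undefined` for steps that carry no selectors (such as `navigate`).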

(Honestly, I recommend opening an issue upstream to add this JSON import feature to Playwright, since many people will likely want it. Done: https://github.com/microsoft/playwright/issues/22345)

Playwright replay docs, where this would be documented if added: https://playwright.dev/docs/next/network#record-and-replay-requests

Chrome docs, where Playwright would be listed if it added support for replaying this JSON: https://developer.chrome.com/docs/devtools/recorder/reference/#replay-with-external-libraries
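To make the "replay with other drivers" idea concrete, here is a rough sketch of translating recorded steps into driver-neutral actions that a Puppeteer or Playwright adapter could then execute. The `Action` type and the `toAction` and `primarySelector` helpers are hypothetical names for illustration. Only the four step types from the sample are handled, and only the first selector alternative is used.

```typescript
// Step shape inferred from the sample recording; not an official schema.
interface RecordedStep {
  type: string;
  url?: string;
  value?: string;
  width?: number;
  height?: number;
  selectors?: string[][];
}

// Hypothetical driver-neutral actions that an adapter would execute.
type Action =
  | { kind: "viewport"; width: number; height: number }
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string };

// Use the first selector alternative; a fuller replayer would fall back
// to the remaining alternatives when the first one fails to match.
function primarySelector(step: RecordedStep): string {
  const first = step.selectors?.[0]?.[0];
  if (first === undefined) {
    throw new Error(`step "${step.type}" has no selector`);
  }
  return first;
}

// Map one recorded step to one action, rejecting anything unrecognized.
function toAction(step: RecordedStep): Action {
  switch (step.type) {
    case "setViewport":
      if (step.width === undefined || step.height === undefined) {
        throw new Error("setViewport step needs width and height");
      }
      return { kind: "viewport", width: step.width, height: step.height };
    case "navigate":
      if (step.url === undefined) {
        throw new Error("navigate step needs a url");
      }
      return { kind: "goto", url: step.url };
    case "click":
      return { kind: "click", selector: primarySelector(step) };
    case "change":
      return { kind: "fill", selector: primarySelector(step), value: step.value ?? "" };
    default:
      throw new Error(`unsupported step type: ${step.type}`);
  }
}

const recordedSteps: RecordedStep[] = [
  { type: "setViewport", width: 1869, height: 264 },
  { type: "navigate", url: "https://jec.fyi/demo/recorder" },
  { type: "click", selectors: [["[data-testid='email']"]] },
  { type: "change", selectors: [["[data-testid='email']"]], value: "test@example.com" },
];
const plan: Action[] = recordedSteps.map(toAction);
```

Keeping the translation separate from execution means unsupported step types fail before any browser work starts.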

ikreymer commented 1 year ago

@pirate thanks for sharing this. It's potentially exciting that there is a standalone JSON format that's not tied to Puppeteer. I wonder if there is a spec for it. There are potentially a few different paths:

Very curious about the customization possibility that exists for "Get Extensions".

cc: @Shrinks99 you may be interested in this from a design point of view

cc: @lambdahands something to explore re: behaviors integration?

pirate commented 1 year ago

I just opened an issue in the main playwright repo: https://github.com/microsoft/playwright/issues/22345

I'd give them a few weeks to respond before doing anything to add it to browsertrix-crawler. I'd be very surprised if it takes more than a few months for someone to contribute this to Playwright, given how useful it would be.

> Explore option of custom extension? I see there's the "Get Extensions" option, which we could implement our own?

Yeah here's the docs on extensions and extending Recorder functionality:

Here's an example Chrome extension that implements a custom conversion and makes it available directly in the DevTools Recorder panel's export menu: https://github.com/kobenguyent/codeceptjs-chrome-recorder/blob/main/src/main.ts

(Landing native Browsertrix Behavior export from Chrome dev tools would be awesome with an extension like this)
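As a rough idea of what the conversion half of such an extension could look like, here is a sketch of an export plugin object. It assumes the plugin shape used by Recorder extensions (a `stringify` method for a whole recording and a `stringifyStep` method for a single step); the exact contract should be checked against the Chrome extension docs. The line-based output is invented purely for illustration and is not a real Browsertrix Behavior format. `describeStep`, `describeFlow`, and `exportPlugin` are hypothetical names.

```typescript
// Shapes inferred from the sample recording; not an official schema.
interface ExportStep {
  type: string;
  url?: string;
  value?: string;
  selectors?: string[][];
}

interface ExportFlow {
  title: string;
  steps: ExportStep[];
}

// Render one step as a single line: "<type> <target> <- <value>".
// The target is the first selector if present, otherwise the URL.
function describeStep(step: ExportStep): string {
  const target = step.selectors?.[0]?.[0] ?? step.url ?? "";
  const suffix = step.value !== undefined ? ` <- ${JSON.stringify(step.value)}` : "";
  return `${step.type} ${target}${suffix}`.trim();
}

// Render the whole recording: a title line followed by one line per step.
function describeFlow(flow: ExportFlow): string {
  return [`# ${flow.title}`, ...flow.steps.map(describeStep)].join("\n");
}

// Plugin object in the assumed shape. A real extension would register it
// through the chrome.devtools.recorder extension API (not shown here).
const exportPlugin = {
  async stringify(recording: ExportFlow): Promise<string> {
    return describeFlow(recording);
  },
  async stringifyStep(step: ExportStep): Promise<string> {
    return describeStep(step);
  },
};

const flow: ExportFlow = {
  title: "Test Recording",
  steps: [
    { type: "navigate", url: "https://jec.fyi/demo/recorder" },
    { type: "change", selectors: [["[data-testid='email']"]], value: "test@example.com" },
  ],
};
const exportedText = describeFlow(flow);
```

Swapping `describeStep` for a function that emits real behavior code would be the actual work; the plugin wrapper itself stays this small.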

ikreymer commented 1 year ago

Should also evaluate the general applicability of this, beyond a single page. In some ways, this is similar to what Memento Tracer was trying to do. The overall behavior system is still probably the more general solution, but this might be a useful subset that can be supported within that.