mattsse / chromiumoxide

Chrome Devtools Protocol rust API
Apache License 2.0

Is there an example of Network Request Interception? #157

Open Cyberphinx opened 1 year ago

Cyberphinx commented 1 year ago

Could you provide a brief example of how to do Network Request Interception in order to extract the URL of a specific XHR request?

escritorio-gustavo commented 1 year ago

There is one in the examples directory of this repo, called interception.rs, though I'm still struggling to understand how it works.

examples/interception.rs

escritorio-gustavo commented 1 year ago

I managed to get it working and also discovered that you don't actually need to have Page in an Arc, which is useful if you need to call a method that consumes Page, such as Page::close.

Below is the example I linked to, with a couple of extra comments:

use std::sync::Arc;

use base64::prelude::BASE64_STANDARD;
use base64::Engine;
use chromiumoxide::cdp::browser_protocol::fetch::{
    ContinueRequestParams, EventRequestPaused, FulfillRequestParams,
};

// Required to call `next` on `handler` and `request_paused`
use futures::StreamExt;

use chromiumoxide::browser::{Browser, BrowserConfig};

const CONTENT: &str = "<html><head></head><body><h1>TEST</h1></body></html>";
const TARGET: &str = "https://news.ycombinator.com/";

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The `tracing_subscriber` crate is required. Without it, the requests won't actually
    // be paused. Only call `init` once in your program or it will panic;
    // calling it in `main` helps avoid that
    tracing_subscriber::fmt::init();

    // Spawn browser
    let (mut browser, mut handler) = Browser::launch(
        BrowserConfig::builder()
            .enable_request_intercept()
            .disable_cache()
            .build()?,
    )
    .await?;

    let browser_handle = async_std::task::spawn(async move {
        while let Some(h) = handler.next().await {
            if h.is_err() {
                break;
            }
        }
    });

    // Setup request interception
    // `Arc` is optional, but I didn't want to remove it because I'm not compiling the code
    let page = Arc::new(browser.new_page("about:blank").await?);

    let mut request_paused = page.event_listener::<EventRequestPaused>().await.unwrap();
    let intercept_page = page.clone();
    let intercept_handle = async_std::task::spawn(async move {
        while let Some(event) = request_paused.next().await {
            // You MUST call `intercept_page.execute` with one of
            // `FulfillRequestParams`, `ContinueRequestParams` or `FailRequestParams`
            // for EVERY paused request. Any request that isn't answered stays paused
            // here indefinitely, which can hang your program

            if event.request.url == TARGET {
                if let Err(e) = intercept_page
                    .execute(
                        FulfillRequestParams::builder()
                            .request_id(event.request_id.clone())
                            .body(BASE64_STANDARD.encode(CONTENT))
                            .response_code(200)
                            .build()
                            .unwrap(), // Will panic if `request_id`, `body` or `response_code` are missing. Other fields are optional
                    )
                    .await
                {
                    println!("Failed to fulfill request: {e}");
                }
            } else if let Err(e) = intercept_page
                .execute(ContinueRequestParams::new(event.request_id.clone()))
                .await
            {
                println!("Failed to continue request: {e}");
            }
        }
    });

    // Navigate to target
    page.goto(TARGET).await?;
    page.wait_for_navigation().await?;
    let content = page.content().await?;
    if content == CONTENT {
        println!("Content overridden!")
    }

    // Navigate to other
    page.goto("https://google.com").await?;
    page.wait_for_navigation().await?;
    let content = page.content().await?;
    if content != CONTENT {
        println!("Content not overridden!")
    }

    browser.close().await?;
    browser_handle.await;
    intercept_handle.await;
    Ok(())
}
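
For the original question (getting the URL of one specific XHR request), the example above only needs a different decision inside the `while let` loop: instead of fulfilling the request, record its URL and then continue it. Below is a minimal, standard-library-only sketch of that decision. `ResourceKind`, `Action` and `decide` are hypothetical names made up for illustration; they are not part of chromiumoxide. In the real loop you would take the URL from `event.request.url` and the kind from the resource type reported with the paused event, and I have not compiled this against the crate.

```rust
/// Hypothetical local stand-in for the resource type reported with a paused request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResourceKind {
    Document,
    Xhr,
    Other,
}

/// What the interception loop should do with a paused request.
/// Both variants end in "continue", so no request is left paused.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Record the URL, then let the request continue unchanged.
    CaptureAndContinue(String),
    /// Let the request continue unchanged.
    Continue,
}

/// Decide what to do with one paused request.
/// `needle` is a substring identifying the XHR we are looking for.
fn decide(url: &str, kind: ResourceKind, needle: &str) -> Action {
    if kind == ResourceKind::Xhr && url.contains(needle) {
        Action::CaptureAndContinue(url.to_owned())
    } else {
        Action::Continue
    }
}

fn main() {
    // Made-up requests standing in for the stream of paused events.
    let paused = [
        ("https://example.com/", ResourceKind::Document),
        ("https://example.com/api/items?page=2", ResourceKind::Xhr),
        ("https://example.com/logo.png", ResourceKind::Other),
    ];

    for (url, kind) in paused {
        match decide(url, kind, "/api/items") {
            Action::CaptureAndContinue(found) => println!("captured XHR: {found}"),
            Action::Continue => println!("continue: {url}"),
        }
    }
}
```

In the interception task, both outcomes would still end with `intercept_page.execute(ContinueRequestParams::new(event.request_id.clone()))`, so every request gets an answer; the only difference is that the matching URL is stored or sent over a channel first.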