mattsse / chromiumoxide

Chrome Devtools Protocol rust API
Apache License 2.0
755 stars 78 forks source link

Critical Deserialization issue #167

Closed Wicpar closed 8 months ago

Wicpar commented 1 year ago

Hi, i have been using this crate for months without issues and suddenly i get these errors:

2023-08-12T23:41:01.307650Z ERROR chromiumoxide::conn: Failed to deserialize WS response data did not match any variant of untagged enum Message
2023-08-12T23:41:01.307668Z ERROR chromiumoxide::handler: WS Connection error: Serde(Error("data did not match any variant of untagged enum Message", line: 0, column: 0))

this makes no sense, i reinstalled the fetched (using the fetcher) chromium, reinstalled the crate. The code hasn't changed from when it worked...

This is highly blocking as my service depends heavily on this crate.

The only explaination i can imagine is that the Message enum depends on remote data and there is no catchall to resolve potential changes.

Wicpar commented 1 year ago

same issues happen when i run it on WSL, so it proves it's not system dependent, i verified there is no code change related to chromium, restarted the computer, so it must be a change in chromium.

Wicpar commented 1 year ago

The last Message to be parsed before the issues is the following:

{"method":"Page.lifecycleEvent","params":{"frameId":"B0FCF18A982213C9947D313EAA8F934A","loaderId":"547DA6CC3D4A41314EA08A88BFA62B21","name":"commit","timestamp":1531.878478},"sessionId":"E0BCD37484373226136272710B8CB432"}

Edit: Confirmed to originate from https://github.com/mattsse/chromiumoxide/blob/939a12664ac6fc85fde92112f80bcafdb3e26421/src/conn.rs#L145

Wicpar commented 1 year ago

CdpEventMessage seems to be the curlpit, but it does have a catchall, it should match

Wicpar commented 1 year ago

my entire code is the following, you should be able to try it out with minimal modification:

#[derive(Clone)]
pub struct PDFService {
    browser: Arc<Browser>,
    semaphore: Arc<Semaphore>,
}

impl PDFService {
    pub async fn new(templates: Option<&TemplateService>) -> anyhow::Result<Self> {
        let download_path = Path::new("./download");
        tokio::fs::create_dir_all(&download_path).await?;
        let fetcher = BrowserFetcher::new(
            BrowserFetcherOptions::builder()
                .with_path(&download_path)
                .build()?,
        );
        info!("Loading Chromium...");
        let info = fetcher.fetch().await?;
        let browser = Arc::new(Self::browser(&info.executable_path).await?);
        info!("Chromium Loaded: {}", info);
        let this = Self {
            browser,
            semaphore: Arc::new(Semaphore::new(
                available_parallelism()
                    .unwrap_or(NonZeroUsize::new(1).unwrap())
                    .get(),
            )),
        };
        if let Some(templates) = templates {
            // move this part to a better place once more templates are added
            info!("Verifying Builtin templates...");
            CertificateTemplate::test(&this, templates).await?;
        }
        Ok(this)
    }

    async fn browser(path: impl AsRef<Path>) -> anyhow::Result<Browser> {
        let (browser, mut handler) = Browser::launch(
            BrowserConfig::builder()
                .chrome_executable(path)
                .request_timeout(Duration::from_secs(1))
                .no_sandbox()
                .build()
                .map_err(|it| anyhow!("Failed to create Chromium context: {}", it))?,
        )
        .await?;
        // spawn a new task that continuously polls the handler, it is required to drive the events, it will end and deallocate when the handler closes
        let _handle = tokio::task::spawn(async move {
            while let Some(h) = handler.next().await {
                if let Err(err) = h {
                    error!("Error in chromium handler: {}", err);
                    break;
                }
            }
        });
        Ok(browser)
    }

    pub(crate) async fn render_template(&self, html: String) -> anyhow::Result<Bytes> {
        let _semaphore = self.semaphore.acquire().await?; // do at most n simultaneously
        let html = html.replace(
            "<head>",
            &format!(
                "<head><script>{}</script>",
                include_str!("pagedjs.polyfill.min.js")
            ),
        );
        let page = self.browser.new_page("about:blank").await?; // <-- error here
        let res = page
            .set_content(html)
            .await?
            .pdf(
                chromiumoxide::cdp::browser_protocol::page::PrintToPdfParams {
                    prefer_css_page_size: Some(true),
                    print_background: Some(true),
                    ..Default::default()
                },
            )
            .await?;
        page.close().await?;
        Ok(res.into())
    }
}
Wicpar commented 1 year ago

It's the craziest thing, looks like a serde issue, at least i have a temp fix.

Wicpar commented 1 year ago

it seems to be in an issue with how serde visits untagged enums. a quick and dirty implemnetation of the deserializer fixes it too:

impl<'de, T>  Deserialize<'de> for Message<T> where T: for<'t> Deserialize<'t> {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> where D: Deserializer<'de> {
        let value = Value::deserialize(deserializer)?;
        if let Ok(response) = serde_json::from_value(value.clone()) {
            return Ok(Self::Response(response))
        }
        use serde::de::{Error};
        Ok(Self::Event(serde_json::from_value(value.clone()).map_err(|err|Error::custom(format!("{:?}", err)))?))
    }
}

i'm open to doing a PR, i think the best utility-wise would be to implement deserialize manually, maybe there is a better non-bugged way to implement it. Maybe CdpEventMessage can be changed to play nicer with untagged enums

Sytten commented 9 months ago

Did we end up fixing this issue?

Wicpar commented 9 months ago

i don't think so, i still use my fork with the fix.

MOZGIII commented 9 months ago

I have had a similar issue, but in my case the issue comes from an invalid message:

{
  "requestId": "...",
  "blockedCookies": [
    {
      "blockedReasons": ["SameSiteUnspecifiedTreatedAsLax"],
      "cookieLine": "...=5; expires=Mon, 18 Dec 2023 01:25:44 GMT; Max-Age=3600; Path=/",
      "cookie": {
        "name": "...",
        "value": "5",
        "domain": "...",
        "path": "/",
        "expires": 1702862744.118586,
        "size": 21,
        "httpOnly": false,
        "secure": false,
        "session": false,
        "priority": "Medium",
        "sameParty": false,
        "sourceScheme": "Secure",
        "sourcePort": 443
      }
    },
    {
      "blockedReasons": ["SameSiteUnspecifiedTreatedAsLax"],
      "cookieLine": "...=\"\"; expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/",
      "cookie": {
        "name": "...",
        "value": "\"\"",
        "domain": "...",
        "path": "/",
        "expires": null,
        "size": 20,
        "httpOnly": false,
        "secure": false,
        "session": false,
        "priority": "Medium",
        "sameParty": false,
        "sourceScheme": "Secure",
        "sourcePort": 443
      }
    },
    {
      "blockedReasons": ["SameSiteUnspecifiedTreatedAsLax"],
      "cookieLine": "...=\"\"; expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/",
      "cookie": {
        "name": "...",
        "value": "\"\"",
        "domain": "...",
        "path": "/",
        "expires": null,
        "size": 11,
        "httpOnly": false,
        "secure": false,
        "session": false,
        "priority": "Medium",
        "sameParty": false,
        "sourceScheme": "Secure",
        "sourcePort": 443
      }
    },
    {
      "blockedReasons": ["SameSiteUnspecifiedTreatedAsLax"],
      "cookieLine": "...=\"\"; expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/",
      "cookie": {
        "name": "...",
        "value": "\"\"",
        "domain": "...",
        "path": "/",
        "expires": null,
        "size": 12,
        "httpOnly": false,
        "secure": false,
        "session": false,
        "priority": "Medium",
        "sameParty": false,
        "sourceScheme": "Secure",
        "sourcePort": 443
      }
    }
  ],
  "headers": {
    "server": "cloudflare",
    "more": "redacted"
  },
  "resourceIPAddressSpace": "Public",
  "statusCode": 200,
  "cookiePartitionKey": "https://...",
  "cookiePartitionKeyOpaque": false
}

The error was at "invalid type: null, expected f64", line: 31, column: 33. It is, in fact, looking like the browser sends bad data - either that or the parsed/codegen is wrong, or the PDL is wrong.

Anyhow, it looks like in any case, a failure to parse a message can be shoved under the rug in 95% of cases - so I propose that we make the parsing errors non-critical as opposed to crashing the whole handler. In my case, I do not care about that particular message type that crashes and takes everything else (that I actually do care about) with it.

Wicpar commented 8 months ago

Closing this as it seems fixed in latest version

MOZGIII commented 8 months ago

It's not fixed - it's a design issue. If you don't encounter it it is because you don't receive messages that the CDP schema you have can't handle. The issue is unknown responses message the whole protocol, instead of being ignored.

So the issue is likely hidden in the latest version, rather being fundamentally fixed.

Wicpar commented 8 months ago

i switched to short lived browser instances it might mitigate the issue

ryo33 commented 7 months ago

I guess the source of the original problem came from #[serde(untagged)]. A derived deserializer of an untagged enum does not work correctly with serde_json and f64 etc somewhere in its type definition. The CdpEvent type seems to contain these types which is problematic on using an untagged enum and serde_json. This seems the reason why https://github.com/mattsse/chromiumoxide/pull/194 works as a fix.

MOZGIII commented 7 months ago

Interesting, is there a test you could reproduce it with? It seems regression prevention would be very necessary here. If it is indeed a bug within serde and not the missing types - it would clearly be a better fix than skipping an error.

However this does not mean the loop should be crashing when it encounters a parsing error...

Wicpar commented 7 months ago

Big structures that can be partially parsed should be partially parsed. if unknown data is deserialized it sould be deserialized as unknown.

ryo33 commented 7 months ago

However this does not mean the loop should be crashing when it encounters a parsing error...

Big structures that can be partially parsed should be partially parsed. if unknown data is deserialized it sould be deserialized as unknown.

I agree with both. I think adding CdpEvent::Unknown(serde_json::Value) as a variant is the first choice. I believe there is no chance of failing to parse a JSON intended to be a response (by Chromium) because the Response type seems durable to unexpected compatibility issues. (I said the same thing on https://github.com/mattsse/chromiumoxide/pull/206#issuecomment-1945274529) So we can ignore the case, an invalid response would be sent from Chromium. So, I believe that adding the variant that catches invalid events solves almost all cases we met and will meet.

Also, I think crashing the handler is ok in the case the Chromium sends an invalid response intended as a response, not an event. Because it will break the semantics of the request/response pattern if skipped. (Conversely, I think it's acceptable to skip processing invalid events if Chromium intended it as an event, not a response, and it's documented on event handler APIs in chromiumoxide.)

Wicpar commented 7 months ago

Yes, if the whole struct is unusable it should crash, what i meant is if the return value contains a single invalid element in an array it should not invalidate the rest even in responses. One can then choose to retry if the value is likely wrong.

ryo33 commented 7 months ago

IMO one of the greatest benefits of using chromiumoxide is that the types reflect the exact protocol definition. If we are going to allow the nice partial parsing as you meant (if I understand it correctly), it introduces some complexity to the type generation and user experience. But, indeed, chromium occasionally sends invalid responses or events to the protocol, so we need some way to deal with the situation. If at most the handler does not crash immediately and we can know the wrong message, we can choose how to resolve that, like ignoring, filing an issue to crbug, or handling the serde_json::Value in custom code for the situation.

I think the problem is that we cannot prevent the handler crashes, without staying to the chromium version that worked before, like MOZGIII said. As for events, I suggest the solution on #207, and as for responses it already handles the parsing error without crashing (but we still cannot get the serde_json::Value of the wrong response, and I'd like to fix this in another pull request).