microsoft / playwright-java

Java version of the Playwright testing and automation library
https://playwright.dev/java/
Apache License 2.0

[Bug]: com.microsoft.playwright.TimeoutError: #1591

Closed yidasanqian closed 4 months ago

yidasanqian commented 4 months ago

Version

1.43.0

Steps to reproduce

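        // Note: userAgent and safelist are not defined in this snippet; they come from the surrounding code.
        String url = "https://blog.csdn.net/wshzd/article/details/136005570";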
        try (Playwright playwright = Playwright.create();
             Browser browser = playwright.chromium().launch(new BrowserType.LaunchOptions().setHeadless(true));
        ) {
            BrowserContext context = browser.newContext(
                    new Browser.NewContextOptions()
                            .setIgnoreHTTPSErrors(true)
                            .setUserAgent(userAgent)                           
            );
            Page page = context.newPage();      
            try {
                Response response = page.navigate(url, new Page.NavigateOptions().setReferer(url).setWaitUntil(WaitUntilState.LOAD).setTimeout(60000));
                if (response.ok()) {                 
                    String htmlContent = response.frame().content();
                    Document doc = Jsoup.parse(htmlContent);

                    doc.select("div.top, div#top, footer, div#footer, [id*=footer], div.footer, [class*=footer], div.right, div#right").remove();
                    String clean = Jsoup.clean(doc.body().html(), safelist);
                    System.out.println("clean = " + clean);            
                }
            } catch (PlaywrightException error) {
                System.out.println("error: " + error.getMessage());
            } finally {
                context.close();
            }
        }

Expected behavior

I hope to see the webpage content load normally.

Actual behavior

com.microsoft.playwright.TimeoutError: Error {
  message='Timeout 60000ms exceeded.
  name='TimeoutError
  stack='TimeoutError: Timeout 60000ms exceeded.
    at ProgressController.run (/tmp/playwright-java-5619993510942649732/package/lib/server/progress.js:78:26)
    at Frame.goto (/tmp/playwright-java-5619993510942649732/package/lib/server/frames.js:517:23)
    at FrameDispatcher.goto (/tmp/playwright-java-5619993510942649732/package/lib/server/dispatchers/frameDispatcher.js:81:119)
    at FrameDispatcher._handleCommand (/tmp/playwright-java-5619993510942649732/package/lib/server/dispatchers/dispatcher.js:94:40)
    at DispatcherConnection.dispatch (/tmp/playwright-java-5619993510942649732/package/lib/server/dispatchers/dispatcher.js:361:39)
}
Call log:
- navigating to "https://blog.csdn.net/wshzd/article/details/135869061", waiting until "load"

        at com.microsoft.playwright.impl.WaitableResult.get(WaitableResult.java:52)
        at com.microsoft.playwright.impl.ChannelOwner.runUntil(ChannelOwner.java:120)
        at com.microsoft.playwright.impl.Connection.sendMessage(Connection.java:130)
        at com.microsoft.playwright.impl.ChannelOwner.sendMessage(ChannelOwner.java:106)
        at com.microsoft.playwright.impl.FrameImpl.navigateImpl(FrameImpl.java:463)
        at com.microsoft.playwright.impl.PageImpl.lambda$navigate$46(PageImpl.java:870)
        at com.microsoft.playwright.impl.LoggingSupport.withLogging(LoggingSupport.java:47)
        at com.microsoft.playwright.impl.ChannelOwner.withLogging(ChannelOwner.java:89)
        at com.microsoft.playwright.impl.PageImpl.navigate(PageImpl.java:870)
        at com.microsoft.playwright.impl.PageImpl.navigate(PageImpl.java:42)
        at com.toowe.enterprise.core.app.searchapp.service.impl.WebSearchServiceImpl.lambda$spiderWebContent$9(WebSearchServiceImpl.java:808)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: com.microsoft.playwright.TimeoutError: Error {
  message='Timeout 60000ms exceeded.
  name='TimeoutError
  stack='TimeoutError: Timeout 60000ms exceeded.
    at ProgressController.run (/tmp/playwright-java-5619993510942649732/package/lib/server/progress.js:78:26)
    at Frame.goto (/tmp/playwright-java-5619993510942649732/package/lib/server/frames.js:517:23)
    at FrameDispatcher.goto (/tmp/playwright-java-5619993510942649732/package/lib/server/dispatchers/frameDispatcher.js:81:119)
    at FrameDispatcher._handleCommand (/tmp/playwright-java-5619993510942649732/package/lib/server/dispatchers/dispatcher.js:94:40)
    at DispatcherConnection.dispatch (/tmp/playwright-java-5619993510942649732/package/lib/server/dispatchers/dispatcher.js:361:39)
}
Call log:
- navigating to "https://blog.csdn.net/wshzd/article/details/135869061", waiting until "load"

        at com.microsoft.playwright.impl.Connection.dispatch(Connection.java:254)
        at com.microsoft.playwright.impl.Connection.processOneMessage(Connection.java:211)
        at com.microsoft.playwright.impl.ChannelOwner.runUntil(ChannelOwner.java:118)
        ... 13 common frames omitted

Additional context

On my local Windows 11 machine, running the same code directly from an IDE loads the page content normally. When I package it into a Docker image and run it in a container, timeouts occur. Stranger still, some links load normally in the container while others time out unpredictably.

Environment

Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven home: /opt/apache-maven-3.9.6
Java version: 1.8.0_402, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en, platform encoding: UTF-8
OS name: "linux", version: "5.15.0-67-generic", arch: "amd64", family: "unix"

Dockerfile (partial):

FROM mcr.microsoft.com/playwright/java:v1.43.0-jammy
ENV LANG C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive

RUN rm -rf /usr/lib/jvm/*  \
    && apt-get update  \
    && apt-get install -y --no-install-recommends openjdk-8-jdk \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV JAVA_OPTS="-Xms512m -Xmx8g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/heapdumps/azs_backend_heapdump.hprof -Djava.security.egd=file:/dev/./urandom"

yury-s commented 4 months ago

Since it's working fine locally, this likely has something to do with network access from the container. Try recording traces and compare them with local behavior to see which requests don't go through and why.

If you think this is a playwright issue, please file a new issue and provide a self contained example that doesn't depend on external servers (which may well return different results depending on where a request comes from).
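
The trace recording suggested above could look roughly like the following. This is a minimal sketch, not the reporter's code: the class name `TraceNavigationSketch`, the helper `traceFile`, the `local`/`docker` label argument, and the fallback URL are assumptions made for illustration. It records one trace per environment so the two archives can be compared side by side.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.BrowserContext;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;
import com.microsoft.playwright.PlaywrightException;
import com.microsoft.playwright.Tracing;

import java.nio.file.Path;
import java.nio.file.Paths;

class TraceNavigationSketch {
    // Hypothetical naming scheme: one archive per environment, e.g. trace-local.zip.
    static Path traceFile(String label) {
        return Paths.get("trace-" + label + ".zip");
    }

    public static void main(String[] args) {
        // Both arguments are assumptions of this sketch: an environment label and a URL.
        String label = args.length > 0 ? args[0] : "local";
        String url = args.length > 1 ? args[1] : "https://example.com";

        try (Playwright playwright = Playwright.create();
             Browser browser = playwright.chromium().launch()) {
            BrowserContext context = browser.newContext();
            // Start tracing before the page is created so the navigation is captured.
            context.tracing().start(new Tracing.StartOptions()
                    .setScreenshots(true)
                    .setSnapshots(true));
            Page page = context.newPage();
            try {
                page.navigate(url);
            } catch (PlaywrightException error) {
                // A timeout still leaves a usable trace; report it and continue.
                System.out.println("navigation failed: " + error.getMessage());
            } finally {
                // Write the archive even when navigation failed.
                context.tracing().stop(new Tracing.StopOptions().setPath(traceFile(label)));
                context.close();
            }
        }
    }
}
```

Running this once on the host and once inside the container (with labels such as `local` and `docker`) gives two archives that can be opened in the Playwright trace viewer, for example at trace.playwright.dev, to see which requests stay pending in the container.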

yidasanqian commented 4 months ago

> Since it's working fine locally, this likely has something to do with network access from the container. Try recording traces and compare them with local behavior to see which requests don't go through and why.
>
> If you think this is a playwright issue, please file a new issue and provide a self contained example that doesn't depend on external servers (which may well return different results depending on where a request comes from).

I set the timeout to 20 seconds locally and still get this error, but the website actually loads very quickly.

I checked the contents of trace.zip: the main content loaded in about 3 seconds, but the total duration was around 23 seconds. Why did it take so long, and is there any way to speed up loading?
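
One plausible explanation, not confirmed anywhere in this thread, is that `WaitUntilState.LOAD` waits for the page's `load` event, which fires only after subresources such as images, scripts, and iframes have finished loading. A slow third-party request can therefore delay `navigate` long after the article text is present. The sketch below shows one way to test that idea: wait only for `DOMCONTENTLOADED` and abort requests for heavy resource types. The class name `FastNavigateSketch`, the helper `shouldBlock`, the set of blocked types, and the fallback URL are assumptions chosen for illustration.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.BrowserContext;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;
import com.microsoft.playwright.options.WaitUntilState;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class FastNavigateSketch {
    // Resource types to skip; this selection is an assumption, adjust it per site.
    static final Set<String> BLOCKED_TYPES =
            new HashSet<>(Arrays.asList("image", "media", "font"));

    static boolean shouldBlock(String resourceType) {
        return BLOCKED_TYPES.contains(resourceType);
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "https://example.com";

        try (Playwright playwright = Playwright.create();
             Browser browser = playwright.chromium().launch()) {
            BrowserContext context = browser.newContext();
            // Abort requests the text extraction does not need; let everything else through.
            context.route("**/*", route -> {
                if (shouldBlock(route.request().resourceType())) {
                    route.abort();
                } else {
                    route.resume();
                }
            });
            Page page = context.newPage();
            // Return once the HTML is parsed instead of waiting for the load event.
            page.navigate(url, new Page.NavigateOptions()
                    .setWaitUntil(WaitUntilState.DOMCONTENTLOADED)
                    .setTimeout(20000));
            System.out.println("content length = " + page.content().length());
            context.close();
        }
    }
}
```

If the article body is rendered by scripts after `DOMContentLoaded`, the returned HTML may be incomplete; in that case waiting for a specific element with `page.waitForSelector` before reading the content is a safer condition than the `load` event.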
