platonai / exotic-amazon

A complete solution to crawl amazon at scale completely and accurately.

Failed to create web driver pulsar_chrome, caused by "Using unsafe HTTP verb GET to invoke /json/new. This action supports only PUT verb." #14

Closed sskmtm closed 1 year ago

sskmtm commented 1 year ago

What causes this error?

15:39:53.165 [r-worker-2] INFO  a.p.pulsar.common.ProcessLauncher - Launching process:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --headless --disable-gpu --hide-scrollbars --remote-debugging-port=0 --no-default-browser-check --no-first-run --no-startup-window --mute-audio --disable-background-networking --disable-background-timer-throttling --disable-client-side-phishing-detection --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --disable-blink-features=AutomationControlled --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --ignore-certificate-errors --window-size=1920,1080 --pageLoadStrategy=none --throwExceptionOnScriptError=true --user-data-dir=/var/folders/vr/_8xgwfn14959gb617jpn7gv40000gp/T/pulsar-kust/context/browser/br.2jede
15:39:53.487 [r-worker-2] ERROR a.p.p.p.b.driver.WebDriverFactory - Failed to create web driver pulsar_chrome
ai.platon.pulsar.protocol.browser.DriverLaunchException: Failed to create chrome devtools driver
    at ai.platon.pulsar.protocol.browser.driver.cdt.ChromeDevtoolsDriver.<init>(ChromeDevtoolsDriver.kt:110)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverFactory.createChromeDevtoolsDriver(WebDriverFactory.kt:80)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverFactory.create(WebDriverFactory.kt:44)
    at ai.platon.pulsar.protocol.browser.driver.LoadingWebDriverPool.createDriverIfNecessary(LoadingWebDriverPool.kt:226)
    at ai.platon.pulsar.protocol.browser.driver.LoadingWebDriverPool.poll0(LoadingWebDriverPool.kt:204)
    at ai.platon.pulsar.protocol.browser.driver.LoadingWebDriverPool.poll(LoadingWebDriverPool.kt:118)
    at ai.platon.pulsar.protocol.browser.driver.LoadingWebDriverPool.poll(LoadingWebDriverPool.kt:113)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverPoolManager.firstLaunch(WebDriverPoolManager.kt:255)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverPoolManager.access$firstLaunch(WebDriverPoolManager.kt:40)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverPoolManager$run0$2.invokeSuspend(WebDriverPoolManager.kt:211)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverPoolManager$run0$2.invoke(WebDriverPoolManager.kt)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverPoolManager$run0$2.invoke(WebDriverPoolManager.kt)
    at ai.platon.pulsar.common.PreemptChannelSupport.whenNormalDeferred(PreemptChannelSupport.kt:59)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverPoolManager.run0(WebDriverPoolManager.kt:194)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverPoolManager.run(WebDriverPoolManager.kt:105)
    at ai.platon.pulsar.protocol.browser.driver.WebDriverPoolManager.run(WebDriverPoolManager.kt:101)
    at ai.platon.pulsar.protocol.browser.emulator.context.WebDriverContext.run(BrowserContexts.kt:60)
    at ai.platon.pulsar.protocol.browser.emulator.context.BrowserPrivacyContext.doRun$suspendImpl(BrowserPrivacyContext.kt:43)
    at ai.platon.pulsar.protocol.browser.emulator.context.BrowserPrivacyContext.doRun(BrowserPrivacyContext.kt)
    at ai.platon.pulsar.crawl.fetch.privacy.PrivacyContext.run$suspendImpl(PrivacyContext.kt:118)
    at ai.platon.pulsar.crawl.fetch.privacy.PrivacyContext.run(PrivacyContext.kt)
    at ai.platon.pulsar.protocol.browser.emulator.context.MultiPrivacyContextManager.run0(MultiPrivacyContextManager.kt:118)
    at ai.platon.pulsar.protocol.browser.emulator.context.MultiPrivacyContextManager.run(MultiPrivacyContextManager.kt:101)
    at ai.platon.pulsar.protocol.browser.emulator.context.MultiPrivacyContextManager.run(MultiPrivacyContextManager.kt:54)
    at ai.platon.pulsar.protocol.browser.emulator.BrowserEmulatedFetcher.fetchTaskDeferred(BrowserEmulatedFetcher.kt:76)
    at ai.platon.pulsar.protocol.browser.emulator.BrowserEmulatedFetcher.fetchContentDeferred(BrowserEmulatedFetcher.kt:69)
    at ai.platon.pulsar.protocol.browser.BrowserEmulatorProtocol.getResponseDeferred(BrowserEmulatorProtocol.kt:49)
    at ai.platon.pulsar.crawl.protocol.http.AbstractHttpProtocol.getProtocolOutputDeferred$suspendImpl(AbstractHttpProtocol.kt:101)
    at ai.platon.pulsar.crawl.protocol.http.AbstractHttpProtocol.getProtocolOutputDeferred(AbstractHttpProtocol.kt)
    at ai.platon.pulsar.crawl.component.FetchComponent.fetchContentDeferred0(FetchComponent.kt:133)
    at ai.platon.pulsar.crawl.component.FetchComponent.fetchContentDeferred(FetchComponent.kt:95)
    at ai.platon.pulsar.crawl.component.LoadComponent.fetchContentDeferred(LoadComponent.kt:442)
    at ai.platon.pulsar.crawl.component.LoadComponent.fetchContentIfNecessaryDeferred(LoadComponent.kt:232)
    at ai.platon.pulsar.crawl.component.LoadComponent.loadDeferred1(LoadComponent.kt:217)
    at ai.platon.pulsar.crawl.component.LoadComponent.loadDeferred0(LoadComponent.kt:211)
    at ai.platon.pulsar.crawl.component.LoadComponent.loadWithRetryDeferred(LoadComponent.kt:107)
    at ai.platon.pulsar.crawl.component.LoadComponent.loadDeferred(LoadComponent.kt:94)
    at ai.platon.pulsar.context.support.AbstractPulsarContext.loadDeferred$suspendImpl(AbstractPulsarContext.kt:326)
    at ai.platon.pulsar.context.support.AbstractPulsarContext.loadDeferred(AbstractPulsarContext.kt)
    at ai.platon.pulsar.session.AbstractPulsarSession.loadAndCacheDeferred(AbstractPulsarSession.kt:207)
    at ai.platon.pulsar.session.AbstractPulsarSession.loadDeferred$suspendImpl(AbstractPulsarSession.kt:197)
    at ai.platon.pulsar.session.AbstractPulsarSession.loadDeferred(AbstractPulsarSession.kt)
    at ai.platon.pulsar.session.AbstractPulsarSession.loadDeferred$suspendImpl(AbstractPulsarSession.kt:190)
    at ai.platon.pulsar.session.AbstractPulsarSession.loadDeferred(AbstractPulsarSession.kt)
    at ai.platon.pulsar.crawl.StreamingCrawler.loadWithEventHandlers(StreamingCrawler.kt:520)
    at ai.platon.pulsar.crawl.StreamingCrawler.loadUrl(StreamingCrawler.kt:416)
    at ai.platon.pulsar.crawl.StreamingCrawler.runUrlTask(StreamingCrawler.kt:405)
    at ai.platon.pulsar.crawl.StreamingCrawler.access$runUrlTask(StreamingCrawler.kt:68)
    at ai.platon.pulsar.crawl.StreamingCrawler$runWithStatusCheck$2.invokeSuspend(StreamingCrawler.kt:379)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
Caused by: ai.platon.pulsar.browser.driver.chrome.util.WebSocketServiceException: Received error (405) - Method Not Allowed
Using unsafe HTTP verb GET to invoke /json/new. This action supports only PUT verb.
    at ai.platon.pulsar.browser.driver.chrome.impl.Chrome.request(Chrome.kt:157)
    at ai.platon.pulsar.browser.driver.chrome.impl.Chrome.createTab(Chrome.kt:66)
    at ai.platon.pulsar.protocol.browser.driver.cdt.ChromeDevtoolsBrowserInstance.createTab(ChromeDevtoolsBrowserInstance.kt:45)
    at ai.platon.pulsar.protocol.browser.driver.cdt.ChromeDevtoolsDriver.<init>(ChromeDevtoolsDriver.kt:97)
    ... 54 common frames omitted
15:39:53.489 [r-worker-2] WARN  a.p.pulsar.crawl.StreamingCrawler - Failed to create web driver | pulsar_chrome
platonai commented 1 year ago

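Does it happen occasionally or persistently?

  1. If it only happens occasionally, ignore it for now.
  2. If it happens persistently, does it come back after a reboot? If not, you can ignore it.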
sskmtm commented 1 year ago

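It currently happens persistently on my local machine, and it still happens after a reboot.

In the macOS Activity Monitor I can see that a 'chrome_crashpad_handler' process is spawned whenever this happens. I'm not sure whether that is related.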

platonai commented 1 year ago

Then why didn't it happen before? What if you delete the temporary files under $TMPDIR/pulsar-$USER?

sskmtm commented 1 year ago

It still happens after deleting the $TMPDIR/pulsar-$USER directory.

I once ran the following command locally, and the problem above has occurred ever since:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server=1.84.252.243:4231 --headless --disable-gpu --hide-scrollbars --remote-debugging-port=0 --no-default-browser-check --no-first-run --no-startup-window --mute-audio --disable-background-networking --disable-background-timer-throttling --disable-client-side-phishing-detection --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --disable-blink-features=AutomationControlled --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --ignore-certificate-errors --window-size=1920,1080 --pageLoadStrategy=none --throwExceptionOnScriptError=true --user-data-dir=/tmp/pulsar-root/context/browser/br.66b305
platonai commented 1 year ago

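Running as root often causes permission problems; higher privileges do not mean fewer errors.

Check whether Chrome's own permission restrictions are the cause. The simplest way is to run ./bin/tools/chrome/prototype/start-copy.sh and see whether the browser opens normally.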

sskmtm commented 1 year ago

Running ./bin/tools/chrome/prototype/start-copy.sh directly

produces the following errors:

Copy data from /Users/kust/.pulsar/browser/chrome/prototype/google-chrome to /tmp/pulsar-kust/context/browser1678358184
cp: /tmp/pulsar-kust/context/browser1678358184: No such file or directory
cp: /Users/kust/.pulsar/browser/chrome/prototype/google-chrome: unable to copy extended attributes to /tmp/pulsar-kust/context/browser1678358184: No such file or directory
rm: /tmp/pulsar-kust/context/browser1678358184/SingletonCookie: No such file or directory
rm: /tmp/pulsar-kust/context/browser1678358184/SingletonLock: No such file or directory
unlink: /tmp/pulsar-kust/context/browser1678358184/SingletonSocket: No such file or directory
./bin/tools/chrome/prototype/start-copy.sh: line 16: cd: /tmp/pulsar-kust/context/browser1678358184: No such file or directory
./bin/tools/chrome/prototype/start-copy.sh: line 18: /usr/bin/google-chrome-stable: No such file or directory

I then changed /usr/bin/google-chrome-stable in ./bin/tools/chrome/prototype/start-copy.sh to /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome, which is the Chrome installation path on this machine.

Running ./bin/tools/chrome/prototype/start-copy.sh again does open 'https://www.tmall.com/' in the browser:

Copy data from /Users/kust/.pulsar/browser/chrome/prototype/google-chrome to /tmp/pulsar-kust/context/browser1678359000
cp: /tmp/pulsar-kust/context/browser1678359000: No such file or directory
cp: /Users/kust/.pulsar/browser/chrome/prototype/google-chrome: unable to copy extended attributes to /tmp/pulsar-kust/context/browser1678359000: No such file or directory
rm: /tmp/pulsar-kust/context/browser1678359000/SingletonCookie: No such file or directory
rm: /tmp/pulsar-kust/context/browser1678359000/SingletonLock: No such file or directory
unlink: /tmp/pulsar-kust/context/browser1678359000/SingletonSocket: No such file or directory
./bin/tools/chrome/prototype/start-copy.sh: line 16: cd: /tmp/pulsar-kust/context/browser1678359000: No such file or directory
Opening in existing browser session.
➜  exotic-amazon git:(main) ✗ objc[56559]: Class WebSwapCGLLayer is implemented in both /System/Library/Frameworks/WebKit.framework/Versions/A/Frameworks/WebCore.framework/Versions/A/Frameworks/libANGLE-shared.dylib (0x7ffb566b1ec8) and /Applications/Google Chrome.app/Contents/Frameworks/Google Chrome Framework.framework/Versions/111.0.5563.64/Libraries/libGLESv2.dylib (0x111d41850). One of the two will be used. Which one is undefined.
...

Does this mean that permissions are not the problem?

platonai commented 1 year ago

What's the version number of chrome?

sskmtm commented 1 year ago

➜ ~ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --version
Google Chrome 111.0.5563.64

platonai commented 1 year ago

We have only tested up to Chrome 110, not against Chrome 111.

The error message says:

Caused by: ai.platon.pulsar.browser.driver.chrome.util.WebSocketServiceException: Received error (405) - Method Not Allowed
Using unsafe HTTP verb GET to invoke /json/new. This action supports only PUT verb.

I think the protocol changed in Chrome 111: the /json/new endpoint now rejects GET and accepts only PUT, so we have to fix this in PulsarR.

The relevant code is in ChromeImpl:

   val uri = URL(String.format(path, *params))
   connection = uri.openConnection() as HttpURLConnection

This code opens the connection with the default GET method, so we have to modify it to send an HTTP PUT request instead.
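
A minimal sketch of the change described above, assuming the DevTools endpoint is reachable at http://host:port/json/new. The names newTabRequest and createTab, the timeouts, and the error text are hypothetical and are not taken from PulsarR's source; only HttpURLConnection and its requestMethod property are standard JDK API.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical helper, not PulsarR's actual code: prepares a request to the
// DevTools HTTP endpoint /json/new with an explicit HTTP verb.
// openConnection() only creates the connection object; nothing is sent
// until the response is read.
fun newTabRequest(host: String, port: Int, method: String = "PUT"): HttpURLConnection {
    val uri = URL("http://$host:$port/json/new")
    val connection = uri.openConnection() as HttpURLConnection
    // HttpURLConnection defaults to GET, which Chrome 111 rejects with 405.
    connection.requestMethod = method
    connection.connectTimeout = 5_000 // assumed timeout values
    connection.readTimeout = 5_000
    return connection
}

// Sends the request and returns the response body describing the new tab.
fun createTab(host: String, port: Int): String {
    val connection = newTabRequest(host, port)
    try {
        val code = connection.responseCode
        check(code == HttpURLConnection.HTTP_OK) {
            "Received error ($code) - ${connection.responseMessage}"
        }
        return connection.inputStream.bufferedReader().use { it.readText() }
    } finally {
        connection.disconnect()
    }
}
```

Keeping the verb as a parameter makes it possible to fall back to GET if an older Chrome turns out to need it; whether that is necessary is not established in this thread.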

platonai commented 1 year ago

A simple workaround is to downgrade Chrome to 110.

sskmtm commented 1 year ago

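OK, after downgrading to 110 it runs without problems. One thing to watch out for: the browser must be prevented from updating itself automatically.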
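
Because an automatic update can silently bring the failure back, one option is to check the installed version before launching. The sketch below parses the `--version` output shown earlier in this thread and compares the major version with 110, the last version reported as tested. The function names and the constant are hypothetical and are not part of PulsarR.

```kotlin
// Highest Chrome major version reported as tested in this thread (assumption
// taken from the maintainer's comment, not from PulsarR's code).
const val MAX_TESTED_CHROME_MAJOR = 110

// Extracts the major version from `--version` output such as
// "Google Chrome 111.0.5563.64". Returns null if no four-part version is found.
fun chromeMajorVersion(versionOutput: String): Int? =
    Regex("""(\d+)\.\d+\.\d+\.\d+""")
        .find(versionOutput)
        ?.groupValues
        ?.get(1)
        ?.toIntOrNull()

// True when the installed Chrome is newer than the last tested release.
// Unparseable output is treated as "not known to be untested".
fun isUntestedChrome(versionOutput: String): Boolean {
    val major = chromeMajorVersion(versionOutput) ?: return false
    return major > MAX_TESTED_CHROME_MAJOR
}
```

For the output posted above, chromeMajorVersion("Google Chrome 111.0.5563.64") yields 111, so isUntestedChrome flags it and a caller could log a warning before starting the crawler.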

platonai commented 1 year ago

This will be fixed in the next version, which will use pulsar-1.10.11.

platonai commented 1 year ago

Fixed in the main branch by upgrading pulsar to 1.10.11.