hbsnow-sandbox / puppeteer-scraping

Scraping within a specific domain with puppeteer

Crawling a PDF causes an error #3

Open hbsnow opened 4 years ago

hbsnow commented 4 years ago

Links matching a(href="*.pdf") are picked up as crawl targets, and navigating to them throws an error:
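One way to keep such links out of the crawl is to check the URL path's extension before it is handed to page.goto. This is a minimal sketch and assumes nothing about the repository's actual code. The helper name hasSkippedExtension and the SKIP_EXTENSIONS list are hypothetical. The list only covers file types reported in this issue (pdf, plus the zip and wmv cases noted below).

```typescript
// Hypothetical helper (not part of Spider.ts): decides whether a link points
// at a file type that should never be opened with page.goto.
// Assumption: the extension list only covers the types reported in this issue.
const SKIP_EXTENSIONS: ReadonlySet<string> = new Set(["pdf", "zip", "wmv"]);

function hasSkippedExtension(href: string, base: string): boolean {
  let url: URL;
  try {
    // Resolve relative hrefs such as "pdfs/file.pdf" against the current page.
    url = new URL(href, base);
  } catch {
    // Unparseable hrefs are treated as "skip".
    return true;
  }
  // Only the path is inspected, so "?query" and "#hash" do not interfere.
  const lastSegment = url.pathname.split("/").pop() ?? "";
  const dot = lastSegment.lastIndexOf(".");
  if (dot === -1) return false; // e.g. "/about/" or "/news"
  const ext = lastSegment.slice(dot + 1).toLowerCase();
  return SKIP_EXTENSIONS.has(ext);
}
```

An extension check cannot catch servers that deliver a PDF from a URL without a .pdf suffix, so it works best alongside error handling around navigation.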

(node:15322) UnhandledPromiseRejectionWarning: Error: net::ERR_ABORTED at https://www.nichicon.co.jp/new/pdfs/disaster_repair_products_support.pdf
    at navigate (/Users/takahashi.y/project/other/puppeteer-scraping/node_modules/puppeteer/lib/FrameManager.js:120:37)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
  -- ASYNC --
    at Frame.<anonymous> (/Users/takahashi.y/project/other/puppeteer-scraping/node_modules/puppeteer/lib/helper.js:111:15)
    at Page.goto (/Users/takahashi.y/project/other/puppeteer-scraping/node_modules/puppeteer/lib/Page.js:674:49)
    at Page.<anonymous> (/Users/takahashi.y/project/other/puppeteer-scraping/node_modules/puppeteer/lib/helper.js:112:23)
    at Spider.searchHrefFromPage (/Users/takahashi.y/project/other/puppeteer-scraping/lib/Spider.ts:92:16)
    at Spider.nextScraping (/Users/takahashi.y/project/other/puppeteer-scraping/lib/Spider.ts:69:30)
    at Spider.start (/Users/takahashi.y/project/other/puppeteer-scraping/lib/Spider.ts:50:20)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
(node:15322) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:15322) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
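The trace shows page.goto rejecting with net::ERR_ABORTED inside Spider.searchHrefFromPage. Nothing catches that rejection, so Node reports an UnhandledPromiseRejectionWarning. The following minimal sketch catches the rejection per URL so one bad link does not stop the crawl. The Navigable interface and safeGoto function are hypothetical. Navigable is declared structurally so the sketch runs without puppeteer installed, and puppeteer's Page, whose goto(url) returns a Promise, should satisfy it.

```typescript
// Minimal structural type standing in for puppeteer's Page (assumption:
// only goto(url) is needed here, so puppeteer itself is not imported).
interface Navigable {
  goto(url: string): Promise<unknown>;
}

// Hypothetical wrapper: resolves to true if navigation succeeded and false
// if goto rejected (e.g. net::ERR_ABORTED when the URL is a file download).
// Because the rejection is caught here, it never becomes "unhandled".
async function safeGoto(
  page: Navigable,
  url: string,
  onError: (url: string, err: unknown) => void = (u, e) =>
    console.warn(`skipped ${u}:`, e),
): Promise<boolean> {
  try {
    await page.goto(url);
    return true;
  } catch (err) {
    onError(url, err);
    return false;
  }
}
```

With a wrapper like this, the caller can simply move on to the next queued URL when safeGoto returns false instead of letting the whole start() chain reject.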
hbsnow commented 4 years ago

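The same problem occurs with links matching /^javascript:/. Among the cases confirmed so far, zip and wmv links also crash the crawler. In-page links cause an infinite loop.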
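A common way to handle both the javascript: links and the in-page loop is to normalize each href before queuing it. Resolve it with the WHATWG URL parser, accept only http: and https:, stay on the starting origin, and drop the #fragment so that "page.html#top" counts as the same page as "page.html". Then record every accepted URL in a visited set. The CrawlQueue class below is a hypothetical sketch of that idea, not the repository's Spider.

```typescript
// Hypothetical crawl queue (not the repository's Spider). It:
//  - rejects non-HTTP schemes such as javascript: or mailto:,
//  - keeps the crawl on the origin of the start URL,
//  - strips the #fragment so in-page links map to the page itself,
//  - remembers every accepted URL so nothing is queued twice.
class CrawlQueue {
  private readonly origin: string;
  private readonly seen = new Set<string>();
  private readonly pending: string[] = [];

  constructor(startUrl: string) {
    this.origin = new URL(startUrl).origin;
  }

  // Returns the normalized URL if it was newly queued, otherwise null.
  add(href: string, base: string): string | null {
    let url: URL;
    try {
      url = new URL(href, base);
    } catch {
      return null; // unparseable href
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") return null;
    if (url.origin !== this.origin) return null; // outside the target domain
    url.hash = ""; // "a.html#top" and "a.html" are the same page
    const key = url.href;
    if (this.seen.has(key)) return null;
    this.seen.add(key);
    this.pending.push(key);
    return key;
  }

  // Next URL to visit, or undefined when the crawl is finished.
  next(): string | undefined {
    return this.pending.shift();
  }
}
```

Because the loop pops from pending and each URL enters pending at most once, the crawl terminates even when pages link to themselves through anchors.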