harlan-zw / unlighthouse

Scan your entire site with Google Lighthouse in 2 minutes (on average). Open source, fully configurable with minimal setup.
https://unlighthouse.dev
MIT License

Unlighthouse never finishes when all urls are failing #164

Open BennyAlex opened 1 year ago

BennyAlex commented 1 year ago

Describe the bug

When a list of URLs is provided and all of them fail, for example because the server blocks the robot, the unlighthouse start function never finishes and never throws an error. I also can't find an "on-error" hook or anything similar in the hooks documentation: https://unlighthouse.dev/api

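// `unlighthouse` is an already-created Unlighthouse instance.
const { hooks } = unlighthouse;

// The reporter expects this hook to fire once all URLs have been processed.
hooks.hook('worker-finished', () => {
    console.log('All sites tested');
});

await unlighthouse.start();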

For example, the output looks like this (note that 'All sites tested' never appears):

HTML extract of https://www.bayer.com/de/ response failed.  Unlighthouse 09:42:13
Skipping /de/. Invalid status code: 403.  Unlighthouse 09:42:13
Ignoring route /de/.  Unlighthouse 09:42:13
HTML extract of https://www.bayer.com/de/de/deutschland-startseite response failed.  Unlighthouse 09:42:14
Skipping /de/de/deutschland-startseite. Invalid status code: 403.  Unlighthouse 09:42:14
Ignoring route /de/de/deutschland-startseite.  Unlighthouse 09:42:14
HTML extract of https://www.bayer.com/de/gesundheit response failed.  Unlighthouse 09:42:14
Skipping /de/gesundheit. Invalid status code: 403.  Unlighthouse 09:42:14
Ignoring route /de/gesundheit.  Unlighthouse 09:42:14
HTML extract of https://www.bayer.com/de/kontakt response failed.  Unlighthouse 09:42:15
Skipping /de/kontakt. Invalid status code: 403.  Unlighthouse 09:42:15
Ignoring route /de/kontakt.  Unlighthouse 09:42:15
HTML extract of https://www.bayer.com/de/locations response failed.  Unlighthouse 09:42:15
Skipping /de/locations. Invalid status code: 403.  Unlighthouse 09:42:15
Ignoring route /de/locations.

After that, the process never exits.
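Until the hang itself is resolved, one possible stopgap is to race the start call against a timer so the calling script at least gets an error instead of waiting forever. The sketch below is a generic pattern, not an Unlighthouse feature: `withTimeout` and its `label` parameter are hypothetical names, and the ten-minute limit in the usage comment is an arbitrary assumption.

```javascript
// Hypothetical helper (not part of Unlighthouse): rejects if `promise`
// has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so that it does not keep the Node.js event loop alive on its own.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Possible usage with the snippet from this report (limit is an assumption):
//   await withTimeout(unlighthouse.start(), 10 * 60 * 1000, 'Unlighthouse scan');
```

Note that this only unblocks the caller. It does not stop whatever work is still pending inside the scan, so a script that relies on it would probably still need to shut down or exit explicitly after catching the timeout error.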

BennyAlex commented 1 year ago

i Using puppeteer dependency for chrome.  Unlighthouse 10:15:38
D Post config resolution { routerPrefix: '/',  Unlighthouse 10:15:38
  apiPrefix: '/api',
  cache: false,
  client:
   { groupRoutesKey: 'route.definition.name',
     columns:
      { overview: [Array],
        performance: [Array],
        accessibility: [Array],
        'best-practices': [Array],
        seo: [Array] } },
  scanner:
   { customSampling: {},
     ignoreI18nPages: true,
     maxRoutes: 500,
     skipJavascript: false,
     samples: 1,
     throttle: false,
     crawler: false,
     dynamicSampling: 8,
     sitemap: false,
     robotsTxt: false,
     device: 'desktop',
     exclude: [ '/cdn-cgi/*' ] },
  server: { port: 5678, showURL: false, open: false },
  discovery: false,
  root: 'C:\\Users\\benny\\work\\lighthouse',
  outputPath: '.unlighthouse',
  debug: true,
  puppeteerOptions:
   { headless: true,
     ignoreHTTPSErrors: true,
     args: [ '--no-sandbox' ],
     defaultViewport:
      { mobile: false, width: 1350, height: 940, deviceScaleFactor: 1, disabled: false } },
  puppeteerClusterOptions:
   { monitor: true,
     workerCreationDelay: 500,
     retryLimit: 3,
     timeout: 300000,
     maxConcurrency: 16,
     skipDuplicateUrls: false,
     retryDelay: 2000,
     concurrency: 3 },
  lighthouseOptions:
   { onlyCategories: [ 'performance', 'accessibility', 'best-practices', 'seo' ],
     throttlingMethod: 'provided',
     throttling:
      { rttMs: 0,
        throughputKbps: 0,
        cpuSlowdownMultiplier: 1,
        requestLatencyMs: 0,
        downloadThroughputKbps: 0,
        uploadThroughputKbps: 0 },
     formFactor: 'desktop',
     screenEmulation:
      { mobile: false, width: 1350, height: 940, deviceScaleFactor: 1, disabled: false },
     emulatedUserAgent:
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36' },
  site: 'https://www.bayer.com/de/de/deutschland-startseite',
  urls:
   [ 'https://www.bayer.com/de/de/deutschland-startseite',
     'https://www.bayer.com/de/locations',
     'https://www.bayer.com/de/',
     'https://www.bayer.com/de/gesundheit',
     'https://www.bayer.com/de/kontakt' ],
  output: { json: true, html: true },
  chrome:
   { useSystem: false,
     useDownloadFallback: true,
     downloadFallbackVersion: 1095492,
     downloadFallbackCacheDir: 'C:\\Users\\benny\\.unlighthouse' } }
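Since every entry in `urls` above is rejected with a 403, another possible workaround is to probe the URLs before starting the scan and abort early when none of them is reachable. The following is only a sketch under assumptions: `findReachableUrls` and the injectable `fetchFn` parameter are hypothetical names, the default relies on the global `fetch` available in Node.js 18 and later, and a server that blocks bots may answer a plain HTTP request differently than it answers headless Chrome, so the result is a heuristic at best.

```javascript
// Hypothetical pre-flight check (not part of Unlighthouse): requests each URL
// once and separates those answering with a 2xx status from the rest.
async function findReachableUrls(urls, fetchFn = fetch) {
  const results = await Promise.all(
    urls.map(async (url) => {
      try {
        const res = await fetchFn(url, { method: 'GET', redirect: 'follow' });
        return { url, ok: res.ok, status: res.status };
      } catch (error) {
        // Network-level failure (DNS, TLS, connection reset, ...).
        return { url, ok: false, status: null };
      }
    }),
  );
  return {
    reachable: results.filter((r) => r.ok).map((r) => r.url),
    failed: results.filter((r) => !r.ok),
  };
}

// Possible usage before starting a scan:
//   const { reachable, failed } = await findReachableUrls(urls);
//   if (reachable.length === 0) throw new Error('No URL is reachable');
```

Passing the fetch function as a parameter keeps the helper easy to exercise with a stub instead of real network calls.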

saadsme commented 1 month ago

Facing the same issue. It just hangs indefinitely.