harlan-zw / unlighthouse

Scan your entire site with Google Lighthouse in 2 minutes (on average). Open source, fully configurable with minimal setup.
https://unlighthouse.dev
MIT License
3.84k stars 112 forks

Smaller return than expected #236

Closed. mgifford closed this issue 1 month ago

mgifford commented 2 months ago

Describe the bug

I have this running:

% npx unlighthouse-ci --site https://organdonor.gov/ --throttle --yes --reporter csvExpanded --expose-gc --timeout 600000 --protocoll-timeout 300000 --navigation-timeout 60000 --log-level error

My config (unlighthouse.config.ts) is:

export default {
  scanner: {
    include: [
      "/",
      "/about",
      "/foia",
      "/inspector",
      "/privacy",
      "/search",
      "/sitemap",
      "/accessibility",
      "/contact",
      "/fear",
      "/espanol",
      "/es",
      "/sitemap",
      "/sitemap.xml",
      "/*"
    ],
    // run lighthouse for each URL 1 time(s)
    samples: 1,
    // use desktop to scan
    device: 'desktop',
    // enable the throttling mode
    throttle: true,
    // increase the maximum number of routes - https://unlighthouse.dev/api/config#scannermaxroutes
    maxRoutes: 300,
    // skip the javascript scan
    skipJavascript: false,
    // use sitemaps - arrays are possible for specific sites https://unlighthouse.dev/api/config#scannersitemap
    sitemap: true,
  },
  chrome: {
    // false will force the fallback to be used
    useSystem: true
  },
  debug: false,
}

On some sites it works as expected, but on others I only get 10-30 results.

I haven't checked all of them, but for this site there are a lot of URLs in the sitemap that aren't being scanned: https://www.organdonor.gov/sitemap.xml

Their robots.txt looks pretty standard too: https://www.organdonor.gov/robots.txt
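To put a number on the gap, one option is to count the `<loc>` entries in the sitemap and compare that with the number of rows in the CSV report. The snippet below is only a rough sketch: `countSitemapUrls` and `sampleXml` are hypothetical names made up for illustration, and the regex is a quick count rather than a real XML parse.

```typescript
// Count <loc> entries in a sitemap XML string.
// Rough regex-based count, not a full XML parser (hypothetical helper).
function countSitemapUrls(xml: string): number {
  const matches = xml.match(/<loc>\s*[^<]+?\s*<\/loc>/g);
  return matches ? matches.length : 0;
}

// Made-up sample data, only to show the shape of the input.
const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>`;

console.log(countSitemapUrls(sampleXml));
```

On Node 18 or later the real sitemap could be fetched with the built-in `fetch` and passed to the same function. If the file turns out to be a sitemap index, the count is the number of child sitemaps rather than pages.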

Reproduction

No response

System / Nuxt Info

  System:
    OS: macOS 14.5
    CPU: (10) arm64 Apple M1 Max
    Memory: 169.72 MB / 32.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 20.16.0 - ~/.nvm/versions/node/v20.16.0/bin/node
    Yarn: 1.22.22 - /opt/homebrew/bin/yarn
    npm: 10.8.1 - ~/.nvm/versions/node/v20.16.0/bin/npm
    pnpm: 9.5.0 - /opt/homebrew/bin/pnpm
  Browsers:
    Brave Browser: 118.1.59.117
    Chrome: 127.0.6533.120
    Chrome Canary: 129.0.6666.1
    Safari: 17.5

mgifford commented 2 months ago

The config is now:

module.exports = {
  scanner: {
    include: [
      "/",
      "/about",
      "/foia",
      "/inspector",
      "/privacy",
      "/search",
      "/sitemap",
      "/accessibility",
      "/contact",
      "/fear",
      "/espanol",
      "/es",
      "/sitemap",
      "/sitemap.xml",
      "/blog",
      "/*"
    ],
    samples: 1,
    device: 'desktop',
    throttle: true,
    maxRoutes: 300,
    skipJavascript: false,
    sitemap: true,
  },
  chrome: {
    useSystem: true
  },
  debug: false,
};

harlan-zw commented 1 month ago

I've done some testing with your setup and it appears to work correctly, at least on the latest version. I have pushed several fixes since this issue was opened, so something may have been fixed along the way.

Just make sure you have dynamic sampling disabled.

If you still have issues, could you provide a reproduction with a similar config? :pray:

module.exports = {
  scanner: {
    // ...
    dynamicSampling: false, // ensure all routes are scanned
  },
};
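
For reference, here is a sketch of what the full file might look like with that option merged into the config posted earlier in the thread. It assumes every other option stays as posted and uses the `export default` form from the original `unlighthouse.config.ts`. As far as I understand the Unlighthouse docs, dynamic sampling limits how many URLs sharing a similar path pattern get scanned, which could explain a result count well below the sitemap size, but treat that as an assumption rather than a confirmed diagnosis.

```typescript
// unlighthouse.config.ts (sketch: earlier config plus dynamicSampling disabled)
export default {
  scanner: {
    include: [
      "/",
      "/about",
      "/foia",
      "/inspector",
      "/privacy",
      "/search",
      "/sitemap", // the duplicate "/sitemap" entry from the posted config is dropped here
      "/accessibility",
      "/contact",
      "/fear",
      "/espanol",
      "/es",
      "/sitemap.xml",
      "/blog",
      "/*"
    ],
    // run Lighthouse once per URL
    samples: 1,
    // scan with the desktop profile
    device: 'desktop',
    // enable throttling
    throttle: true,
    // upper bound on the number of routes scanned
    maxRoutes: 300,
    skipJavascript: false,
    // discover URLs from the site's sitemap
    sitemap: true,
    // ensure all routes are scanned (the option suggested above)
    dynamicSampling: false,
  },
  chrome: {
    useSystem: true
  },
  debug: false,
}
```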