harlan-zw / unlighthouse

Scan your entire site with Google Lighthouse in 2 minutes (on average). Open source, fully configurable with minimal setup.
https://unlighthouse.dev
MIT License
3.88k stars, 115 forks

Puppeteer Worker only allows .html or no declared extension #231

Closed madolinn closed 1 month ago

madolinn commented 3 months ago

https://github.com/harlan-zw/unlighthouse/blob/a792d163e03b09a0309384362ef970a6a47f90d9/packages/core/src/puppeteer/worker.ts#L111

Unlighthouse will not crawl a route, or accept it from a sitemap, when its URL has an explicit file extension other than .html, even if that route would return HTML content.

ASPX, PHP, etc. are examples of extensions whose responses can be valid, parseable HTML documents.

Adding an option to whitelist additional extensions for routes could resolve this.
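
To illustrate the idea, here is a minimal sketch of what such a check could look like. It is not the code in `worker.ts` and not an existing Unlighthouse option: `isScannableRoute`, `getPathExtension`, and the `extraExtensions` parameter are hypothetical names, and the function assumes it receives a URL path (such as `/contact.aspx?id=1`), not a full URL with a host.

```typescript
// Hypothetical sketch, not part of the Unlighthouse API.
// Extensions treated as HTML by default, mirroring the behaviour described in this issue.
const DEFAULT_HTML_EXTENSIONS = ['.html']

// Returns the lowercased extension of the last path segment, or '' if there is none.
// Assumes `path` is a URL path (no scheme or host).
function getPathExtension(path: string): string {
  // Drop the query string and hash before looking at the path.
  const cleanPath = path.split(/[?#]/)[0]
  const lastSegment = cleanPath.split('/').pop() ?? ''
  const dot = lastSegment.lastIndexOf('.')
  // No dot, or only a leading dot (e.g. ".well-known"), counts as no extension.
  return dot > 0 ? lastSegment.slice(dot).toLowerCase() : ''
}

// A route is scannable if it has no extension, or its extension is in the allowlist.
function isScannableRoute(path: string, extraExtensions: string[] = []): boolean {
  const ext = getPathExtension(path)
  if (ext === '')
    return true
  const allowed = new Set(
    [...DEFAULT_HTML_EXTENSIONS, ...extraExtensions].map(e => e.toLowerCase()),
  )
  return allowed.has(ext)
}

// Usage: without the allowlist, .aspx is rejected; with it, the route is accepted.
console.log(isScannableRoute('/contact.aspx')) // false
console.log(isScannableRoute('/contact.aspx', ['.aspx', '.php'])) // true
```

Keeping `.html` and extensionless routes as the default would preserve the current behaviour for existing users, while sites served by ASP.NET or PHP could opt in to the extensions they need.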