Open layoutanalysis opened 5 years ago
I also noticed that minimalcss times out on certain web.archive.org snapshots:
const minimalcss = require("minimalcss");

minimalcss
  .minimize({
    urls: ['http://web.archive.org/web/20161001001006/https://www.theguardian.com/us'],
    ignoreJSErrors: true,
    withoutjavascript: true,
    ignoreCSSErrors: true,
    loadimages: false,
    enableServiceWorkers: true,
    timeout: 90000,
    cssoOptions: {restructure: false},
    skippable: request => {
      return request.url().indexOf('theguardian.com') === -1;
    }
  })
  .then(result => {
    console.log(result.finalCss);
  })
  .catch(error => {
    console.error(`Failed the minimize CSS: ${error}`);
  });
results in the error
Failed the minimize CSS: TimeoutError: Navigation Timeout Exceeded: 90000ms exceeded
Tracked URLs that have not finished: http://web.archive.org/web/20161001001006/https://www.theguardian.com/us, http://web.archive.org/web/20161001001006/https://www.theguardian.com/us-news/series/politics-for-humans/rss
This error also happened with timeout: 560000 (a timeout of roughly 9 minutes).
Maybe it makes sense to stop all pending requests once 90% of the timeout has elapsed, i.e. at start_time + (timeout - 10%), and use the remaining time to calculate the used_css and return it?
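To make the suggestion concrete, here is a rough sketch of the "soft deadline" idea in plain JavaScript. It is not how minimalcss works today: softDeadline and withSoftDeadline are hypothetical helper names, and the 10% reserve is simply the figure proposed above. The sketch only shows how to stop waiting early; actually aborting the pending requests and computing the used CSS would still have to happen inside the library.

```javascript
// Hypothetical helpers, not part of minimalcss or Puppeteer.

// Milliseconds to wait before giving up on pending work, keeping
// `reserveFraction` of the total timeout for computing the result.
function softDeadline(timeoutMs, reserveFraction = 0.1) {
  return Math.round(timeoutMs * (1 - reserveFraction));
}

// Resolves with { timedOut: false, value } if `promise` settles in time,
// or with { timedOut: true } once the soft deadline has passed.
function withSoftDeadline(promise, timeoutMs, reserveFraction = 0.1) {
  let timer;
  const deadline = new Promise(resolve => {
    timer = setTimeout(
      () => resolve({ timedOut: true }),
      softDeadline(timeoutMs, reserveFraction)
    );
  });
  return Promise.race([
    promise.then(value => ({ timedOut: false, value })),
    deadline,
  ]).finally(() => clearTimeout(timer));
}
```

With timeout: 90000 this would stop waiting after 81000 ms and leave 9000 ms for the CSS calculation. A caller could check timedOut and continue with whatever stylesheets had been downloaded by then.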
The timeout may be a Puppeteer bug. Related: https://github.com/peterbe/minimalcss/issues/112
What @stereobooster said is true.
But I wonder, why do you have enableServiceWorkers: true in there?
I would like to use minimalcss to extract the used CSS from http://web.archive.org/ snapshots of a webpage (e.g. http://web.archive.org/web/20110310061818/http://www.bloomberg.com/) and compare the results over time to find out how often the layout/appearance of a webpage has changed in the past.
Unfortunately this is not so easy with minimalcss, because it stops working whenever a stylesheet cannot be fetched (404 error). 404s are very common on web.archive.org, as many captures are incomplete. I could partially work around them using the skippable function, but it only lets me skip the request upfront; I cannot react to response errors. My preferred behaviour would be to output the used CSS to stdout and log the unretrievable stylesheet URLs to stderr.
Another issue is the mandatory CSSO optimisation, which crashes on certain CSS property values. I could mitigate some crashes by setting cssoOptions: {restructure: false}, but it would be nicer if I could disable the optimisation altogether.
I'm aware that my use case is somewhat uncommon for minimalcss, but maybe the library can be extended to make it possible?
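To illustrate the stdout/stderr split I have in mind, here is a minimal sketch. It is not an existing minimalcss feature: partitionStylesheets and report are hypothetical names, and each entry is assumed to be a plain { url, status, css } object that the caller has already collected (for example from the browser's response events). The sketch prints the raw CSS of the stylesheets that loaded; the used-CSS calculation itself is left out.

```javascript
// Hypothetical helpers, not part of the minimalcss API.

// Splits stylesheet responses into those that loaded and those that failed.
// Each entry is assumed to look like { url, status, css }.
function partitionStylesheets(responses) {
  const loaded = [];
  const failed = [];
  for (const response of responses) {
    if (response.status >= 200 && response.status < 400) {
      loaded.push(response);
    } else {
      failed.push(response);
    }
  }
  return { loaded, failed };
}

// Writes the CSS that could be fetched to `out` and one line per
// unretrievable stylesheet to `err`, instead of aborting on the first 404.
function report(responses, out = process.stdout, err = process.stderr) {
  const { loaded, failed } = partitionStylesheets(responses);
  for (const { url, status } of failed) {
    err.write(`Skipped stylesheet (HTTP ${status}): ${url}\n`);
  }
  out.write(loaded.map(({ css }) => css).join('\n') + '\n');
  return { loaded, failed };
}
```

Run as a script, the CSS could then be redirected to a file while the list of missing stylesheets stays visible on stderr.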