Closed (xfq closed this issue 2 years ago)
I see it's a bug (https://github.com/website-scraper/node-website-scraper/issues/454) in the module we use to scrape the spec. Unfortunately, that issue has been open for a few months, so we might need to find an alternative.
What about something like Scrapy or node-crawler?
I'd rather stick to a node module if possible. I'll give node-crawler a try. Hopefully I can send a PR by tomorrow.
It was actually surprisingly difficult to find a good module to download relative resources for an HTML document that's not served with the right content type. I ended up parsing the document myself and downloading all the href/src relative to the document. It should be good enough to generate the right snapshot!
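For anyone curious what "parsing the document myself" could look like, here is a minimal sketch, not the actual spec-generator code. The function name `collectRelativeResources` and the regex-based attribute scan are assumptions made for illustration; a real implementation would more likely use an HTML parser and would then fetch each returned URL.

```javascript
// Hypothetical helper (illustration only, not the spec-generator's code).
// Finds quoted href/src attribute values that are relative to the document
// and resolves them against the document URL.
function collectRelativeResources(html, documentUrl) {
  // Match href="..." or src='...'; the lookbehind skips names such as data-src.
  const attrPattern = /(?<![\w-])(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  const resolved = new Set();
  for (const match of html.matchAll(attrPattern)) {
    const value = (match[1] ?? match[2] ?? "").trim();
    // Skip empty values, fragment-only links, protocol-relative URLs, and
    // anything with a scheme (https:, mailto:, data:, ...).
    if (
      !value ||
      value.startsWith("#") ||
      value.startsWith("//") ||
      /^[a-z][a-z0-9+.-]*:/i.test(value)
    ) {
      continue;
    }
    const url = new URL(value, documentUrl);
    url.hash = ""; // the fragment is irrelevant when downloading the resource
    resolved.add(url.href);
  }
  return [...resolved];
}

// Example with a made-up document URL:
const sampleHtml =
  '<link href="style.css"><img src=\'images/a.png\'>' +
  '<a href="https://example.org/x">x</a><a href="#sec">s</a>' +
  '<script src="../js/app.js"></script>';
const resources = collectRelativeResources(
  sampleHtml,
  "https://example.com/spec/index.html"
);
console.log(resources);
```

Because the resolution relies only on the document URL, this approach does not depend on the content type the server reports for the HTML file.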
I confirm that All Goes Well now. Thank you!
In https://labs.w3.org/spec-generator/?type=respec&url=https%3A%2F%2Fraw.githubusercontent.com%2Fw3c%2Fclreq%2Fgh-pages%2Findex.html there's garbled text like � �. This affects the PR preview function (see https://github.com/w3c/clreq/pull/455 for example) and has a great impact on the group participants' daily work.
/cc @deniak