Closed ddbeck closed 5 years ago
Wow, this is incredible.
Is `?raw&summary` a public URL that I can use? I'm trying to add SVG data from MDN and the SVG2 spec, and I was using the `$json` API. This would have been much better!
Here's the link: https://github.com/octref/svg-sample
@octref I'm pretty sure it's intended for public consumption. It's a bit hard to find, but parameters for pages are documented on this page (searching MDN for "URL parameters" returns a lot of results, believe it or not 😆)
@wbamberg Would you mind running the script again to see if you still get that error? I wasn't able to reproduce it myself. It seems like `EEXIST` should never happen on `mkdirSync` (at least when `{ recursive: true }`) but I was also calling it wrong, so who knows.
From what I can tell, `mkdirSync` will throw if the given directory exists, and that's what I'm seeing. What works is if I change the code like:
const dir = path.dirname(dest);
if (!fs.existsSync(dir)) {
  fs.mkdirSync(dir, { recursive: true });
}
fs.writeFileSync(dest, `${JSON.stringify(data, null, 2)}\n`);
But I wonder why you can't reproduce it. (From my reading, `recursive` is different: it means I can give it `multiple/level/path` and have `multiple` and `level` created for me if they don't exist.)
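For reference, a minimal sketch of a write helper that behaves the same whether or not the directory already exists. It assumes Node.js 10.12.0 or later, where the `recursive` option for `fs.mkdirSync` was introduced; a Node.js version difference is one possible, unconfirmed explanation for the machine-specific `EEXIST`. The helper name `writeJSON` is hypothetical and not part of the scraper.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write `data` as pretty-printed JSON to `dest`,
// creating any missing parent directories first.
function writeJSON(dest, data) {
  const dir = path.dirname(dest);
  try {
    fs.mkdirSync(dir, { recursive: true });
  } catch (err) {
    // Tolerate "already exists" only when the existing path is a directory.
    if (err.code !== 'EEXIST' || !fs.statSync(dir).isDirectory()) {
      throw err;
    }
  }
  fs.writeFileSync(dest, `${JSON.stringify(data, null, 2)}\n`);
}

// Usage: calling it twice exercises the "directory already exists" path.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'scrape-'));
const dest = path.join(root, 'multiple', 'level', 'path.json');
writeJSON(dest, { ok: true });
writeJSON(dest, { ok: true });
```

Catching `EEXIST` instead of checking `existsSync` first avoids a gap between the check and the create, at the cost of a slightly longer function.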
@wbamberg Thanks for the fix for `mkdirSync`! I'm still really curious why it seems to be a machine-specific thing (maybe differences in underlying syscalls?), but if we've got a fix, then I guess it doesn't actually matter.
So this is great: in terms of workflow we could run this, and get a diff, and if the diff is legitimate (i.e. if the Wiki updates that it reflects are OK) we can make that into a PR against the short-descriptions repo to update the descriptions.
Yeah, I thought the diffs were pretty tractable too (plus `--word-diff` will highlight intra-line, like a GitHub PR does).
About a summary and the massive traceback: yeah, I should fix those both. There are two places where I ought to catch some likely errors: fetching the property URL and writing to disk (obviously).
I'll work on these changes soon! Thanks again for your feedback, Will!
Thanks Daniel!
> About a summary and the massive traceback: yeah, I should fix those both.
Just to be clear: I didn't mind the long traceback, I was more concerned about having a summary.
Thanks again for the feedback, Will. Here's a summary of what's changed:
There's now a summary when you run the scraper. It looks like this:
$ npm run scrape -- margin background overflow-block fake-property teapot
> mdn-short-descriptions@0.0.1 scrape /Users/ddbeck/TheWork/Mozilla/short-descriptions/ddbeck-short-descriptions
> node scripts/scrape.js "margin" "background" "overflow-block" "fake-property" "teapot"
Trace: <imagine long traceback here>
Attempted to scrape 5 properties.
Successfully scraped 2 properties.
Failed to scrape 3 properties:
overflow-block: No `mdn_url` found
fake-property: Property not found
teapot: StatusCodeError: 418 - {"type":"Buffer","data":[52,49,56,32,73,39,109,32,97,32,116,101,97,112,111,116]}
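The `data` array in the `teapot` failure is the response body serialized as a Node.js `Buffer`. A small sketch of decoding it, in case a readable message would be preferable in the summary. The helper name `decodeBufferJSON` is hypothetical, and the sketch assumes the body is UTF-8 text.

```javascript
// Hypothetical helper: turn the JSON form of a Buffer
// ({"type":"Buffer","data":[...]}) back into a string.
function decodeBufferJSON(json) {
  const parsed = JSON.parse(json);
  if (parsed.type !== 'Buffer' || !Array.isArray(parsed.data)) {
    throw new TypeError('Not a serialized Buffer');
  }
  return Buffer.from(parsed.data).toString('utf8');
}

const body =
  '{"type":"Buffer","data":[52,49,56,32,73,39,109,32,97,32,116,101,97,112,111,116]}';
console.log(decodeBufferJSON(body)); // prints "418 I'm a teapot"
```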
I found the `.then` chain to be confusing and pointless because most of it is synchronous anyway. I rewrote that to be (hopefully) more tidy with async/await. It might help to look at that commit in isolation, to see the changes.
Let me know what you think. Thanks again!
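As a rough sketch of the shape described in this comment (not the actual code in the commit): an `async` loop that catches failures per property and then prints a summary. `scrapeAll`, `formatSummary`, and the injected `scrapeOne` function are hypothetical names; `scrapeOne` stands in for fetching the page and writing it to disk.

```javascript
// Hypothetical sketch: scrape each property in turn, recording failures
// instead of letting one rejected promise abort the whole run.
async function scrapeAll(properties, scrapeOne) {
  const succeeded = [];
  const failed = [];
  for (const property of properties) {
    try {
      await scrapeOne(property); // assumed to fetch the page and write it to disk
      succeeded.push(property);
    } catch (err) {
      failed.push({ property, reason: err.message });
    }
  }
  return { succeeded, failed };
}

// Build summary lines in the style of the scraper output quoted earlier.
function formatSummary({ succeeded, failed }) {
  const total = succeeded.length + failed.length;
  const lines = [
    `Attempted to scrape ${total} properties.`,
    `Successfully scraped ${succeeded.length} properties.`,
  ];
  if (failed.length > 0) {
    lines.push(`Failed to scrape ${failed.length} properties:`);
    for (const { property, reason } of failed) {
      lines.push(`${property}: ${reason}`);
    }
  }
  return lines.join('\n');
}

// Usage with a stand-in scraper that rejects one made-up property.
const fakeScrape = async (property) => {
  if (property === 'fake-property') throw new Error('Property not found');
};

async function main() {
  const result = await scrapeAll(['margin', 'fake-property'], fakeScrape);
  console.log(formatSummary(result));
}
main();
```

Passing the scraper in as a function keeps the error handling and summary logic testable without touching the network or the disk.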
OK, this ought to ingest some wiki page summaries and resolve #9.
Running `npm run scrape <property>` fetches a page from the wiki, such as https://developer.mozilla.org/docs/Web/CSS/animation-delay?raw&summary:
And writes out this:
A few additional notes:
I'd welcome any and all comments. Thank you, @wbamberg!