mdn / short-descriptions

Short descriptions of web platform features, for flexible usage in applications.

Scrape wiki pages #11

Closed ddbeck closed 5 years ago

ddbeck commented 5 years ago

OK, this ought to ingest some wiki page summaries and resolve #9.

Running npm run scrape <property> fetches a page from the wiki, such as https://developer.mozilla.org/docs/Web/CSS/animation-delay?raw&summary:

The <strong><code>animation-delay</code></strong> <a href="/en-US/docs/CSS">CSS</a> property sets when an animation starts. The animation can start later, immediately from its beginning, or immediately and partway through the animation.

And writes out this:

{
  "css": {
    "properties": {
      "animation-delay": {
        "__short_description": "The <strong><code>animation-delay</code></strong> <a href='https://developer.mozilla.org/docs/CSS'>CSS</a> property sets when an animation starts. The animation can start later, immediately from its beginning, or immediately and partway through the animation."
      }
    }
  }
}
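The transformation above can be sketched roughly as follows. This is not the script from the pull request: the helper names (`summaryUrl`, `absolutizeLinks`, `toShortDescription`) are hypothetical, and the link-rewriting rule is inferred only from the single example shown (locale prefix dropped, origin added, double quotes swapped for single quotes).

```javascript
// Hypothetical helpers; the real scrape script may be structured differently.
const MDN_ORIGIN = 'https://developer.mozilla.org';

// Build the summary URL for a CSS property, following the example URL in the thread.
function summaryUrl(property) {
  return `${MDN_ORIGIN}/docs/Web/CSS/${encodeURIComponent(property)}?raw&summary`;
}

// Rewrite root-relative wiki links such as href="/en-US/docs/CSS" into
// absolute, single-quoted links such as href='https://developer.mozilla.org/docs/CSS'.
// Assumption: an optional locale segment (for example "en-US") precedes "docs/".
function absolutizeLinks(html) {
  return html.replace(
    /href="\/(?:[a-z]{2}(?:-[A-Z]{2})?\/)?docs\/([^"]*)"/g,
    (_match, rest) => `href='${MDN_ORIGIN}/docs/${rest}'`
  );
}

// Wrap the cleaned-up summary in the nested structure shown above.
function toShortDescription(property, summaryHtml) {
  return {
    css: {
      properties: {
        [property]: {
          __short_description: absolutizeLinks(summaryHtml.trim()),
        },
      },
    },
  };
}
```

Using single quotes inside the rewritten links keeps the HTML readable once it is serialized as a JSON string, since double quotes would have to be escaped.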

A few additional notes:

I'd welcome any and all comments. Thank you, @wbamberg!

octref commented 5 years ago

Wow, this is incredible.

Is ?raw&summary a public URL that I can use? I'm trying to add SVG data from MDN and the SVG2 spec, and I was using the $json API. This would have been much better!

Here's the link: https://github.com/octref/svg-sample

ddbeck commented 5 years ago

@octref I'm pretty sure it's intended for public consumption. It's a bit hard to find, but parameters for pages are documented on this page (searching MDN for "URL parameters" returns a lot of results, believe it or not 😆)

ddbeck commented 5 years ago

@wbamberg Would you mind running the script again to see if you still get that error? I wasn't able to reproduce it myself. It seems like EEXIST should never happen on mkdirSync (at least with { recursive: true }), but I was also calling it wrong, so who knows.

wbamberg commented 5 years ago

From what I can tell, mkdirSync will throw if the given directory exists, and that's what I'm seeing. What works is if I change the code like:

  const dir = path.dirname(dest);
  if (!fs.existsSync(dir)) {
    fs.mkdirSync(dir, { recursive: true });
  }
  fs.writeFileSync(dest, `${JSON.stringify(data, null, 2)}\n`);

But I wonder why you can't reproduce it. (From my reading, recursive is different: it means I can give it multiple/level/path and have multiple and level created for me if they don't exist.)
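For reference, here is a self-contained sketch of a write helper that tolerates an existing directory either way. The function name `writeJson` is hypothetical, and catching `EEXIST` is an alternative to the `existsSync` check above, not the code that was merged. One possible explanation for the machine-specific behavior, which nothing in this thread confirms, is the Node.js version: the `recursive` option of `fs.mkdirSync` was added in Node.js 10.12.0, and as far as I know older releases do not honor it.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write `data` as pretty-printed JSON to `dest`,
// creating the parent directory if needed.
function writeJson(dest, data) {
  const dir = path.dirname(dest);
  try {
    // On Node.js >= 10.12.0 this creates missing parents and does not
    // throw when `dir` already exists.
    fs.mkdirSync(dir, { recursive: true });
  } catch (err) {
    // Assumption: on runtimes that do not honor `recursive`, an existing
    // directory surfaces as EEXIST, which is safe to ignore here.
    if (err.code !== 'EEXIST') {
      throw err;
    }
  }
  fs.writeFileSync(dest, `${JSON.stringify(data, null, 2)}\n`);
}
```

Calling `writeJson` twice with the same `dest` should simply overwrite the file the second time.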

ddbeck commented 5 years ago

@wbamberg Thanks for the fix for mkdirSync! I'm still really curious why it seems to be a machine-specific thing (maybe differences in underlying syscalls?), but if we've got a fix, then I guess it doesn't actually matter.

> So this is great: in terms of workflow we could run this, and get a diff, and if the diff is legitimate (i.e. if the Wiki updates that it reflects are OK) we can make that into a PR against the short-descriptions repo to update the descriptions.

Yeah, I thought the diffs were pretty tractable too (plus --word-diff will highlight intra-line, like a GitHub PR does).

About a summary and the massive traceback: yeah, I should fix both of those. There are two places where I ought to catch some likely errors: fetching the property URL and writing to disk (obviously).

I'll work on these changes soon! Thanks again for your feedback, Will!

wbamberg commented 5 years ago

Thanks Daniel!

> About a summary and the massive traceback: yeah, I should fix those both.

Just to be clear: I didn't mind the long traceback, I was more concerned about having a summary.
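A minimal sketch of what such a summary could look like, assuming a hypothetical `scrapeOne(property)` function that fetches and writes a single property and rejects on failure. The names `scrapeAll` and `formatSummary` and the output format are illustrative only, not what the script actually does.

```javascript
// Hypothetical driver: run `scrapeOne` for each property and record the outcome
// instead of letting the first error abort the whole run.
async function scrapeAll(properties, scrapeOne) {
  const results = { ok: [], failed: [] };
  for (const property of properties) {
    try {
      await scrapeOne(property);
      results.ok.push(property);
    } catch (err) {
      // Covers both likely failure points: fetching the URL and writing to disk.
      results.failed.push({ property, message: err.message });
    }
  }
  return results;
}

// Turn the recorded outcomes into a short, human-readable summary.
function formatSummary(results) {
  const lines = [`${results.ok.length} scraped, ${results.failed.length} failed`];
  for (const { property, message } of results.failed) {
    lines.push(`  ${property}: ${message}`);
  }
  return lines.join('\n');
}
```

The full traceback can still be printed alongside each failure; the summary only adds a quick overview at the end of the run.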

ddbeck commented 5 years ago

Thanks again for the feedback, Will. Here's a summary of what's changed:

Let me know what you think. Thanks again!