maghoff / atomix

Atom feed generator that takes JSON input
ISC License

Extract structured data from posts #3

Open maghoff opened 9 years ago

maghoff commented 9 years ago

Posts ought to have structured data to be better presented in Google, Facebook, Twitter, etc.

If the posts have structured data, Atomix can exploit it to populate fields automatically.

Let's take the NewsArticle format as an example. If a post's HTML file includes this markup, we can for example automatically extract headline, image, datePublished, description and alternativeHeadline.

(For another description of this format, see https://developers.google.com/structured-data/rich-snippets/articles)

This strategy overlaps slightly with #2 for the case of datePublished. However, it is perhaps best to control this timestamp explicitly to avoid false updates in people's feed readers.

maghoff commented 9 years ago

There seem to be plenty of npm packages available for parsing microdata.

maghoff commented 9 years ago

This markup can be really lightweight if you choose the right format. See http://magnushoff.com/pnacl.html for an example of an article marked up with schema.org. Look for attributes named item...:

<html itemscope itemtype="http://schema.org/TechArticle">
<meta itemprop="datePublished" content="2015-03-25T12:00:00Z">
<link itemprop="image" href="assets/pnacl-banner-2x.png">
<h1 itemprop="headline">Actually getting started with Portable Native Client</h1>
<div itemprop="alternativeHeadline" class="subheading">...</div>
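To illustrate how little is needed to pull these values out, here is a rough sketch in plain JavaScript with no dependencies. The function name `extractItemprops` is made up for this example and is not part of Atomix. It is a naive regex scan, not a conforming microdata parser: it assumes double-quoted attributes, ignores nested `itemscope` elements, and only sees text up to the first child tag. One of the npm packages mentioned above would be the better choice for real use.

```javascript
// Naive sketch: collect itemprop values from an HTML string.
// Assumptions: double-quoted attributes, no nested itemscope,
// element text contains no child tags.
function extractItemprops(html) {
  const props = {};
  // An opening tag carrying an itemprop attribute, plus the text
  // that follows it up to the next tag.
  const tagRe = /<(\w+)\b([^>]*\bitemprop="([^"]*)"[^>]*)>([^<]*)/g;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const [, tag, attrs, name, text] = m;
    // Look up another attribute on the same tag.
    const attr = (key) => {
      const a = new RegExp(`\\b${key}="([^"]*)"`).exec(attrs);
      return a ? a[1] : undefined;
    };
    let value;
    if (tag === "meta") value = attr("content");
    else if (tag === "link" || tag === "a") value = attr("href");
    else if (tag === "img") value = attr("src");
    else value = text.trim();
    props[name] = value;
  }
  return props;
}

// Usage with a few lines from the excerpt above.
const sampleHtml = [
  '<html itemscope itemtype="http://schema.org/TechArticle">',
  '<meta itemprop="datePublished" content="2015-03-25T12:00:00Z">',
  '<link itemprop="image" href="assets/pnacl-banner-2x.png">',
  '<h1 itemprop="headline">Actually getting started with Portable Native Client</h1>',
].join("\n");

console.log(extractItemprops(sampleHtml));
```

The resulting object has the keys `datePublished`, `image` and `headline`, which could then serve as defaults for the corresponding feed entry fields when the JSON input does not set them explicitly.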