wikimedia / html-metadata

MetaData html scraper and parser for Node.js (supports Promises and callback style)
MIT License

Add COinS metadata parser #21

Closed mvolz closed 9 years ago

mvolz commented 9 years ago

Add parseCOinS(), which selects all span tags with class Z3988 [1] and parses each span's title attribute into an object. Hierarchical keys (e.g. rft.date) are split on the dot: the top-level key 'rft' points to an object that holds the second-level key ('date') as one of its keys. Each parsed object is then added to a list, since a page can contain multiple COinS objects.
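To illustrate the key splitting described above, here is a minimal sketch of how a COinS title string could be turned into a nested object. It is not the code from this pull request: the function name `parseCoinsTitleSketch` and the sample title string are hypothetical, and the sketch assumes the title is an `&`-separated, percent-encoded key/value string that has already been HTML-unescaped.

```javascript
// Hypothetical helper for illustration only; not the implementation in this PR.
// Assumes `title` is an HTML-unescaped COinS string such as 'rft.date=2015&...'.
function parseCoinsTitleSketch(title) {
  const result = {};
  // URLSearchParams splits on '&', decodes percent-escapes, and treats '+' as a space.
  for (const [key, value] of new URLSearchParams(title)) {
    const dot = key.indexOf('.');
    if (dot === -1) {
      // Flat key such as ctx_ver or rft_val_fmt.
      result[key] = value;
      continue;
    }
    const top = key.slice(0, dot);   // e.g. 'rft'
    const sub = key.slice(dot + 1);  // e.g. 'date'
    if (typeof result[top] !== 'object') {
      result[top] = {};
    }
    // Simplification: a repeated key (e.g. several rft.au entries) keeps only the last value.
    result[top][sub] = value;
  }
  return result;
}

// Made-up sample title, used only to show the shape of the result.
const sampleTitle =
  'ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal' +
  '&rft.atitle=Example+article&rft.date=2015';
const parsed = parseCoinsTitleSketch(sampleTitle);
console.log(parsed.rft.date); // prints "2015"
```

A real parser would probably need to collect repeated keys such as rft.au into an array rather than overwrite them; the sketch leaves that out to keep the splitting step visible.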

There is also a separately exported parsing function, parseCOinSTitle(), for the contents of the title attribute. It can be used on its own, e.g. for parsing the contents of a CrossRef request [2].

[1] http://ocoins.info/#id3205609413
[2] http://search.crossref.org/dois?q=10.5555%2F12345678

Phabricator issue: https://phabricator.wikimedia.org/T104174

Bug: T104174