wrdrd / docs

WRD R&D Documentation – https://wrdrd.github.io/docs/

BLD,ENH: wrdrd.tools.crawl: RDFa #13

Open westurner opened 9 years ago

westurner commented 5 years ago

https://github.com/scrapinghub/extruct

extruct is a library for extracting embedded metadata from HTML markup.

It also has a built-in HTTP server to test its output as JSON.

Currently, extruct supports:

  • W3C's HTML Microdata
  • embedded JSON-LD
  • Microformats via mf2py
  • Facebook's Open Graph
  • (experimental) RDFa via rdflib
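To make "embedded metadata" concrete, here is a minimal sketch of pulling embedded JSON-LD out of a page using only the Python standard library. It is not extruct's implementation: the names `JsonLdParser`, `extract_jsonld`, and `SAMPLE_HTML` are hypothetical and exist only for this illustration, and the sketch covers just one of the syntaxes listed above.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample page, for illustration only.
SAMPLE_HTML = """
<html>
  <head>
    <script type="application/ld+json">
      {"@context": "https://schema.org", "@type": "Person", "name": "Ada"}
    </script>
    <script>var unrelated = 1;</script>
  </head>
  <body><p>Hello</p></body>
</html>
"""


class JsonLdParser(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # A valueless attribute is reported as None, hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                # Skip malformed blocks rather than failing the whole page.
                pass


def extract_jsonld(html):
    """Return a list of JSON-LD documents embedded in the given HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(extract_jsonld(SAMPLE_HTML))
```

With extruct itself, the comparable call is `extruct.extract(html, base_url=url, syntaxes=['json-ld'])`, which returns a dict keyed by syntax name. For a crawler such as wrdrd.tools.crawl, the library is likely the better choice, since it also handles Microdata, Open Graph, and RDFa.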