microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License

feat: allow passing `htmlDom` #711

Closed: masylum closed this 5 months ago

masylum commented 5 months ago

This allows customizing the parser (for example, using htmlparser2 instead of the default parse5, which is stricter and slower). Also, if you do any post-processing, you can reuse the parsed object and avoid parsing the HTML twice, which is expensive.

Kikobeats commented 5 months ago

Thanks for this!

When you pass htmlDom, keep in mind that you should also take care of the url, since it is needed for resolving relative URLs:

const { load } = require('cheerio')
// `html` is the page markup and `url` is the address it was fetched from
const htmlDom = load(html, { baseURI: url })
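
To illustrate why the base URL matters, here is a minimal sketch using Node's built-in `URL` class. It is not cheerio's or metascraper's internal code. The `pageUrl` value and the `resolveLink` helper are hypothetical names made up for this example.

```javascript
// Sketch only: shows how a relative link resolves against a page URL.
// `pageUrl` and `resolveLink` are hypothetical, not part of metascraper or cheerio.
const pageUrl = 'https://example.com/blog/post/'

function resolveLink (href, base) {
  // The WHATWG URL constructor resolves `href` relative to `base`.
  // Without a base, a relative `href` makes the constructor throw.
  return new URL(href, base).toString()
}

// A path-relative link is resolved from the page's directory.
const cover = resolveLink('../images/cover.png', pageUrl)
// A root-relative link keeps only the origin of the page.
const about = resolveLink('/about', pageUrl)

console.log(cover)
console.log(about)
```

If the base is missing, a relative value such as `/about` cannot be turned into an absolute URL at all, which is why the page url has to travel along with a pre-parsed htmlDom.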