microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License
2.35k stars 168 forks

Handling unescaped quotes in og:description #455

Closed: teddybradford closed this 3 years ago

teddybradford commented 3 years ago

Subject of the issue

I ran into a site with a malformed og:description that looks like this:

<meta property="og:description" content=""It was like a shade had been pulled and then lifted."">

The quotes are unescaped. Is there a way to handle such cases?

Expected behavior

Fix the malformed HTML and return a description.

Actual behavior

No description is returned.

Kikobeats commented 3 years ago

This is actually cheerio's behavior; it parses the content attribute as an empty string:

> require('cheerio').load('<meta property="og:description" content=""It was like a shade had been pulled and then lifted."">')('meta').attr('content')
''

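Unfortunately, that markup is too malformed to handle: the parser ends the content value at the second quote, so the attribute is empty and the text that follows is no longer part of it.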
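
If the raw HTML is available before parsing, one possible workaround is to repair this specific pattern with a regular expression first. The sketch below is only a heuristic: repairDoubledQuotes is a hypothetical helper (not part of metascraper or cheerio), and it assumes the broken value has the exact shape content=""text"" with no further quotes inside the text and the tag closing right after.

```javascript
// Hypothetical helper, not part of metascraper or cheerio.
// Rewrites content=""text"" into content="&quot;text&quot;" so that an HTML
// parser treats the inner quotes as part of the attribute value.
function repairDoubledQuotes (html) {
  // Match content= followed by two quotes, non-empty text without quotes or
  // angle brackets, two more quotes, and then the end of the tag.
  const doubled = /(\bcontent=)""([^\s"<>][^"<>]*)""(?=\s*\/?>)/g
  return html.replace(doubled, '$1"&quot;$2&quot;"')
}

const broken = '<meta property="og:description" content=""It was like a shade had been pulled and then lifted."">'
const repaired = repairDoubledQuotes(broken)
console.log(repaired)
```

The repaired string could then be passed on as the html input in place of the original. Well-formed attributes, including an empty content="", are left untouched, but a value that itself contains a quote character will not match and stays broken.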