Open osdiab opened 5 years ago
Rules of this kind can be added and enabled through package options. Personally, I don't like this approach, because HTML markup is used completely differently on every website: even if you think this would improve data detection here, in other cases it will make it worse.
Happy to accept a PR adding some suggested conditional rules 🙂
Prerequisites
Irrelevant package.json.
Subject of the issue
The metascraper-description package is fairly useful, but for pages that don't have the proper open/social graph tags, headings can be used as a fallback. I'm imagining a scraper that lists out up to n headings on a page like so:
I noticed this when seeing that metascraper-description for https://www.hekimaplace.org returns null, but if you dump the URL into Facebook, it populates the body of the card with the h1 element at the top of the page by default, which turns out to make a lot of sense as a description.
Steps to reproduce
Run the sample code provided on the front page of metascraper, but change the URL to the one mentioned above. You get null there.
Then try opening up Facebook or Messenger and dump the URL in there; the card has a sensible description, which suggests they're probably using some other heuristic. Since the text matches the h1, I bet that's probably it.
Expected behaviour
Either metascraper-description can try to use other heuristics, or a separate rule can be made that executes those heuristics separately, and the client can choose which one they'd prefer.
Actual behaviour
No clear way to use that heuristic... besides making a rule yourself :)
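For anyone landing here, the "make a rule yourself" route might look roughly like the sketch below. It assumes metascraper's rule bundle shape (a factory returning an object of field names mapped to arrays of rule functions, each called with { htmlDom, url }, where htmlDom is a cheerio instance). The names headingDescription and maxLength are made up for illustration and are not part of any metascraper package.

```javascript
// Sketch of a custom rule bundle that uses the first <h1> as a description.
// Assumption: rule bundles are factories returning { field: [rule, ...] },
// and each rule receives { htmlDom, url } with htmlDom being a cheerio
// instance. `headingDescription` and `maxLength` are hypothetical names.
const headingDescription = ({ maxLength = 200 } = {}) => ({
  description: [
    ({ htmlDom: $ }) => {
      // Collapse runs of whitespace so a multi-line heading becomes one line.
      const text = $('h1').first().text().replace(/\s+/g, ' ').trim()
      // Return undefined when there is no usable heading text.
      return text ? text.slice(0, maxLength) : undefined
    }
  ]
})

// Possible wiring (requires the metascraper packages to be installed):
// const metascraper = require('metascraper')([
//   require('metascraper-description')(),
//   headingDescription()
// ])
```

Listing the bundle after metascraper-description is meant to keep the h1 as a fallback only, on the assumption that earlier rules win when they return a value. That matches the concern above: the heuristic stays opt-in, so sites where the h1 is a poor description are not affected unless the client chooses it.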