mozilla / readability

A standalone version of the readability lib

Add `articleBody` to the metadata when found in the Article Schema Markup #846

Open tarekziade opened 4 months ago

tarekziade commented 4 months ago

A website like CNN provides the whole article body in the `articleBody` property of its Article Schema Markup block.

Readability could copy that value in `_getArticleMetadata` and return it from `parse`.

Maybe it could also be leveraged to improve the parser output.
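
A rough sketch of the idea, to make the proposal concrete. It assumes the raw text of a JSON-LD `<script>` block is already in hand. `extractArticleBody` is a hypothetical helper name, not an existing Readability function, and the type check (any `@type` ending in `Article`) is a simplification that would miss types such as `BlogPosting`.

```javascript
// Hypothetical helper: pull `articleBody` out of one JSON-LD block.
// Returns the trimmed body string, or null when none is found.
function extractArticleBody(jsonLdText) {
  let data;
  try {
    data = JSON.parse(jsonLdText);
  } catch (e) {
    return null; // malformed JSON-LD: ignore the block
  }

  // A block may hold a single object, an array of objects,
  // or an object wrapping its nodes in an "@graph" array.
  const topLevel = Array.isArray(data) ? data : [data];
  const nodes = [];
  for (const item of topLevel) {
    if (item && Array.isArray(item["@graph"])) {
      nodes.push(...item["@graph"]);
    } else {
      nodes.push(item);
    }
  }

  for (const node of nodes) {
    if (!node || typeof node !== "object") continue;

    // "@type" can be a string or an array of strings.
    const types = [].concat(node["@type"] || []);
    // Assumption: only types ending in "Article" are considered
    // (Article, NewsArticle, ...).
    const isArticle = types.some(
      (t) => typeof t === "string" && /Article$/.test(t)
    );

    if (
      isArticle &&
      typeof node.articleBody === "string" &&
      node.articleBody.trim()
    ) {
      return node.articleBody.trim();
    }
  }
  return null;
}
```

The returned string could then be stored next to the other metadata fields and exposed on the object returned by `parse`. Returning `null` rather than throwing keeps a broken markup block from affecting the rest of the parse.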

Happy to do a patch :)

tarekziade commented 4 months ago

I should mention that `articleBody` is part of the schema.org standard: https://schema.org/Article