scrapinghub / extruct

Extract embedded metadata from HTML markup
BSD 3-Clause "New" or "Revised" License

LD+JSON outside HTML element #194

Open bar24 opened 2 years ago

bar24 commented 2 years ago

Hi all.

If an ld+json script sits outside the html element (html.head.body.html.ld+json), the parser returns an empty list.

Firefox and the W3C validator both report: Stray start tag "script". So the site's HTML document structure is clearly at fault.

But maybe someone can apply a fix until the webmaster fixes this.

Example site: https://www.spinneyslebanon.com/mevgal-bio-feta-cheese-200g.html
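
As a possible stopgap on the caller's side, here is a minimal sketch that collects ld+json scripts without building a DOM tree, so a script placed after the closing html tag is still seen. It uses only Python's standard library and is not part of extruct; the names `JsonLdCollector`, `extract_stray_jsonld` and `SAMPLE` are made up for illustration, and the sample markup is a tiny stand-in, not the real page.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">.

    HTMLParser is a streaming tokenizer, so it does not care where in the
    document (inside or outside <html>) the script tag appears.
    """

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces; keep them all.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._chunks))


def extract_stray_jsonld(html_text):
    """Return the parsed JSON-LD objects found anywhere in html_text."""
    collector = JsonLdCollector()
    collector.feed(html_text)
    collector.close()
    items = []
    for raw in collector.blocks:
        try:
            items.append(json.loads(raw))
        except json.JSONDecodeError:
            # Skip blocks that are not valid JSON instead of failing.
            continue
    return items


# Hypothetical stand-in for the broken page: the script follows </html>.
SAMPLE = """<html><head><title>t</title></head><body><p>x</p></body></html>
<script type="application/ld+json">{"@type": "Product", "name": "Feta"}</script>"""

if __name__ == "__main__":
    print(extract_stray_jsonld(SAMPLE))
```

This only recovers the raw JSON objects; it does none of the JSON-LD processing a real extractor might do, so it is best treated as a fallback when the normal extraction comes back empty.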

lopuhin commented 2 years ago

Thanks for reporting! Uploaded HTML from the website in case it changes: spinneyslebanon.html.zip