microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License

Not scraping the correct date #649

Closed ghmendonca closed 1 year ago

ghmendonca commented 1 year ago

Prerequisites

Subject of the issue

metascraper-date is returning the wrong dates for this page:

https://www.fda.gov/food/food-additives-petitions/aspartame-and-other-sweeteners-food

If you look at the meta tags, both og:updated_time and article:modified_time have the value Fri, 07/14/2023 - 10:30, yet metascraper is returning 2023-07-21T12:00:00.000Z. Interestingly, that date can't be found anywhere in the HTML.

The published date is also wrong: the meta tag article:published_time has the value Tue, 07/11/2023 - 17:57, but metascraper is returning 2023-07-18T12:00:00.000Z.

Expected behaviour

Return the correct dates, based on the meta tags.

Actual behaviour

Returning wrong dates that are not even in the HTML.

Kikobeats commented 1 year ago

Hello, and thanks for reporting. A couple of questions:

❯ $jsonld('dateModified')
'Fri, 07/14/2023 - 10:30'
❯ date($jsonld('dateModified'))
'2023-07-21T10:00:00.000Z'

so I think it's expected. Thoughts?
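
For reference, both returned dates land exactly seven days after the dates in the tags (07/14 becomes 07/21, 07/11 becomes 07/18), so the free-form parser seems to be thrown off by this non-ISO format. One possible workaround is to normalize the string before handing it to any date parser. Below is a minimal sketch, not metascraper's own code: `parseUsDateTime` and `US_DATE_PATTERN` are hypothetical names, the pattern only covers the `Day, MM/DD/YYYY - HH:mm` shape seen on this page, and the time is assumed to be UTC because the tags carry no time zone.

```javascript
// Hypothetical helper (not part of metascraper): converts strings such as
// 'Fri, 07/14/2023 - 10:30' into an ISO 8601 timestamp.
// Assumption: the time is interpreted as UTC, since the tag has no zone.
const US_DATE_PATTERN =
  /^(?:[A-Za-z]{3},\s*)?(\d{2})\/(\d{2})\/(\d{4})\s*-\s*(\d{2}):(\d{2})$/

function parseUsDateTime (value) {
  if (typeof value !== 'string') return undefined

  const match = US_DATE_PATTERN.exec(value.trim())
  if (!match) return undefined

  // Capture groups are month, day, year, hours, minutes (US ordering).
  const [month, day, year, hours, minutes] = match.slice(1).map(Number)
  if (hours > 23 || minutes > 59) return undefined

  const date = new Date(Date.UTC(year, month - 1, day, hours, minutes))

  // Reject values that silently rolled over, e.g. month 13 or day 32.
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    return undefined
  }

  return date.toISOString()
}

console.log(parseUsDateTime('Fri, 07/14/2023 - 10:30')) // 2023-07-14T10:30:00.000Z
console.log(parseUsDateTime('Tue, 07/11/2023 - 17:57')) // 2023-07-11T17:57:00.000Z
```

The weekday prefix is deliberately ignored rather than validated, so the numeric date is always what decides the result. Anything that does not match the pattern returns `undefined`, which leaves room to fall back to whatever parser is already in use.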