webdevops / TYPO3-metaseo

TYPO3 MetaSEO Extension
https://typo3.org/extensions/repository/view/metaseo
GNU General Public License v3.0

google: no page date in search results #67

Open thomaszbz opened 9 years ago

thomaszbz commented 9 years ago

As of commit 2697c177b3d39f83bf0dc83521207d17d146b850, which uses HTML5 meta tags, Google does not show a date before the page description in search results.

I have been running 2697c177b3d39f83bf0dc83521207d17d146b850 for several weeks now, and the search results look like

https://www.example.com
Some description

while they look like

https://www.not.example.com
01.01.2015 Some description

on other websites (run by other people).

Keep in mind that we changed the date meta tag so that it validates as HTML5.

Besides meta tags, the date could also come from the sitemap XML. I noticed that the "Content Keywords" section of Google Search Console (aka Webmaster Tools) shows me a bunch of entries like

09t23
daily

That clearly indicates that Google is parsing sitemap meta information as content: "09t23" looks like a fragment of a lastmod timestamp, and "daily" is a valid changefreq value.

I sent google a sitemap index URL of the form

https://www.example.com/?id=2&type=841132

which contains the URL of the real sitemap xml:

https://www.example.com/index.php?id=2&type=841132&page=1&cHash=...

Google may be reading the latter as page content rather than as a sitemap. Its URL also shows up in the search results, which makes no sense either.

Therefore, I have now given Google the sitemap URL (instead of the sitemap index URL)

https://www.example.com/?id=2&type=841132&page=1

and I hope to see within the next few days whether anything changes with respect to page dates and the bogus "Content Keywords".

If Google does not handle sitemap indexes correctly, I would have to give Google the "page=1" version. Does metaseo use some kind of pagination here? What is the cHash good for?

Or is something missing that would identify the "page=1" version as a sitemap? I would say that Google does interpret it correctly as a sitemap, because Search Console shows the right number of links; otherwise there would be just one link. So what Google is doing here does not look consistent.

For the moment, I think metaseo has a valid implementation of the sitemap and sitemap index protocol.
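To double-check which kind of document each URL returns, one could look at the XML root element: the sitemaps.org protocol uses sitemapindex for an index and urlset for a regular sitemap. The following is only a minimal sketch; the sample documents and the function names are made up for illustration (the real sitemap URL also carries a cHash, which is left out here), and this is not how metaseo itself generates or checks its output.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sample documents, only for illustration.
SAMPLE_INDEX = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://www.example.com/index.php?id=2&amp;type=841132&amp;page=1</loc></sitemap>"
    "</sitemapindex>"
)
SAMPLE_URLSET = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc>"
    "<lastmod>2015-07-02T05:04:22+02:00</lastmod>"
    "<changefreq>daily</changefreq></url>"
    "</urlset>"
)


def sitemap_kind(xml_text):
    """Return 'index', 'urlset' or 'unknown' depending on the root element."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        return "index"
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        return "urlset"
    return "unknown"


def locations(xml_text):
    """Collect the text of every <loc> element in the document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{{{SITEMAP_NS}}}loc") if loc.text]


if __name__ == "__main__":
    print(sitemap_kind(SAMPLE_INDEX), locations(SAMPLE_INDEX))
    print(sitemap_kind(SAMPLE_URLSET), locations(SAMPLE_URLSET))
```

If both URLs come back with the expected root element, the problem is more likely on the consumer side than in the generated XML.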

This issue needs investigation. To help reverse-engineer Google's behavior, experiences from other users would be welcome. Perhaps someone can find an official statement on how Google expects web developers to do SEO for HTML5 and publish a page date (how simple is that?). At the very least, the W3C should communicate some best practice, but all I see is confusion and meta/structured data/validation hell for HTML5. In the end, the page HTML should contain a date, because sitemaps are generally optional.

References: http://www.sitemaps.org/protocol.html

thomaszbz commented 9 years ago

Maybe a related issue: https://forge.typo3.org/issues/62004

thomaszbz commented 9 years ago

Still no date in Google search results.

There's some documentation: https://developers.google.com/custom-search/docs/structured_data#formatting_dates

The "complete date"

2009-12-31

is in the list while a perfect ISO 8601 timestamp like

2015-07-02T05:04:22+02:00

is not in the list.

Google's structured data "testing" tool returns

pmr-metatags-date
pmr-metatags-date-00
pmr-metatags-date-02
pmr-metatags-date-02t05
pmr-metatags-date-04
pmr-metatags-date-07
pmr-metatags-date-2015
pmr-metatags-date-2015-07-02t05
pmr-metatags-date-22

for

<meta name="date" content="2015-07-02T05:04:22+02:00">

and

pmr-metatags-dcterms.date
pmr-metatags-dcterms.date-00
pmr-metatags-dcterms.date-02
pmr-metatags-dcterms.date-02t05
pmr-metatags-dcterms.date-04
pmr-metatags-dcterms.date-07
pmr-metatags-dcterms.date-2015
pmr-metatags-dcterms.date-2015-07-02t05
pmr-metatags-dcterms.date-22

for

<meta name="DCTERMS.date" content="2015-07-02T05:04:22+02:00">

although the documentation clearly indicates that this ISO 8601 timestamp format is perfectly allowed. The same applies to sitemaps.
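The token lists above look as if the value is split as plain text rather than parsed as a date. The sketch below is a guess at such a splitting rule, not Google's actual algorithm: it lowercases the value, breaks it at ":" and "+", and additionally breaks it at "-". The function name guess_tokens is made up. For the timestamp used above, this rule happens to yield the same eight suffixes that the tool lists after "pmr-metatags-date-".

```python
import re


def guess_tokens(value):
    """Split a meta tag value into text tokens (a guessed rule, see lead-in)."""
    text = value.lower()
    coarse = re.split(r"[:+]", text)    # break at ':' and '+' only
    fine = re.split(r"[:+\-]", text)    # additionally break at '-'
    # Merge both passes, drop empty strings and duplicates, sort as text.
    return sorted({token for token in coarse + fine if token})


if __name__ == "__main__":
    for token in guess_tokens("2015-07-02T05:04:22+02:00"):
        print("pmr-metatags-date-" + token)
```

A plain date such as 2009-12-31 would produce far fewer fragments under the same rule, which may be one reason to prefer it if the value ends up being indexed as text.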

thomaszbz commented 9 years ago

References in detail: http://www.sitemaps.org/protocol.html clearly has both formats in the examples:

2004-10-01T18:23:17+00:00
2005-01-01

http://dublincore.org/documents/dcmi-terms/ says: Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF].

W3CDTF is documented here: http://www.w3.org/TR/NOTE-datetime . It includes

Complete date plus hours, minutes and seconds:
YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)

which is pretty much what we do right now.

We could also try out the zero-offset timezone designator Z as described in https://tools.ietf.org/html/rfc3339 which is a subset of ISO 8601:

Here are some examples of Internet date/time format.
1985-04-12T23:20:50.52Z
This represents 20 minutes and 50.52 seconds after the 23rd hour of April 12th, 1985 in UTC.

Z is also allowed as timezone designator by http://www.w3.org/TR/NOTE-datetime

Or should we go back to plain ISO dates (YYYY-MM-DD) everywhere, losing the time zone information, until Google does its homework? I wonder how their artificial-intelligence cars manage to detect walls...
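For comparison, here is how the three candidate formats (numeric offset, Z designator, date only) could be produced from one and the same timestamp. This is a sketch in Python rather than the extension's PHP, and the function name and dictionary keys are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def date_variants(moment):
    """Render a timezone-aware datetime in the three formats discussed above."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return {
        # W3CDTF with numeric offset, e.g. +02:00
        "offset": moment.isoformat(timespec="seconds"),
        # Same instant in UTC with the Z designator
        "zulu": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Date only (local calendar date), time zone information is lost
        "date_only": moment.date().isoformat(),
    }


EXAMPLE = datetime(2015, 7, 2, 5, 4, 22, tzinfo=timezone(timedelta(hours=2)))

if __name__ == "__main__":
    for name, text in date_variants(EXAMPLE).items():
        print(name, text)
```

Note that the date-only variant here uses the local calendar date; near midnight, the UTC date can differ by one day.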

thomaszbz commented 9 years ago

On https://developers.google.com/open-source/organizations, Google itself uses a date:

<p class="devsite-content-footer-date" itemprop="datePublished"
    content="2015-07-01T18:27:42.282270">
    Zuletzt aktualisiert am Juli 1, 2015
</p>

OK, so there are microseconds but no time zone. Great. This is rendered as

datePublished: 2015-07-01T18:27:42

which is basically what we want.

So we could just convert to UTC and use the format 2015-07-01T18:27:42.

I'll try this out and see what happens. I'll also test the zero-offset timezone designator Z. I guess Google just wants to avoid the complexity of time zones.
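The conversion to UTC without a designator could look roughly like this. It is a sketch in Python, not the extension's PHP code; the function name is made up, and a value that carries no offset is simply assumed to be in UTC already.

```python
from datetime import datetime, timezone


def to_utc_without_designator(value):
    """Convert an ISO 8601 timestamp to UTC and drop offset and microseconds."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is not None:
        # Shift to UTC, then strip the tzinfo so no offset is printed.
        parsed = parsed.astimezone(timezone.utc).replace(tzinfo=None)
    # Assumption: a value without an offset is already UTC.
    return parsed.isoformat(timespec="seconds")


if __name__ == "__main__":
    print(to_utc_without_designator("2015-07-02T05:04:22+02:00"))
    print(to_utc_without_designator("2015-07-01T18:27:42.282270"))
```

With timespec="seconds" the microseconds are truncated, not rounded, which matches the rendering of the Google example above.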

thomaszbz commented 7 years ago

Another example:

<meta itemprop="datePublished" content="2015-02-05T08:00:00+08:00"/>
<meta itemprop="dateModified" content="2015-02-05T09:20:00+08:00"/>

Taken from here.