guardian / frontend

The Guardian DotCom.
https://theguardian.com

Missing itemprop meta-data on some pages #13333

Open katebee opened 8 years ago

katebee commented 8 years ago

ISSUE

Performing a Google search for 'Guardian' returns page previews under the 'News' tab (see image). Some previews are using the OpenGraph image with an overlay.

If the main media of a page is a video, or page.contentType is video, Google takes the OpenGraph image as the thumbnail. This appears to be a fallback behaviour, either because:

https://support.google.com/news/publisher/answer/13369?hl=en
http://schema.org/

Steps to Reproduce

Go to Google, search for 'Guardian' or 'Guardian news' and click on the 'News' tab. Look at the image thumbnails; some will have the branded overlay.

Actual Results (include screenshots)

Results of searching 'Guardian'
Results of searching 'Guardian Video'

Expected Results (include screenshots)

Here is a page that shows up correctly in Google: https://www.theguardian.com/world/video/2016/jun/17/canadian-mp-breaks-down-in-tears-tribute-jo-cox-video

correct thumbnail

URL

https://www.google.co.uk/webhp#q=guardian+video&tbm=nws

http://www.theguardian.com/commentisfree/2016/jun/16/the-guardian-view-on-jo-cox-an-attack-on-humanity-idealism-and-democracy

TBonnin commented 8 years ago

Does this mean there is also a problem with the AMP validator when there is no itemprop="image"?

katebee commented 8 years ago

AMP appears to have its own standard, so it will not be affected.

AMP uses this type of tagging system to get content: <amp-img src="welcome.jpg" alt="Welcome" height="400" width="800"></amp-img>

https://www.ampproject.org/docs/reference/spec.html

From what I can understand, the headers of article pages include: <link rel="amphtml" href="https://amp.theguardian.com/...... This takes the AMP bot to a page with AMP tagged content, including amp-img.
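To illustrate the discovery step described above, here is a minimal Python sketch that pulls the amphtml link out of a page's markup. The class and function names are made up for illustration, and this only mirrors the idea of following the link; it is not how Google's crawler is implemented.

```python
from html.parser import HTMLParser
from typing import Optional


class AmpLinkFinder(HTMLParser):
    """Records the href of the first <link rel="amphtml"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.amp_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags are of interest, and only the first match is kept.
        if tag != "link" or self.amp_url is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "amphtml" in rel_values:
            self.amp_url = attributes.get("href")


def find_amp_url(html: str) -> Optional[str]:
    """Return the AMP URL advertised by the page, or None if there is none."""
    finder = AmpLinkFinder()
    finder.feed(html)
    return finder.amp_url
```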

TBonnin commented 8 years ago

Sorry, I used the wrong term. I am not talking about the AMP validator but about Google's search index validation for AMP pages (i.e. if a page doesn't validate, it is not shown in the carousel), which uses the schema.org specs. (I know, super confusing 😕) I've just tried the URL you provided with the Google webmaster tool and an error is raised: A value for the image field is required. https://search.google.com/structured-data/testing-tool?url=https%3A%2F%2Famp.theguardian.com%2Ftravel%2F2002%2Fmar%2F24%2Fhotels4#url=https%3A%2F%2Famp.theguardian.com%2Fcommentisfree%2F2016%2Fjun%2F16%2Fthe-guardian-view-on-jo-cox-an-attack-on-humanity-idealism-and-democracy

So this issue, where there is no main image for an article with a main video, seems to also affect AMP results. The same fix would solve both issues, I believe. I don't know if it changes the priority of this issue though. Maybe @stephanfowler can jump in.
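For a quick local check of the same condition, something like the following Python sketch could flag pages whose markup has no itemprop="image" at all. It is a rough approximation with made-up names, not a substitute for Google's testing tool, which applies the full schema.org rules.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collects every itemprop token found on any start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.itemprops = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemprop" and value:
                # itemprop may list several space-separated property names.
                self.itemprops.update(value.split())


def has_image_itemprop(html: str) -> bool:
    """True if any element in the page declares itemprop="image"."""
    collector = ItempropCollector()
    collector.feed(html)
    return "image" in collector.itemprops
```

A page that only carries a VideoObject would return False here, which is the case that triggers the error above.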

stephanfowler commented 8 years ago

If the missing main image causes the AMP cache to reject the page, that would make it a priority to fix. If that work also prevents the overlay on Google search, all the better.

katebee commented 8 years ago

Still working on this. Just opened PR #13533

Google search results are a bit of a black box; I am starting with minor changes to the metadata on our pages and will gradually progress to more drastic ones... until I get the result I want!

I have a feeling that, for articles without an ImageObject (that may only have a VideoObject), we need a workaround similar to the one used for AMP in articleBody.scala.html, where we add a DIV with the required meta itemprops so Google doesn't fall back to the OpenGraph tags.
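A rough sketch of what such a fallback DIV might look like, written in Python for brevity rather than as the Twirl template itself. The function name and the choice of url, width and height properties are assumptions for illustration; the actual markup in articleBody.scala.html is not reproduced here.

```python
from html import escape


def image_object_fallback(url: str, width: int, height: int) -> str:
    """Build a DIV carrying schema.org ImageObject microdata as meta tags.

    Hypothetical helper: the real template may use different properties.
    """
    return (
        '<div itemprop="image" itemscope itemtype="http://schema.org/ImageObject">'
        # Escape the URL so query-string ampersands stay valid in the attribute.
        f'<meta itemprop="url" content="{escape(url, quote=True)}">'
        f'<meta itemprop="width" content="{width}">'
        f'<meta itemprop="height" content="{height}">'
        "</div>"
    )
```

The idea is that a video-only article would still expose an image property in its microdata, so the crawler has no reason to reach for the OpenGraph image.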