ysk24ok / jekyll-linkpreview

Jekyll plugin to generate link preview
https://rubygems.org/gems/jekyll-linkpreview
MIT License
29 stars 8 forks source link

Pages with meta tags, all of whose property attr are unrelated to OG properties ruins the preview #37

Closed Clpsplug closed 1 year ago

Clpsplug commented 1 year ago

The issue

I attempted to linkpreview the following URL: https://docs.unity3d.com/Packages/com.unity.inputsystem@1.0/api/UnityEngine.InputSystem.InputSystem.html It is a documentation page for Unity's package.

Expected behavior

This link does not appear to have any OGP properties, so I expected this plugin to produce a "no OGP image link preview snippet" with the title Class InputSystem | Input System | 1.0.2.

Actual behavior

This happened.

Image from Gyazo

I looked into the cache and found that the corresponding file had this:

{"title":null,"url":null,"image":null,"description":null,"domain":"docs.unity3d.com"}

(filename: cec2702fa0dbc6125ac754817ddbb10b.json)

Suspected issues

This page has these <meta> tags in its HTML:

    <meta property="docfx:navrel" content="../toc.html">
    <meta property="docfx:tocrel" content="toc.html">
    <meta property="unity:packageTitle" content="Input System | 1.0.2">
    <meta property="docfx:rel" content="../">

So this page has <meta> tags with a property attribute, but its value is never og:.*. This edge case of an HTML page causes this method to miss this case and causes it to spawn OpenGraphPropertiesFactory when it should be spawning NonOpenGraphPropertiesFactory instead.

https://github.com/ysk24ok/jekyll-linkpreview/blob/61c8b0b1e8f5f004c6df1248f9f0d71577a01bfe/lib/jekyll-linkpreview.rb#L190-L198

(Any <meta> tag whose property attribute value matches og:.* on this page and the plugin would have avoided this issue.)

Potential fix

In addition to page.meta_tags['property'].empty?, check for the total absence of property attributes whose value matches regex og:.*. If any of those holds, we should use NonOpenGraphPropertiesFactory.

Clpsplug commented 1 year ago

I also noticed that this specific page has quite a unique structure - the <title> tag is in the <body> tag. For this reason, the preview was missing a title even if I applied my proposed fix locally. This is because MetaInspector::Parsers::TextsParser.title only looks at the title tag within the head tag.

We cannot blame them, though; having a <title> tag outside of the <meta> tag technically violates the HTML spec.

But there is an alternative: the .best_title accessor, which also looks at the title tag within the body tag. We could also modify this part below to produce the best result possible.

https://github.com/ysk24ok/jekyll-linkpreview/blob/61c8b0b1e8f5f004c6df1248f9f0d71577a01bfe/lib/jekyll-linkpreview.rb#L117-L121

I'm going to prep a PR, but should the 'using .best_title accessor instead' fix be a separate PR from the 'non-OGP meta tags messing with the factory selection' fix?

ysk24ok commented 1 year ago

Thank you for reporting this.

In addition to page.meta_tags['property'].empty?, check for the total absence of property attributes whose value matches regex og:.*.

Right. We should check these four properties as https://ogp.me/#metadata says they are required for every page.

I'm going to prep a PR, but should the 'using .best_title accessor instead' fix be a separate PR from the 'non-OGP meta tags messing with the factory selection' fix?

I think so, and could you put https://github.com/ysk24ok/jekyll-linkpreview/issues/37#issuecomment-1399803503 as a new issue so a PR can close the corresponding issue?