ysk24ok / jekyll-linkpreview

Jekyll plugin to generate link preview
https://rubygems.org/gems/jekyll-linkpreview
MIT License
30 stars, 8 forks

Non-HTML spec compliant pages cause plugin to miss crucial elements #38

Closed: Clpsplug closed this issue 1 year ago

Clpsplug commented 1 year ago

Original report: https://github.com/ysk24ok/jekyll-linkpreview/issues/37#issuecomment-1399803503

The issue

A webpage can contain markup like this:

<html>
<head>
  <link rel="stylesheet" href="something.css">
</head>
<body>
  <title>Title here</title> <!-- Wait, that's illegal -->
</body>
</html>

This HTML has a <title> tag inside the <body> tag, which the HTML spec does not allow.

This causes MetaInspector, one of this plugin's dependencies, to miss the element completely when the plugin reads the .title accessor; that accessor only looks at the <title> tag inside the <head> tag.

An example of such pages is https://docs.unity3d.com/Packages/com.unity.inputsystem@1.0/api/UnityEngine.InputSystem.InputSystem.html.

The fix

MetaInspector provides an alternative: the .best_title accessor, which also looks at the title tag within the body tag. We can modify the part linked below to use it, so that the plugin picks up the best title available.

https://github.com/ysk24ok/jekyll-linkpreview/blob/61c8b0b1e8f5f004c6df1248f9f0d71577a01bfe/lib/jekyll-linkpreview.rb#L117-L121
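For illustration, here is a minimal sketch of the kind of fallback I have in mind. It is not the plugin's actual code: `FakePage` is a hypothetical stand-in for a MetaInspector page object (only the `.title` and `.best_title` accessors mentioned above are modelled), and `preview_title` is a hypothetical helper name. I am also assuming a missing title may show up as either nil or an empty string, so the sketch handles both.

```ruby
# Hypothetical stand-in for a MetaInspector page object; only the two
# accessors discussed in this issue are modelled.
FakePage = Struct.new(:title, :best_title)

# Hypothetical helper (not the plugin's actual code): prefer best_title,
# fall back to title, and treat nil or blank strings as missing.
def preview_title(page)
  [page.best_title, page.title].each do |candidate|
    text = candidate.to_s.strip
    return text unless text.empty?
  end
  nil
end

# A page like the Unity docs example: <title> sits inside <body>,
# so title is empty here while best_title still carries the text.
misplaced = FakePage.new('', 'Title here')
puts preview_title(misplaced)
```

Trying `.best_title` first and keeping `.title` as a fallback means well-formed pages should behave as before, while pages with a misplaced <title> still get a usable preview title.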

ysk24ok commented 1 year ago

@Clpsplug v0.6.0, which includes this change, has been released. Feel free to try it out.