v17development / flarum-seo

Perfect SEO for your Flarum forum
https://discuss.flarum.org/d/18316-flarum-seo
MIT License

Fix invalid meta caused by faulty social image RegEx #33

Closed davwheat closed 4 years ago

davwheat commented 4 years ago

At the moment, social image generation fails when the original post contains more than one image URL, for example a linked image where the same URL appears in both the href and the src attribute:

<p>Hello!</p>
<a href="https://example.com/image.png" target="_blank" rel=" noopener nofollow ugc"><img src="https://example.com/image.png" title="" alt=""></a>

The current regex (linked below) matches from the beginning of the href attribute to the end of the src attribute, which produces an invalid meta tag value:

https://example.com/image.png" target="_blank" rel=" noopener nofollow ugc"><img src="https://example.com/image.png

https://github.com/v17development/flarum-seo/blob/d814c00d7c66cb00f8391e793597aed5947de0f9/src/Listeners/PageListener.php#L477

The above line can be changed to /(?<=src=")((http.*?\.)(jpe?g|png|[tg]iff?|svg))(?=")/ to fix this.

I've made three changes: a positive lookbehind for src=" so that only image sources match rather than URLs in general; a positive lookahead for " to find the end of the src attribute; and a non-greedy .* so that a single match can't span several image tags on one line.
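To illustrate the difference, here is a minimal Python sketch that runs both kinds of pattern against the HTML from the example above. The extension's actual code is PHP and the old regex is only linked here, not quoted, so `GREEDY` is an assumed stand-in (a greedy pattern with no lookarounds) that reproduces the reported symptom. `PROPOSED` is the pattern suggested in this issue, copied without the `/` delimiters.

```python
import re

# HTML from the example in this issue: a linked image, so the same URL
# appears once in href and once in src.
html = (
    '<p>Hello!</p>\n'
    '<a href="https://example.com/image.png" target="_blank" '
    'rel=" noopener nofollow ugc">'
    '<img src="https://example.com/image.png" title="" alt=""></a>'
)

# ASSUMPTION: a greedy pattern without lookarounds, standing in for the old
# regex. It is not copied from PageListener.php.
GREEDY = re.compile(r'(http.*\.)(jpe?g|png|[tg]iff?|svg)')

# The pattern proposed in this issue.
PROPOSED = re.compile(r'(?<=src=")((http.*?\.)(jpe?g|png|[tg]iff?|svg))(?=")')

greedy_match = GREEDY.search(html).group(0)
proposed_match = PROPOSED.search(html).group(0)

print("greedy:  ", greedy_match)    # runs from the href URL to the end of the src URL
print("proposed:", proposed_match)  # only the src URL
```

The greedy `.*` runs as far along the line as it can before backtracking to the last image extension, which is why the href URL, the attributes in between, and the src URL end up in one match. The lookbehind pins the start to `src="`, and the lazy `.*?` with the `"` lookahead stops at the first image extension that closes the attribute.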

Tests: https://regexr.com/5701f

Do note that this will no longer match images that are only linked to in Markdown ([A super cool image](https://example.com/image.png)).
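The trade-off can be checked with a short Python sketch. The `rendered_link` string is an assumption about roughly how such a Markdown link might look once rendered to HTML, and `first_image` is a hypothetical helper, not part of the extension.

```python
import re

# The pattern proposed in this issue (without the / delimiters).
PROPOSED = re.compile(r'(?<=src=")((http.*?\.)(jpe?g|png|[tg]iff?|svg))(?=")')

def first_image(text):
    """Return the first image URL found in a src attribute, or None."""
    match = PROPOSED.search(text)
    return match.group(0) if match else None

# Raw Markdown link from the note above: there is no src attribute at all.
markdown_link = '[A super cool image](https://example.com/image.png)'

# ASSUMPTION: an approximate HTML rendering of that link (href only, no src).
rendered_link = '<a href="https://example.com/image.png">A super cool image</a>'

# An embedded image, which does carry a src attribute.
embedded_image = '<img src="https://example.com/image.png" title="" alt="">'

print(first_image(markdown_link))   # no match
print(first_image(rendered_link))   # no match
print(first_image(embedded_image))  # the image URL
```

Because the lookbehind requires `src="` immediately before the URL, a link that only points to an image is skipped, while an embedded image is still picked up.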

@jaspervriends

jaspervriends commented 4 years ago

Thanks for the PR! I'll update the extension later this week with your PR included :)

> Do note that this will no longer match images only linked to in Markdown

That's a compromise I'm okay with :)