Closed somini closed 6 years ago
Duh, it's not the rewriter. This is the source HTML.
<figure class="media-landscape has-caption full-width lead">
<span class="image-and-copyright-container">
<img class="js-image-replace" alt="Cartoon of Tony the IT admin receiving an email from a hacker" src="https://ichef.bbci.co.uk/news/660/cpsprodpb/187E/production/_102807260_cybersec_1_highres.jpg" data-highest-encountered-width="660" width="976" height="549">
<span class="off-screen">Image copyright</span>
<span class="story-image-copyright">TOM HUMBERSTONE</span>
</span>
<figcaption class="media-caption">
<span class="off-screen">Image caption</span>
<span class="media-caption__text">
The day starts badly for IT administrator Tony Lewis when he reads an email from a hacker
</span>
</figcaption>
</figure>
Should I just use the rewriter, or is it worth to special case wrapping all free text inside figure
with figcaption
?
I doubt the add_image_title rewriter is doing this. The source HTML doesn't have a title
attribute in the img
tag, so the rewriter wouldn't even process it.
The problem, I think, is that the markup is invalid. They're using span
tags (which are inline) with img
tags (which are block). This isn't really the way you're supposed to do it.
If the rewriter finds other img
tags with title
attributes in the source it does run the source through GoQuery, which affects the markup (behind the scenes it's interpreting the HTML, similar to a web browser). But that doesn't change the fact that their source is a little messy. Such is the web, though.
Do you need the rewriter for a site that's already putting in captions?
Edit: Come to think of it, the entry is shown on the UI after being run through GoQuery* for the proxy, etc. so the markup is being changed as a result of that regardless of the rewriter. I don't think there's a solution other than the source site fixing their markup.
*I haven't gone through GoQuery's code but I'm guessing it's just abstracting GoLang's HTML processor. The end result, regardless, is it trying to "fix" invalid markup (again, just like a browser would).
I double checked and the rewriter is not configured for that feed, it's pure Web messiness.
I'll live with it, this is not a "bug" on Miniflux per-se.
The following URL: https://www.bbc.co.uk/news/technology-44482380
Produces a figure with the following HTML:
That "Image copyright" text is not inside a caption, so it gets rendered as regular text, inline with the image. It should be put inside a
figcaption
like the rest, so it can be rendered nicely.I think this comes from this piece of code, but I'm not sure. https://github.com/miniflux/miniflux/blob/6d25e02cb5e3ddd9f3fc8dfb44cf29c98866d587/reader/rewrite/rewrite_functions.go#L34