seazon / FeedMe

The documents and forum of FeedMe
Image file doesn't get shown correctly when there is a stray < in the title text #521

Open noiob opened 1 year ago

noiob commented 1 year ago

This is a feed generated by FreshRSS, not the raw feed of the website. It shows up correctly in the FreshRSS web interface, and this is the only post that doesn't render correctly in FeedMe.

Looks like the HTML entity (&lt;) is getting decoded too early, i.e. before the HTML is parsed.
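
To illustrate the suspected ordering problem, here is a minimal sketch in Python. It is not FeedMe's code: the thread doesn't show how FeedMe parses posts, so Python's strict XML parser is used as a stand-in, and the snippet is a simplified, self-closed version of the img tag with a hypothetical src. It only shows why decoding entities before parsing can break a tag that parses fine in the other order.

```python
import xml.etree.ElementTree as ET
from html import unescape

# Simplified, well-formed stand-in for the post's <img> (hypothetical src URL).
SNIPPET = (
    '<div><img title="supported all this&lt;333" '
    'src="https://example.com/comic.png"/></div>'
)


def image_titles(markup):
    """Parse markup strictly and return the title of every <img> element."""
    root = ET.fromstring(markup)
    return [img.get("title") for img in root.iter("img")]


def try_titles(markup):
    """Return the image titles, or None if the markup no longer parses."""
    try:
        return image_titles(markup)
    except ET.ParseError:
        return None


# Parse first: the parser itself decodes &lt; inside the attribute value.
parsed_first = try_titles(SNIPPET)

# Decode first: a bare "<" ends up inside the attribute, and strict parsing fails.
decoded_first = try_titles(unescape(SNIPPET))

print(parsed_first)
print(decoded_first)
```

A lenient HTML parser may tolerate a bare < inside a quoted attribute, so whether this matches what FeedMe does is an assumption; the point is only that the entity should survive until after the tag has been parsed.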

Screenshot_20230615-134148

The post is this. I have set FreshRSS to extract #cc-comicbody, #bottom-text

Here's what FreshRSS tells me is the generated post's source:

<div data-sanitized-id="cc-comicbody"><img title="thanks a ton to those who have already supported all this&lt;333" src="https://www.gogetaroomie.com/comics/1686813207-1%20dayb.png" data-sanitized-id="cc-comic"> </div>
<div data-sanitized-id="bottom-text">
    <div data-sanitized-class="cc-newsarea">
        <div data-sanitized-class="cc-newsheader">crowdfunding doodle</div>
        <div data-sanitized-class="cc-publishtime">Posted June 15, 2023 at 03:15 am</div>
        <div data-sanitized-class="cc-newsbody">
            <p>Okay folks, since we're in the<b> last 24 hours of the crowdfunding</b>, I'm putting up this lil' roomillian post so everyone can seize this last chance at getting the <b>4th and LAST book of <i>Go Get a Roomie!</i></b>, plus any other past books if you wish, and save up some money if you order them together now!</p>
            <p>Last go!! ((and almost 200% funded too ╰(*´︶`*)╯))</p>
            <h3><span>------&gt; </span><a href="https://hivemill.com/products/go-get-a-roomie-book-4">https://hivemill.com/products/go-get-a-roomie-book-4</a><a href="https://hivemill.com/products/go-get-a-roomie-book-4" target="_blank"><span></span></a> &lt;-------</h3>
            <p><br></p>
            <p>(if you want to read today's update, click on the "previous" arrow to find it!)</p>
        </div>
    </div>
    <div data-sanitized-class="cc-tagline">Tags: <a href="https://www.gogetaroomie.com/ggar-rerun/search/crowdfunding">crowdfunding</a>, <a href="https://www.gogetaroomie.com/ggar-rerun/search/roomillian">roomillian</a>, <a href="https://www.gogetaroomie.com/ggar-rerun/search/doodle">doodle</a></div>
    <div data-sanitized-id="comment-space"></div>

    <div data-sanitized-class="cc-commentheader">Comments</div>
    <div data-sanitized-class="cc-commentbody">
        <div data-sanitized-id="hyvor-talk-view"></div>

    </div>
</div>

seazon commented 1 year ago

Seems OK on my side. Please try the newest version on GitHub.

Screenshot_20230615_200531

noiob commented 1 year ago

That's not even the same page. None of the other pages have angle brackets in the image title text.

Here's what it looks like on 4.0.0-canary-3. I'm not reading the raw feed but a version scraped from the website by FreshRSS.

Screenshot_20230615-150427