Open jocmp opened 2 weeks ago
The articles' main image isn't shown in Capy's full content mode for the following feed: https://mobilesyrup.com/feed/
Article example: https://mobilesyrup.com/2024/11/28/google-releases-ai-generated-pieces-chess-game/
(I only noticed this today so maybe it used to work?)
HTML of the image:
<img fetchpriority="high" width="1867" height="1046" src="https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess.jpg" class="attachment-full size-full wp-post-image" alt="" decoding="async" srcset="https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess.jpg 1867w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-300x168.jpg 300w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-1024x574.jpg 1024w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-768x430.jpg 768w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-1536x861.jpg 1536w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-417x235.jpg 417w" sizes="(max-width: 1867px) 100vw, 1867px" />
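For anyone who wants to poke at this outside the app, here is a rough sketch (Python, standard library only) of one possible fallback: find the `<img>` carrying the `wp-post-image` class shown above and take the widest `srcset` candidate. This is not what Capy or Readability4J does; the function and class names are made up for illustration, and splitting `srcset` on commas is a simplification that would break on URLs containing commas.

```python
from html.parser import HTMLParser


def widest_srcset_url(srcset):
    """Return the URL with the largest width descriptor (e.g. '1867w'), or None."""
    best_url, best_width = None, -1
    for candidate in srcset.split(","):
        parts = candidate.split()
        # Only 'URL <n>w' candidates are handled; density descriptors ('2x') are skipped.
        if len(parts) == 2 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
            if width > best_width:
                best_url, best_width = parts[0], width
    return best_url


class LeadImageFinder(HTMLParser):
    """Hypothetical helper: records the first <img> whose class list has css_class."""

    def __init__(self, css_class="wp-post-image"):
        super().__init__()
        self.css_class = css_class
        self.url = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<img ... />) also arrive here via the default
        # handle_startendtag implementation.
        if tag != "img" or self.url is not None:
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes:
            # Prefer the widest srcset entry, fall back to plain src.
            from_srcset = widest_srcset_url(attributes.get("srcset") or "")
            self.url = from_srcset or attributes.get("src")


def find_lead_image(html):
    """Return the lead image URL found in html, or None if there is none."""
    finder = LeadImageFinder()
    finder.feed(html)
    finder.close()
    return finder.url
```

Calling `find_lead_image(page_html)` on the article's HTML should return the full-size `gen-ai-chess.jpg` URL if the markup matches the snippet above.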
@PhilC813 an update. I'm toying around with Mercury Parser again and seeing some potential upsides. Here's a comparison of a Les Versants article.
[Screenshots: Les Versants article, before | after]
[Screenshots: Mobile Syrup article, before | after]
Wow, seems very promising.
Do you mind checking with this article? https://mobilesyrup.com/2024/11/28/here-are-the-2024-staples-black-friday-deals/
It's an article with Black Friday deals, and the current parser basically removes all the bullet points in which the deals are listed 😅
The new parser skips over lists by default, but with a little bit of code it works: https://github.com/jocmp/capyreader/pull/569/files#diff-a5310ab57bf17835286b2a012ceca522b0f9af190ceeea2dcf80c52f82c6479dR41-R49
So you can easily specify the `<ul>` tag as an exception, sweet. Frankly I don't really see a reason why they would be excluded by default. They are more likely to be content than ads.
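To make the "exception" idea concrete, here is a minimal sketch in Python (standard library only) of a cleaner that drops some tags by default but lets the caller allow-list them. It is not Mercury's API or the code in the linked PR; `STRIPPED_BY_DEFAULT`, `Cleaner`, and `clean` are hypothetical names, and the default tag set is an assumption chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical defaults: tags a cleaner might drop when it suspects
# navigation or ads rather than article content.
STRIPPED_BY_DEFAULT = {"ul", "ol", "aside", "nav"}


class Cleaner(HTMLParser):
    """Re-emits HTML, dropping subtrees of stripped tags unless allow-listed."""

    def __init__(self, keep=()):
        super().__init__()
        self.stripped = STRIPPED_BY_DEFAULT - set(keep)
        self.skip_depth = 0  # > 0 while inside a subtree being dropped
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.stripped:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag to emit.
        if tag not in self.stripped and self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.stripped:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))


def clean(html, keep=()):
    """Return html with default-stripped tags removed, except those in keep."""
    cleaner = Cleaner(keep)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

With this sketch, `clean(html)` would drop the deals list entirely, while `clean(html, keep={"ul"})` keeps the bullet points and still removes the other default-stripped tags.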
Also, is there any parser that is still actively maintained? Mercury seems abandoned like Readability4J. It's not necessarily a problem, but having an active project is always a +.
Couldn't agree more. I think Mercury is the more extensible and maintainable of the two. I forked it and I'm working on bringing its dependencies up to date here: https://github.com/jocmp/mercury-parser.
Background
Capy Reader uses a library called Readability4J that has a few rules to parse the article's full content.
Sometimes those rules fail, leading to missing images in Capy's full content mode. This is an annoying issue without a single fix-all solution. Every website is different and changes over time, which is part of the beauty and chaos of the web.
If you run into this issue with a feed, please post a link to the feed with an example article in this thread. I'll track these to fix at some point in the future. Thanks!
Feeds