miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0

Could we get a feature to follow or fix relative links? #2648

dwarf-king-hreidmar closed this issue 1 month ago

dwarf-king-hreidmar commented 4 months ago

Could we get a feature to set up rewrite rules or scraping rules that fix relative links in the original content? I found another site that uses relative links, so when I load the original content in Miniflux, all the links are broken. Links on this page look like this:

<img alt="Comic" src="/proxy/HS2QXTXBG3C9u4kRKYfb5AUJp-882RT-BGL14euRejk=/aHR0cDovL3d3dy5naXJsZ2VuaXVzb25saW5lLmNvbS9nZ21haW4vc3RyaXBzL2dnbWFpbjIwMjQwMzE4YS5qcGc=" width="700" height="1036" loading="lazy">

source: https://www.girlgeniusonline.com/ggmain.rss

fguillot commented 4 months ago

The problem has nothing to do with relative links. Miniflux already rewrites relative links to absolute URLs.

The problem is related to the requests made by the built-in media proxy. By default, plain-text (non-HTTPS) links are proxified. If you look at the logs, you can see that this website returns a 503 status code for the requests made by the Miniflux media proxy:

time=2024-05-17T17:51:35.766-07:00 level=WARN msg="MediaProxy: Unexpected response status code" media_url=http://www.girlgeniusonline.com/ggmain/strips/ggmain20240503a.jpg status_code=503
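
To see how the `src` attribute in the report relates to the `media_url` in a log line like this one, the last path segment of the proxied link can be decoded. The sketch below assumes the path has the shape `/proxy/<signature>/<URL-safe base64 of the original URL>`, which is inferred from the link quoted in this issue rather than from Miniflux's source; `decodeProxiedURL` is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeProxiedURL extracts the original media URL from a path of the form
// /proxy/<signature>/<base64 URL>. The layout is an assumption inferred from
// the src attribute quoted in this issue, not taken from Miniflux's code.
func decodeProxiedURL(path string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(path, "/"), "/")
	if len(parts) != 3 || parts[0] != "proxy" {
		return "", fmt.Errorf("unexpected proxy path: %q", path)
	}
	// URL-safe base64 never contains "/", so the split above is unambiguous.
	raw, err := base64.URLEncoding.DecodeString(parts[2])
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	src := "/proxy/HS2QXTXBG3C9u4kRKYfb5AUJp-882RT-BGL14euRejk=/" +
		"aHR0cDovL3d3dy5naXJsZ2VuaXVzb25saW5lLmNvbS9nZ21haW4vc3RyaXBzL2dnbWFpbjIwMjQwMzE4YS5qcGc="
	original, err := decodeProxiedURL(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(original)
	// prints "http://www.girlgeniusonline.com/ggmain/strips/ggmain20240318a.jpg"
}
```

The decoded value is already an absolute `http://` URL, which is consistent with the diagnosis: the link is not relative, it is served through the media proxy.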

There are different ways to work around this:

  1. Disable the internal proxy: MEDIA_PROXY_MODE=none
  2. Keep the default settings for media proxy modes (do not proxify https images), and use a rewrite rule, for example: replace("http://"|"https://")
  3. Modify Miniflux's source code to forward the browser user agent while making the proxy request because this website seems to block requests based on the user agent.

dwarf-king-hreidmar commented 1 month ago

Thanks fguillot. I'll look into it.