Some sites parsed incorrectly

pztrn commented 1 year ago

Hello, I'm running feedpushr in Docker at commit e38a6ee7b2037be20d6e2348dfa57c60551d7c9a with the mail output plugin.

Some sites are parsed incorrectly. For example, new releases from GitHub repositories sometimes appear like:

and no actual release information.

Confirmed feeds:

GitHub releases.
The https://linux.org.ru site (e.g. the https://www.linux.org.ru/section-rss.jsp?section=1 feed).

It happens completely at random: sometimes the feed is parsed normally, and sometimes something like the HTML head ends up in the email (as in the screenshot).

I was using the latest release before, and it worked fine.

ncarlier commented 1 year ago

Hello, are you using the fetch filter plugin?

pztrn commented 1 year ago

Yes, it is enabled.

ncarlier commented 1 year ago

The feed is parsed correctly, but the "fetch" filter then tries to retrieve the HTML content of the original URL (via web scraping techniques). Some websites do not scrape well; it depends mainly on the page structure. I suggest you add a tag only to the feeds you want scraped (e.g. tofetch), then add a condition on the fetch plugin so that it is activated only for this tag (e.g. "tofetch" in Tags).
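
To illustrate what a tag condition like this does, here is a minimal Python sketch of the gating logic. It is not feedpushr code: `Article`, `should_fetch`, `apply_fetch_filter` and `fake_fetch` are hypothetical names made up for the example, and the only thing taken from this thread is the idea of running the fetch step solely for articles carrying the `tofetch` tag.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List


@dataclass
class Article:
    """Hypothetical stand-in for a feed entry (not feedpushr's real type)."""
    title: str
    link: str
    content: str = ""
    tags: List[str] = field(default_factory=list)


def should_fetch(article: Article, required_tag: str = "tofetch") -> bool:
    """Mirror the condition `"tofetch" in Tags`: true only if the tag is present."""
    return required_tag in article.tags


def apply_fetch_filter(article: Article,
                       fetch: Callable[[Article], Article]) -> Article:
    """Run the (possibly unreliable) scraping step only on tagged articles."""
    if should_fetch(article):
        return fetch(article)
    # Untagged articles keep the content that came from the feed itself.
    return article


def fake_fetch(article: Article) -> Article:
    """Placeholder for a scraper; a real one would download article.link."""
    return replace(article, content="<scraped page for %s>" % article.link)


release = Article("v1.2.0", "https://example.com/releases/v1.2.0",
                  content="Release notes from the feed", tags=["releases"])
longread = Article("Long read", "https://example.com/post",
                   content="Summary only", tags=["tofetch"])

untouched = apply_fetch_filter(release, fake_fetch)
scraped = apply_fetch_filter(longread, fake_fetch)
```

With this kind of gating, feeds that already carry complete content (such as release feeds) are passed through unchanged, and only the feeds you explicitly tag are exposed to scraping.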

ncarlier / feedpushr

Some sites parsed incorrectly #73