ncarlier / feedpushr

A simple feed aggregator daemon with sugar on top.
GNU General Public License v3.0

Some sites parsed incorrectly #73

Open pztrn opened 1 year ago

pztrn commented 1 year ago

Hello, I'm running feedpushr in Docker, built from commit e38a6ee7b2037be20d6e2348dfa57c60551d7c9a, with the mail output plugin.

Some sites are parsed incorrectly. For example, new releases from GitHub repositories sometimes appear like this:

[image]

and no actual release information.

Confirmed feeds:

It happens completely at random: sometimes the feed is parsed normally, and sometimes the email contains something like an HTML head instead (as in the screenshot).

I was using the latest release before, and it worked fine.

ncarlier commented 1 year ago

Hello, are you using the fetch filter plugin?

pztrn commented 1 year ago

Yes, it is enabled.

ncarlier commented 1 year ago

The feed is parsed correctly, but the "fetch" filter then tries to retrieve the HTML content of the original URL using web scraping techniques. Some websites do not scrape well; it depends mainly on the page structure. I suggest adding a tag (e.g. tofetch) only to the feeds you want scraped, then adding a condition to the fetch plugin so that it is activated only for that tag (e.g. "tofetch" in Tags).
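
To illustrate the idea behind that condition, here is a minimal Python sketch of tag-gated fetching. It is not feedpushr's code or API: the `Article` class, `should_fetch`, `apply_fetch_filter`, `fake_fetch` and the example URLs are all hypothetical names made up for the illustration. Only the tag name `tofetch` and the condition `"tofetch" in Tags` come from the comment above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Article:
    """Hypothetical stand-in for a feed article (not feedpushr's real type)."""
    title: str
    link: str
    tags: List[str] = field(default_factory=list)
    content: str = ""


def should_fetch(article: Article, required_tag: str = "tofetch") -> bool:
    """Mirror the suggested condition `"tofetch" in Tags`."""
    return required_tag in article.tags


def apply_fetch_filter(articles: List[Article],
                       fetch: Callable[[str], str]) -> List[Article]:
    """Replace the content with the scraped page only for tagged articles.

    Untagged articles keep the content that came from the feed itself.
    """
    for article in articles:
        if should_fetch(article):
            article.content = fetch(article.link)
    return articles


def fake_fetch(url: str) -> str:
    """Assumed placeholder for a real scraper; returns a marker string."""
    return f"<scraped {url}>"


articles = [
    # A release feed entry: no "tofetch" tag, so the feed content is kept.
    Article("v1.2.0", "https://example.org/releases/v1.2.0",
            tags=["github"], content="release notes from the feed"),
    # A summary-only feed entry: tagged, so the full page is fetched.
    Article("Long read", "https://example.org/blog/post",
            tags=["tofetch"], content="summary only"),
]
apply_fetch_filter(articles, fake_fetch)
```

With this gating, the release entry keeps the content delivered by the feed, and only the entry tagged `tofetch` is replaced by scraped content, so a page that scrapes badly cannot overwrite a feed that was already complete.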