passiomatic / coldsweat

Web RSS aggregator and reader compatible with the Fever API
MIT License

Extend truncated feed items using regex #34

Closed aschilling closed 10 years ago

aschilling commented 10 years ago

Hi everybody,

it would be great if coldsweat could extend truncated feed items by fetching the relevant text from the website using regex. In particular, it would be great if one could build feed-specific configuration options, as in https://github.com/lformella/rss-extender.

Thanks in advance

Andy

passiomatic commented 10 years ago

The concept is intriguing and looks useful too. It's annoying when you have a truncated entry and you're forced to jump to the feed website to continue reading. To be honest, most of the feeds I'm subscribed to don't have truncated entries at all, but I can still see some value in it.

On the other hand, the feature is not trivial to implement. It requires an extra GET request to the original page for each newly saved entry, and there are a number of things that could go wrong, not to mention the inevitable slowdown of the feed refresh process.
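To make the cost concrete, here is a minimal sketch of that extra request, assuming an entry is a plain dict with `link` and `content` keys. The names `fetch_page` and `extend_entry` are hypothetical and are not part of coldsweat; the point is only that every failure has to fall back to the truncated text.

```python
from urllib.request import Request, urlopen


def fetch_page(url, timeout=5.0):
    """Issue the extra GET request for an entry's original page."""
    request = Request(url, headers={"User-Agent": "feed-extender-sketch"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extend_entry(entry, fetch=fetch_page):
    """Return a copy of entry whose 'content' is the fetched page.

    Any network failure leaves the truncated content untouched, so one
    misbehaving site cannot break the refresh of a whole feed.
    """
    try:
        page = fetch(entry["link"])
    except (OSError, ValueError):
        # URLError and socket timeouts are subclasses of OSError;
        # ValueError covers malformed URLs.
        return dict(entry)
    extended = dict(entry)
    extended["content"] = page
    return extended
```

The `fetch` parameter is there so the network call can be swapped out in tests. The timeout bounds the slowdown per entry, but it still adds up across a large refresh.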

It looks like RSS extender has a whole directory dedicated to per-feed settings for extracting meaningful content. This means that someone has to actively maintain such a list: if a website pushes a redesign of its pages, the system will likely break.
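A rough sketch of what such per-feed settings could look like, and of how they break. The feed URL, the `article-body` class and the names `FEED_RULES` and `extract_content` are all made up for illustration; this is not how RSS extender or coldsweat stores its rules.

```python
import re

# Hypothetical per-feed rules: feed URL -> regex whose first group
# captures the article body in the fetched page.
FEED_RULES = {
    "http://example.com/feed.xml": re.compile(
        r'<div class="article-body">(.*?)</div>', re.DOTALL
    ),
}


def extract_content(feed_url, page_html):
    """Return the text matched by the feed's rule, or None if nothing matches."""
    rule = FEED_RULES.get(feed_url)
    if rule is None:
        return None  # no rule configured for this feed
    match = rule.search(page_html)
    return match.group(1).strip() if match else None
```

If the site renames the class in a redesign, the rule silently stops matching and the entry stays truncated until someone updates the list. A `<div>` nested inside the article body would also cut the non-greedy match short.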

In the end I'm not sure if it's worth it. I need to think more about it.

aschilling commented 10 years ago

Hi,

thanks for the reply. I would really appreciate it if you could have a look at the problem; nearly 80% of my favorite feeds are truncated. Although I am not a sophisticated Python programmer, I think BeautifulSoup could be worth a look: http://www.crummy.com/software/BeautifulSoup/bs4/doc/. It makes parsing websites really easy. Calibre uses this library to generate customized news (http://manual.calibre-ebook.com/news.html).

Thanks

tewe commented 10 years ago

Using regular expressions for this will never work. A proven method is user-configurable XPath expressions.
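As an illustration of the XPath idea, here is a small sketch using only the standard library. It assumes the page is well-formed XHTML, which real pages often are not; a lenient HTML parser such as the third-party lxml would be the more realistic choice. `FEED_XPATHS` and `extract_with_xpath` are hypothetical names, and `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical user configuration: feed URL -> XPath of the article node.
FEED_XPATHS = {
    "http://example.com/feed.xml": ".//div[@class='article-body']",
}


def extract_with_xpath(feed_url, page_markup):
    """Return the text of the configured node, or None if it cannot be found."""
    expression = FEED_XPATHS.get(feed_url)
    if expression is None:
        return None  # no expression configured for this feed
    try:
        root = ET.fromstring(page_markup)
    except ET.ParseError:
        return None  # markup is not well-formed XML
    node = root.find(expression)
    if node is None:
        return None
    # Join all text inside the node, including text of nested elements.
    return "".join(node.itertext()).strip()
```

Unlike a regex, the expression selects a whole element from the parsed tree, so nested tags inside the article body do not truncate the result.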

passiomatic commented 10 years ago

@tewe Did you manage to write a plugin to extend entries? Would you like to share the code, perhaps in a Gist, so I can publish it here for future reference? Thank you.

tewe commented 10 years ago

I did write one for a certain kind of site, but did not get around to a generic one yet.

passiomatic commented 10 years ago

I'm going to close this since, as @tewe said, it can be done via a custom plugin. Feel free to add your own gists here if you come up with something interesting in the future; then we can probably create a dedicated wiki page.