wybiral / stream-sources

Tool for real-time scraping of news articles.
The Unlicense
39 stars, 4 forks

Newspaper3k #2

Closed Immortalin closed 5 years ago

Immortalin commented 5 years ago

Will make life easier

wybiral commented 5 years ago

Thanks for the recommendation. That module looks like it could be really useful.

This project is more focused on collecting a link, title, and summary for each article, but it would be interesting to experiment with something like newspaper3k to extract more of the article.

Immortalin commented 5 years ago

Do you plan to have a config for update frequency? Or maybe something like urlwatch? I think both would be nice: infrequently updated sites like a blog could get routine scraping with change detection, while breaking news would be scraped in near real time.
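
To make the idea concrete, here is a rough sketch of what per-source update frequency plus change detection might look like. It is not based on the stream-sources code: `SourceState`, `interval_seconds`, `is_due`, and `record_fetch` are all hypothetical names, and the fetching itself is left out so that only the scheduling and hashing logic is shown.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceState:
    """Hypothetical per-source record; not part of stream-sources."""
    url: str
    interval_seconds: float              # assumed config: how often to poll
    last_checked: Optional[float] = None  # timestamp of the last fetch
    last_digest: Optional[str] = None     # hash of the last fetched content


def is_due(state: SourceState, now: float) -> bool:
    """Return True if the source has never been polled or its interval elapsed."""
    if state.last_checked is None:
        return True
    return now - state.last_checked >= state.interval_seconds


def record_fetch(state: SourceState, content: bytes, now: float) -> bool:
    """Store the fetch result and report whether the content changed."""
    digest = hashlib.sha256(content).hexdigest()
    changed = digest != state.last_digest
    state.last_checked = now
    state.last_digest = digest
    return changed
```

A blog could then be given an `interval_seconds` of an hour or more, while a breaking-news source gets a few seconds; in both cases `record_fetch` would only report a change when the fetched bytes actually differ from the previous fetch.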

wybiral commented 5 years ago

I'm opening the update frequency as issue #5 and closing this one. Thanks!