Thinking of extending my morning news broadcast transcriber to annotate the day’s news stories (guess which story each segment covers, or cluster segments by story)... It could then produce a little web review page, like a more intelligent RSS reader, and any stories with no match online would stand out as salient (potentially broken on the air)
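A rough sketch of the matching step, assuming the transcriber yields a list of text segments and that headlines have already been collected from somewhere. The function names, the stopword list, and the 0.2 threshold are all made up for illustration; word-overlap (Jaccard) scoring is just the simplest stand-in for a proper clustering method.

```python
import re

# Tiny illustrative stopword list (an assumption, not a tuned resource).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "for",
             "is", "as", "at", "by", "with", "that", "has", "have"}


def tokens(text):
    """Lowercased set of words in text, minus common stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    return len(a & b) / len(a | b) if a and b else 0.0


def match_segments(segments, headlines, threshold=0.2):
    """Pair each transcript segment with its most similar headline.

    Returns (matched, unmatched): matched maps segment index to headline
    index; unmatched lists segment indices whose best score is below the
    threshold, i.e. the candidates for stories broken on air.
    """
    headline_tokens = [tokens(h) for h in headlines]
    matched, unmatched = {}, []
    for i, segment in enumerate(segments):
        seg_tokens = tokens(segment)
        scores = [jaccard(seg_tokens, ht) for ht in headline_tokens]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            matched[i] = best
        else:
            unmatched.append(i)
    return matched, unmatched
```

Long transcript segments will score low against short headlines under plain Jaccard, so in practice the threshold would need tuning, or the comparison could be made against article text rather than headlines.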
Original idea:
Turns out there is a dedicated Python package for extracting news stories from news sites (the "3k" in the name refers to Python 3, not to 3,000 sites): https://pypi.org/project/newspaper3k/
Used in this simple extractor https://github.com/FusionRico/news_finder/blob/master/code/main.py which seems to poll the UK news sites listed here: https://blog.feedspot.com/uk_news_rss_feeds/
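A minimal sketch of what polling a feed and extracting stories could look like. This is not taken from the linked news_finder code; parse_rss, fetch_feed and fetch_story are hypothetical helpers, and the feed is assumed to be plain RSS 2.0. The only newspaper3k calls used are Article(url), download(), parse(), and the title and text attributes.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss(xml_data):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip items that have no URL to fetch
            items.append((title, link))
    return items


def fetch_feed(feed_url):
    """Download one RSS feed and return its (title, link) pairs."""
    with urllib.request.urlopen(feed_url, timeout=10) as response:
        return parse_rss(response.read())


def fetch_story(url):
    """Download one article and extract its title and body text."""
    # Third-party dependency (pip install newspaper3k); imported here so
    # the feed parsing above works without it.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return {"url": url, "title": article.title, "text": article.text}
```

Looping fetch_feed over a list of feed URLs and calling fetch_story on each link would give the day's stories to compare against the transcript.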