lmmx / tap

dex ⠶ tap – an audio transcriber for web radio
MIT License

Match summarised news items to news stories crawled via RSS #6

Open lmmx opened 3 years ago

lmmx commented 3 years ago

Original idea:

Thinking of extending my morning news broadcast transcriber to annotate (guess/cluster) the day’s news stories. It could then produce a little web review page, like a more intelligent RSS reader, and any broadcast items left unmatched would be salient (potentially stories first broken on air).
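To make the matching idea concrete, here is a minimal sketch that pairs each summarised broadcast item with its closest crawled headline by word overlap (Jaccard similarity) and sets aside anything below a cutoff as unmatched. The function names, the stopword list, and the 0.2 threshold are all assumptions for illustration, not anything that exists in tap; a real version would probably want something stronger than bag-of-words overlap.

```python
import re

# Small illustrative stopword list (an assumption, not a tuned set).
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "to", "in", "on", "and", "for", "is", "as", "at", "by", "with"}
)


def tokens(text):
    """Return the set of lowercased words in text, minus common stopwords."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_items(summaries, headlines, threshold=0.2):
    """Match each summary to its best-scoring headline.

    Returns (matched, unmatched): matched maps summary -> headline,
    unmatched lists summaries whose best score fell below threshold.
    """
    headline_tokens = [(h, tokens(h)) for h in headlines]
    matched, unmatched = {}, []
    for summary in summaries:
        summary_tokens = tokens(summary)
        best, best_score = None, 0.0
        for headline, h_tokens in headline_tokens:
            score = jaccard(summary_tokens, h_tokens)
            if score > best_score:
                best, best_score = headline, score
        if best is not None and best_score >= threshold:
            matched[summary] = best
        else:
            unmatched.append(summary)
    return matched, unmatched
```

Calling `match_items(summaries, headlines)` on the transcriber's summaries and the day's crawled headlines would give the matched pairs for the review page, with the `unmatched` list being the candidates for stories that broke on air.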

Turns out there is a dedicated Python package, newspaper3k, for extracting news articles from news sites (the "3k" in the name marks the Python 3 release of the library, not a count of 3,000 sites): https://pypi.org/project/newspaper3k/

It is used in this simple extractor, https://github.com/FusionRico/news_finder/blob/master/code/main.py, which seems to poll the UK news sites listed here: https://blog.feedspot.com/uk_news_rss_feeds/
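For the RSS side, a rough sketch of pulling headline/link pairs out of a feed using only the standard library. This is not how the linked extractor or newspaper3k does it; `parse_rss_items` and the sample feed (with made-up example.com links) are hypothetical, and it assumes plain RSS 2.0 with `<item>` elements rather than Atom.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration; a real run would fetch the XML
# from one of the listed feed URLs instead.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Budget: chancellor unveils measures to help small businesses</title>
      <link>https://example.com/news/budget</link>
    </item>
    <item>
      <title>Football: cup final ends in penalty shootout</title>
      <link>https://example.com/sport/cup-final</link>
    </item>
  </channel>
</rss>
"""


def parse_rss_items(rss_xml):
    """Return a list of (title, link) pairs from an RSS 2.0 document string."""
    root = ET.fromstring(rss_xml.strip())
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title:  # skip items without a usable headline
            items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in parse_rss_items(SAMPLE_FEED):
        print(title, link)
```

The feed text could be fetched with `urllib.request.urlopen(url).read().decode()` before being passed in; the titles returned here are what would be compared against the transcribed summaries.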

lmmx commented 3 years ago

As a smaller test, attempt to match stories in the 'news and papers' show (only 7 minutes long)