mediacloud / rss-fetcher

Intelligently fetch lists of URLs from a large collection of RSS Feeds as part of the Media Cloud Directory.
https://search.mediacloud.org/directory
Apache License 2.0

remove non-resolving URLs #3

Closed rahulbot closed 1 year ago

rahulbot commented 2 years ago

We should filter out URLs that aren't useful before storing them. For instance, we should drop relative URLs, which can't be resolved on their own.
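A minimal sketch of that kind of filter, assuming "useful" means an absolute http(s) URL with a host. The helper names `is_absolute_http_url` and `drop_unusable_urls` are hypothetical, made up for illustration, and are not taken from the rss-fetcher code:

```python
from urllib.parse import urlsplit


def is_absolute_http_url(url: str) -> bool:
    """Return True only for absolute http(s) URLs that include a host.

    Hypothetical helper: the scheme/host rule is an assumption about
    what counts as a "useful" URL, not the project's actual check.
    """
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs,
        # e.g. an unterminated IPv6 host bracket
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def drop_unusable_urls(urls):
    """Keep only the URLs that pass is_absolute_http_url, preserving order."""
    return [u for u in urls if is_absolute_http_url(u)]
```

Relative paths such as `/2023/story.html`, scheme-relative URLs such as `//example.com/x`, and non-web schemes such as `mailto:` would all be dropped under this rule. An alternative would be to resolve relative URLs against the feed's own URL with `urllib.parse.urljoin` rather than discarding them.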

rahulbot commented 1 year ago

I was in this code anyway so I fixed this.

philbudne commented 1 year ago

Funny, I had assumed this was about feed URLs, not article URLs!!