scripting / Scripting-News

I'm starting to use GitHub for work on my blog. Why not? It's got good communication and collaboration tools. Why not hook it up to a blog?

A maintained list of news feeds #131

Open scripting opened 5 years ago

scripting commented 5 years ago

I've been posting about the need for a maintained list of stable and usable RSS feeds from news orgs. There is some activity out there, and a discussion is possible, but I can't be in the center of it. I'm willing to facilitate though, so here's a place to post instead of emailing directly to me.

Please be on-topic. No debates.

I have many applications for it. Think of me as a user.

lemeb commented 5 years ago

Thanks for writing that up. A few remarks:

scripting commented 5 years ago

I'm glad you're doing what you're doing, and I hear good things about it, but as a user, I've tried to describe what I, and I think many other people, need.

We're all kind of fumbling in the dark trying to find stuff worth following. I want to be systematic about it. A process.

Media Cloud seems like something much more comprehensive. I want what I described.

andysylvester commented 5 years ago

Take a look at this, let me know what you think:

https://mediafeedproject.org/

https://github.com/mediafeedproject/mediafeedproject

mterenzio commented 5 years ago

@lemeb Does Media Cloud provide a dump of the RSS feeds it does currently use? You say it's hard to maintain. A project like this would get the community involved in maintaining them. It wouldn't be extra work on your part and it might even help you. If it doesn't make the RSS feed dumps available, can you answer why? That wouldn't seem to violate any copyright issues.

scripting commented 5 years ago

@mterenzio -- it seems like we're not going to hear from them. I did write them a follow-up email suggesting we explore working on this together, but haven't heard back.

I plan to loop back around to this again and again. As I ship the new software I'm working on, it'll become the #1 thing on my list again.

The key is the process, and association with organizations that are long-lived and high-reputation.

mterenzio commented 5 years ago

@scripting 100% agree. It's a vital resource that isn't available. For web news, this is as important as archive.org is for web history. I'll try a few avenues myself and let you know if I make any progress.

anothercookiecrumbles commented 5 years ago

Thanks for putting this together. Some comments from me, a research fellow at the Tow Center over at Columbia Journalism School.

As part of a different project, we've been whitelisting news organisations and have an automated process that checks their RSS feeds, including whether new ones have been added or old ones removed. We have 700+ legitimate US local / national news organisations as well as a few others like The Guardian and The Financial Times. For some news organisations, we've failed to find any RSS feed, which in itself seems lamentable. Overall, as of now, we have about 2,000 feeds for these ~700 news organisations.

We want to eventually open-source all our code + provide API access, but because the project's in its nascent stage with plenty of moving parts, we've not done so yet. Rest assured, it's high on our list of things to do.

I am happy to share our RSS feeds on a regular basis (a database dump or something once a month? more frequently?), and ensure we're maintaining the quality of our list.
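To illustrate the kind of check described above (noticing feeds that were added or removed between two dumps), here is a minimal Python sketch. It is not the Tow Center's code, which has not been published in this thread. The snapshot shape (organisation name mapped to a set of feed URLs), the function name `diff_feeds`, and the example organisations and URLs are all assumptions made for illustration.

```python
def diff_feeds(previous, current):
    """Compare two snapshots mapping organisation name -> set of feed URLs.

    Returns (added, removed): dicts of organisation -> sorted list of URLs.
    The snapshot format is hypothetical, not the project's actual schema.
    """
    added, removed = {}, {}
    for org in sorted(set(previous) | set(current)):
        before = set(previous.get(org, ()))
        after = set(current.get(org, ()))
        if after - before:
            added[org] = sorted(after - before)
        if before - after:
            removed[org] = sorted(before - after)
    return added, removed


# Made-up snapshots standing in for two consecutive dumps.
last_month = {
    "Example Gazette": {
        "https://gazette.example/rss",
        "https://gazette.example/sports.xml",
    },
    "Example Herald": {"https://herald.example/feed"},
}
this_month = {
    "Example Gazette": {
        "https://gazette.example/rss",
        "https://gazette.example/politics.xml",
    },
    "Example Herald": set(),
}

added, removed = diff_feeds(last_month, this_month)
print("added:", added)
print("removed:", removed)
```

An organisation that ends up with an empty set, like the second one here, is the "failed to find any RSS feed" case and may be worth reporting separately.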

mterenzio commented 5 years ago

@anothercookiecrumbles I'm interested and I'd like to learn more about the project

donpark commented 5 years ago

I think everyone should share a feed of their 'trustworthy' news sources. Subscribing to a 'source' feed means a) I trust the feed owner's judgements and b) it's added to my feed of trusted news sources. Decentralized House of Cards made out of Turtles all the way down. :-)

scripting commented 5 years ago

@anothercookiecrumbles -- bingo! that's exactly what I was looking for.

I think the way to go is to have a script run periodically, ideally daily, that pulls out the feeds along with any useful metadata you have, formats them as an OPML subscription list, and uploads it to a GitHub repo. From there, people can deploy the feeds in any number of different applications.

I have JavaScript code that does all that, and am happy to help. The key thing here is an authoritative list of feeds, and I can't imagine a better authority than Tow Center.

Thanks for getting in touch.
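For readers who want to see the formatting step concretely, here is a minimal Python sketch of turning a list of feeds into an OPML subscription list. It is not the JavaScript code mentioned above, and it leaves out the scheduling and the upload to GitHub. The function name `feeds_to_opml`, the dict keys (`title`, `xml_url`, `html_url`), and the example feeds are assumptions made for illustration. The `outline` attributes (`type`, `text`, `xmlUrl`, `htmlUrl`) follow the usual OPML subscription-list convention.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(title, feeds):
    """Return an OPML 2.0 subscription list as a string.

    `feeds` is a list of dicts with hypothetical keys:
    "title", "xml_url", and optionally "html_url".
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed in feeds:
        attrs = {"type": "rss", "text": feed["title"], "xmlUrl": feed["xml_url"]}
        if feed.get("html_url"):
            attrs["htmlUrl"] = feed["html_url"]
        ET.SubElement(body, "outline", attrs)
    return ET.tostring(opml, encoding="unicode")


# Made-up feeds standing in for the real list.
opml_text = feeds_to_opml(
    "Example news feeds",
    [
        {
            "title": "Example Gazette",
            "xml_url": "https://gazette.example/rss",
            "html_url": "https://gazette.example/",
        },
        {"title": "Example Herald", "xml_url": "https://herald.example/feed"},
    ],
)
print(opml_text)
```

Any extra metadata (language, region, last-checked date) could be carried as additional attributes on each `outline`, since OPML readers generally ignore attributes they do not recognise.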

anothercookiecrumbles commented 5 years ago

@mterenzio, we've got a bunch of efforts around local news, and the news outlet whitelisting/RSS feeds curation is part of a more data-intensive component to the larger project. We're still fine-tuning the research questions and the shape the project will take, but happy to go into more detail if you're curious.

@scripting, is the JavaScript code open-source? Alternatively, do you know of any Python libraries/repos that do that? If so, I can try to sort something out soon-ish based on the data we have.

The one thing worth pointing out: I think we'll struggle to have an authoritative list of feeds, mostly because whitelisting means we'll inevitably be missing out some news organisations inadvertently. And, even now, the stuff we have is predominantly English, which means we're not capturing a ton of stuff (I think we only have a handful of Spanish sites, and nothing in Chinese or Bengali, for example). This is something we're aware of and looking to address, but it's worth flagging upfront. We need to be able to crowdsource, if nothing else, legitimate lists of news organisations, and things like INN and LION get us some way there, but what about beyond that?

scripting commented 5 years ago

@anothercookiecrumbles -- yes, all of it is open source, but it'll work better if I do the adaptation and make that open source.

Re struggle -- 1. do the best you can now, and 2. try to do better in the future. Software is a process. It sucks today but it'll suck less tomorrow. That's my philosophy. Here the real accomplishment is to flow the good work you're doing in academia into the RSS community, such as it is (we'll find out) in a useful way. And learn from that, and help each other with the next steps.

So let's go back to step 1. Can you make a list of your feeds and the metadata available through a (possibly private) API, in any format you find easy to produce? From that, I can take care of porting to OPML and uploading to GitHub on a daily basis.

I already do that for my blog, in the repo we're using right now. Look in the blog section at the top level. That is updated every night as I post new stuff at scripting.com. I would more or less model the interface on what I learned from that (and the working code that does the uploads).

anothercookiecrumbles commented 5 years ago

Sorry for the late reply. I think what might be easiest (and quickest) for me is to write a script that uploads a CSV or something (JSON, XML, whatever) to a GitHub repo, and you can pull it from there?
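As a rough idea of what the consuming side of such a CSV hand-off might look like, here is a minimal Python sketch that parses CSV text into feed records. No file format was agreed in this thread, so the column names (`organisation`, `feed_url`), the function name `read_feed_csv`, and the sample rows are all assumptions. Fetching the file from the GitHub repo is left out.

```python
import csv
import io


def read_feed_csv(text):
    """Parse CSV text with a header row into a list of feed records.

    Assumes hypothetical columns "organisation" and "feed_url".
    """
    reader = csv.DictReader(io.StringIO(text))
    feeds = []
    for row in reader:
        url = (row.get("feed_url") or "").strip()
        if not url:
            continue  # skip organisations with no known feed
        feeds.append({"organisation": row["organisation"].strip(), "feed_url": url})
    return feeds


# Made-up sample standing in for the uploaded file.
sample = """organisation,feed_url
Example Gazette,https://gazette.example/rss
Example Herald,
Example Tribune,https://tribune.example/feed.xml
"""

feeds = read_feed_csv(sample)
print(feeds)
```

Rows without a feed URL are skipped here; a real pipeline might keep them in a separate list to track organisations that still lack a feed.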

scripting commented 5 years ago

@anothercookiecrumbles -- no worries, this is a very asynchronous thread. ;-)

That would work. Whatever format works best for you.

PMaynard commented 5 years ago

@scripting I like the idea. I've tried to operate something similar for a few years. The news is focused on information security, with an industrial control systems bent (since that's my research topic).

I've added an OPML subscription list [1] to my news aggregator [2]. I have been meaning to prune it and add more quality feeds. Some things, like the Reddit feed, lower the overall signal-to-noise ratio and are not what you'd want to include.

[1] http://port22.co.uk/port22_feeds.opml [2] https://port22.co.uk/
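For anyone who wants to consume an OPML subscription list like this one, here is a minimal Python sketch that extracts the feed URLs from OPML text. The structure of the port22 file is not shown in this thread, so the sample document, its feed names and URLs, and the function name `opml_feed_urls` are made up for illustration. The sketch assumes only the common convention that each feed is an `outline` element with an `xmlUrl` attribute, possibly nested inside category outlines.

```python
import xml.etree.ElementTree as ET


def opml_feed_urls(opml_text):
    """Return the xmlUrl of every outline in an OPML document, in document order."""
    root = ET.fromstring(opml_text)
    # Category outlines have no xmlUrl, so they are filtered out.
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]


# Made-up OPML document; not the contents of the file linked above.
sample_opml = """<opml version="2.0">
  <head><title>Example list</title></head>
  <body>
    <outline text="Security">
      <outline type="rss" text="Example Blog" xmlUrl="https://blog.example/feed"/>
      <outline type="rss" text="Example Advisories" xmlUrl="https://advisories.example/rss.xml"/>
    </outline>
  </body>
</opml>"""

urls = opml_feed_urls(sample_opml)
print(urls)
```

Pruning a noisy feed then amounts to filtering this list (or the outlines themselves) before the aggregator reads it.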