syndicated-media / sn-spec


Podcast index, open subscriptions #56

Open inorganik opened 7 years ago

inorganik commented 7 years ago

We've discussed this in the Slack group: one major challenge of this effort is indexing all podcasts. Some may disagree, but I see an index as inherently central and singular. I don't believe a technology exists yet that is stable and accessible enough to decentralize an index.

To that end, could Syndicated Media create a non-profit, board-controlled organization that keeps a central index? It would manage the submission, storage, and indexing of podcast feeds. It would provide an API as well as a front-end web interface for searching and adding podcast info.

From here it's a slippery slope to providing all the trappings of a podcast client - a player, login with subscriptions, etc. We could (A) avoid that altogether, or (B) allow it, but make it open so that any podcast client could let a user OAuth with this service and get their subscriptions. The other benefit of subscriptions is having the data necessary to rank search results.

Hosting would likely be expensive due to the volume of data and API calls, so sponsors would be required to prop up the service.
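
A minimal sketch of the kind of index described above, assuming an in-memory store. The names `FeedEntry`, `PodcastIndex`, `submit`, `subscribe`, and `search` are hypothetical and do not come from any spec; the subscriber count stands in for the subscription data mentioned as a ranking signal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeedEntry:
    feed_url: str
    title: str
    subscribers: int = 0  # hypothetical ranking signal from open subscriptions


@dataclass
class PodcastIndex:
    # Keyed by feed URL so that a feed can only be listed once.
    entries: Dict[str, FeedEntry] = field(default_factory=dict)

    def submit(self, feed_url: str, title: str) -> FeedEntry:
        """Add a feed if it is new; return the stored entry either way."""
        return self.entries.setdefault(feed_url, FeedEntry(feed_url, title))

    def subscribe(self, feed_url: str) -> None:
        """Record one more subscriber for an already listed feed."""
        self.entries[feed_url].subscribers += 1

    def search(self, query: str) -> List[FeedEntry]:
        """Case-insensitive title match, most-subscribed feeds first."""
        q = query.lower()
        hits = [e for e in self.entries.values() if q in e.title.lower()]
        return sorted(hits, key=lambda e: e.subscribers, reverse=True)
```

The index stores only metadata and a pointer to the feed; ranking by subscriber count is one possible choice among many, not a recommendation.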

Cj-Malone commented 7 years ago

If decentralization is not practical, I'd prefer a federated approach, so that services aren't locked in. Just provide the data openly and let them host it.

markusahlstrand commented 7 years ago

This would solve a lot of the challenges we're facing all at once. From my perspective, it relates to both #23 and #34?

Another discussion linked to this topic is how to handle copyright. It currently doesn't seem to be clear exactly how feeds can be redistributed. Maybe we could clear this up as part of signing up for this global directory?

I would also prefer going with a federated solution so that we're not just building a new iTunes directory.

kookster commented 7 years ago

This is great. Once an index is created and hosted, user logins for maintaining subscriptions, Last.fm-like listening data, players, saving favorites, and all kinds of other things could be built on top of it - not sure they have to be built into it though.

Is this a central DB and API that all client apps would actively call per user, or is it limited to a source of updates about feeds for those who host their own DB of podcasts, so that apps would then connect to those synced repos?

I was thinking the latter, as I don't see how a single db could be all things to all apps.

hellosteadman commented 7 years ago

@kookster I like the latter. If there's a standard to work from, it would hopefully be a matter of apps following the chain of directories. Presumably there has to be a starting point, but rather than that being a list of all podcasts, perhaps it could be a list of directories (a directory could be a podcast network, a host, or a single independent podcast) that an app can consume or search, provided they're all using the same standard.

inorganik commented 7 years ago

I was proposing an index - meaning a place where you can search for and find a feed, and metadata about that feed. This just points you to outside resources; it doesn't host anything except the metadata. I'm not clear what "a source of updates" or a "chain of directories" actually looks like.

kookster commented 7 years ago

I still meant an index in both options. It's more a question of whether it is an API for all apps to use directly, or a master list of feeds and changes that app-specific DBs sync against or receive changes from, and potentially that apps post new, as-yet-unlisted feeds to.
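
The second option (a master list that app-specific databases sync against) could be sketched roughly as follows. This is only an illustration under assumptions: `MasterList`, `AppDatabase`, `Change`, and the sequence-number scheme are hypothetical names and choices, not something agreed in this thread.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Change:
    seq: int       # 1-based position in the master list's change log
    action: str    # "add" or "remove"
    feed_url: str


class MasterList:
    """Append-only log of feed changes that apps can read from any point."""

    def __init__(self) -> None:
        self._log: List[Change] = []

    def post(self, action: str, feed_url: str) -> Change:
        change = Change(len(self._log) + 1, action, feed_url)
        self._log.append(change)
        return change

    def changes_since(self, seq: int) -> List[Change]:
        # Entry i in the list has seq i + 1, so slicing at `seq`
        # returns exactly the changes with a higher sequence number.
        return self._log[seq:]


class AppDatabase:
    """An app's own copy of the feed list, kept current by syncing."""

    def __init__(self) -> None:
        self.feeds: Set[str] = set()
        self.last_seq = 0

    def sync(self, master: MasterList) -> int:
        """Apply unseen changes; return how many were applied."""
        applied = 0
        for change in master.changes_since(self.last_seq):
            if change.action == "add":
                self.feeds.add(change.feed_url)
            elif change.action == "remove":
                self.feeds.discard(change.feed_url)
            self.last_seq = change.seq
            applied += 1
        return applied
```

Each app remembers only the last sequence number it applied, so the master list never needs to know who its consumers are.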

empaempa commented 7 years ago

@inorganik An index is exactly what is suggested here. The podchain is a service (we can host it if no one else will) where podcasters can CRUD their RSS-links.

Distribution of index updates to all federated parties is done using blockchain technology - there are a number of reasons for this. There will be "blockchain adapters" for your database of choice. This means that your SQL/Mongo/ElasticSearch/WhatHaveYou will be updated automagically and you solve your search or whatever from there.
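
To illustrate what such an adapter might do on the database side, here is a minimal sketch using SQLite from the Python standard library. The event shape `(action, feed_url)`, the `feeds` table, and the function name `apply_events` are assumptions made for the example; they are not the podchain's actual interface, and the blockchain side is left out entirely.

```python
import sqlite3
from typing import Iterable, Tuple

Event = Tuple[str, str]  # (action, feed_url); a hypothetical event shape


def apply_events(conn: sqlite3.Connection, events: Iterable[Event]) -> None:
    """Replay index events, in order, into a local table of feed URLs."""
    conn.execute("CREATE TABLE IF NOT EXISTS feeds (url TEXT PRIMARY KEY)")
    for action, url in events:
        if action == "add":
            # INSERT OR IGNORE keeps replays of the same event harmless.
            conn.execute("INSERT OR IGNORE INTO feeds (url) VALUES (?)", (url,))
        elif action == "remove":
            conn.execute("DELETE FROM feeds WHERE url = ?", (url,))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    apply_events(conn, [
        ("add", "https://example.com/a.xml"),
        ("add", "https://example.com/b.xml"),
        ("remove", "https://example.com/a.xml"),
    ])
    print(conn.execute("SELECT url FROM feeds").fetchall())
```

Search, ranking, and categories would then be built on top of the local table by each service.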

In the podchain-issue there's a link to some wires. Note that the last wire is for advanced users.

inorganik commented 7 years ago

@empaempa Sounds really cool. I saw the "podchain" issue as well. I created this separate issue in case blockchain is not accessible and stable enough, and also because, as I mentioned, we may want open subscriptions, where a user's subscriptions would be accessible to any podcast client via OAuth and the API. Is this also something podchain would take care of?

Also I'm curious how search would work. How would search ranking work? I'm still new to blockchain, I need to read up on it.

empaempa commented 7 years ago

I wouldn't worry too much about the blockchain technology - first and foremost it will be used to keep the federated parties' databases in sync (yes, there are other technologies to solve this specific problem).

The upsides of a blockchain are security, ownership (no one other than the podcaster can modify their links), inspectability, and decentralization.
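
As a rough illustration of the ownership and inspectability properties, here is a toy hash-linked log. It is a simplification under stated assumptions: `LinkChain`, `set_link`, and the block fields are hypothetical, there is no consensus or networking, and the owner is a plain string, whereas a real blockchain would authenticate the owner with public-key signatures.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class LinkChain:
    def __init__(self) -> None:
        self.blocks: List[dict] = []
        self.owners: Dict[str, str] = {}  # podcast_id -> owner id

    def set_link(self, owner: str, podcast_id: str, feed_url: str) -> dict:
        """Register or change a feed link; only the first registrant may change it."""
        registered = self.owners.get(podcast_id, owner)
        if registered != owner:
            raise PermissionError("only the registering podcaster may modify this link")
        self.owners[podcast_id] = owner
        prev = block_hash(self.blocks[-1]) if self.blocks else GENESIS
        block = {"prev": prev, "owner": owner,
                 "podcast_id": podcast_id, "feed_url": feed_url}
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Check that every block points at the hash of the block before it."""
        prev = GENESIS
        for block in self.blocks:
            if block["prev"] != prev:
                return False
            prev = block_hash(block)
        return True
```

Editing an earlier block changes its hash, so the next block's `prev` no longer matches and `verify` fails; that is the sense in which anyone holding a copy can inspect the log.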

I think a subscription service is an interesting idea but let's solve the global index first and then discuss the best way for a global subscription. Potentially another blockchain is the answer... :)

If you haven't, please read up on blockchains - it's a very elegant technology!

inorganik commented 7 years ago

@empaempa what about search? I think that's the primary function of an index.

cqr commented 7 years ago

@inorganik I think I would prefer that you have complete public access to the database of RSS feeds and let individual organizations develop and refine their own indexes. Search indexing is a necessarily political act, and there are winners and losers depending on the algorithm and approach taken. You see this all the time. By having a central organization with the canonical search index, you lean into that. I like the idea of having a multitude of options with their own flavors.

empaempa commented 7 years ago

@inorganik @chrisrhoden Agree with Chris, the first problem that needs to be solved is to have ALL RSS-links in one place - decentralised, publicly inspectable, secure and from which federated partners get updates on additions and modifications so they can update their own databases, search and recommendations. This is the aim of the podchain.

On that note, I will meet with Chromaway on the 15th of February to kick-off the blockchain-side of the project. Most likely this will be my main project during the spring. I'll put everything in a public repo somewhere, maybe under SM? You're all welcome to pitch in :)

ziggythehamster commented 7 years ago

This sounds like the very-soon-to-be-defunct Open Directory Project, but for podcasts. It would be a huge project to maintain a podcast directory, and unless you can get Apple, Google, Podcast Addict, and a handful of other client developers on board, it won't receive the needed traction. I don't think that Apple or Google would be on board because they have different goals for their store pages than we would for a directory. If someone in the group is from Google/Apple, maybe we can learn more about your goals and see about integrating them.

That said, I think the preferred way to do this (using existing technology) would be to maintain the directory in RDF as a GitHub site. Use a well-known filename to identify the categories (i.e., the tree root) and where to fetch information about them, and then clients can display a tree view or other hierarchical navigation widget. Since it's an open GitHub repo, adding a new podcast to a category is a matter of making a simple pull request. Since we would likely want an HTML-based directory as well, we could have a post-commit hook generate an HTML file and add it to the repo.

I also like the blockchain idea, but if technologically the blockchain is the right idea, we should consider ignoring this ticket for v1.

inorganik commented 7 years ago

@ziggythehamster cool idea. Are there any precedents for using RDF for big collections of data? I can't say I've heard of it.

ziggythehamster commented 7 years ago

Yes. The Open Directory Project internally uses RDF, RSS 1.0 is itself based on RDF, and it's used by OCLC to catalog all of the books in all of the world's libraries. For instance, here's the OCLC catalog entry for Amazon Web Services in Action in RDF format.

ziggythehamster commented 7 years ago

RDF also has some possibly easier to grok variations, like Turtle. RDF is both the XML representation and the idea/nomenclature.
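
For a sense of what a directory entry might look like in Turtle, here is a small sketch that writes one category out as Turtle text. The `ex:` namespace, `ex:Podcast`, and `ex:inCategory` are made up for the example; only the Dublin Core `dcterms:title` property is an existing vocabulary term. The category name is assumed to be a simple identifier with no spaces.

```python
from typing import List, Tuple

PREFIXES = (
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix ex: <https://example.org/podcast-directory#> .\n"  # hypothetical namespace
)


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def category_to_turtle(name: str, podcasts: List[Tuple[str, str]]) -> str:
    """Serialize (title, feed_url) pairs as members of the category ex:<name>."""
    chunks = [PREFIXES]
    for title, feed_url in podcasts:
        chunks.append(
            f"<{feed_url}> a ex:Podcast ;\n"
            f"    dcterms:title {turtle_literal(title)} ;\n"
            f"    ex:inCategory ex:{name} .\n"
        )
    return "\n".join(chunks)


if __name__ == "__main__":
    print(category_to_turtle(
        "Technology",
        [("Example Tech Show", "https://example.com/tech.xml")],
    ))
```

A file like this could live under a well-known filename in the repo, with one file per category.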

empaempa commented 7 years ago

You're correct in that we need a decent number of partners to keep it relevant. I do think there are organisations out there willing to put in a very small effort (as in installing a Docker image) to become a maintainer. We'll see.

Putting the data on GitHub makes the podcast ecosystem centralized and dependent on GitHub. The very first and most important idea of the Podchain is to make a decentralized index that has no dependency on any one player.

Pull requests would need to be vetted by a person, I guess, so no non-podcast links end up in the index. I wouldn't want to be that person. Hopefully the index will at some point contain all the hundreds of thousands of podcasts out there.

Having it on GitHub would also require non-tech podcast publishers to sign up on GitHub and make a pull request if they want to change or delete their link from the index.

And just so we're all aware: right now nothing but the links to podcasts will be stored in the podchain - no metadata, except possibly the podcast title and the email to the publisher. It's not a search database, it doesn't deal with categories or any other segmentation. That's up to the individual services to handle.

We'll have a working demo running at the meetup in Boston. I hope Markus also will be able to show what is required to install a podchain node.

inorganik commented 7 years ago

In the demo, will you show how the collection gets distributed to all the peers? I watched the blockchain demo video, and it's very informative, I'm just unclear about how the info gets distributed. Also is there a link for the meetup you're referring to?

I guess to sum up what you're proposing @empaempa, with regard to this issue/thread, is that there should not be a central index or open subscriptions. An index would be maintained by each peer of the podchain.

It makes sense in a lot of ways, because maintaining an index and building a search engine on top of it, which is inherently biased, should depend on a lot more than just the data in podcast feeds (e.g., subscriptions). It also takes a lot of time and resources. In the hypothetical scenario of a "non-profit, board-controlled" organization, all of that would fall on its shoulders, creating a financial strain that would need to be covered by donors. People would need to be hired to maintain it; it gets pretty complicated from there.

But the alternative @empaempa suggests seems very promising. I would just like to learn how it actually gets implemented.

empaempa commented 7 years ago

There's a Syndicated Media event in Boston in May, on the 10th to the 12th I think (does anyone know the details?). Our goal is to have a working demo with several running nodes showing how the data is distributed and whatnot. It's only a proof of concept showing how you add, update and remove links in the index.

Hopefully we can demo, too, how to install a node and what configuration is needed.

inorganik commented 7 years ago

Thanks. You should definitely take a video of the presentation for those of us who can't make it. Excited to see this!

empaempa commented 7 years ago

Good idea!

empaempa commented 7 years ago

@inorganik https://www.syndicated.media/2017/Symposium/