syndicated-media / sn-spec

Podchain #34

Open empaempa opened 7 years ago

empaempa commented 7 years ago

We have tinkered with an idea aimed at solving a central RSS-library problem: it's pretty much impossible to get hold of all podcasts out there. iTunes probably sits on about 95% of everything but we (the other podcast players) really can't access all that.

The basic idea is to put all RSS-links in a blockchain. If someone adds an RSS-link to our servers, your servers will be updated, too. We're in early talks with Chromaway.com, who are building a blockchain-based product to keep databases in sync.

Why a blockchain? It's a buzzword, so it will create some interest in itself. More importantly, publishers will own their blocks and be able to modify them. And it will be publicly inspectable, adding to the openness we all want.

These are very early thoughts and maybe there's a better way of doing this - just wanted to put it out to get the discussion going :)

empaempa commented 7 years ago

Also, wherever the service lives, it can validate the feeds and warn about missing/misused fields, including the ones we're discussing here.

empaempa commented 7 years ago

Chromaway just told me their product is slated for release in early 2017, but I'm trying to get in early to see if it can be used.

BTW: I'm by no means a blockchain tech expert so if anyone out there is, please let me know!

Cj-Malone commented 7 years ago

I don't know anything about Chromaway, and very little about blockchains. But I don't think you can validate something before another node adds it?

And what's the idea for podcasters? Do they have to keep a signing key for updates? All those keys will be lost.

BTW: iTunes is very easy to scrape, and has an open API for searching.

empaempa commented 7 years ago

We imagine an open-sourced service (living on podchain.org) where a publisher logs in and adds their feed. The service validates the feed, gives warnings and suggests missing fields (the ones we discuss here), and then adds the feed URL to the blockchain. The service would also offer to store the access keys for the publisher (but of course let them choose not to, and offer a download option).

As this service is open-sourced, anyone can install and run it as they see fit (we would probably also run it somewhere under the acast.com domain). The blockchain technology from Chromaway (or another supplier) keeps all databases in sync, so we are all updated.

Scraping iTunes is either not allowed or limited, and we'd like to tweak and improve our search as we see fit. Also, I think it's better to have all RSS-links publicly available than to rely on Apple.

empaempa commented 7 years ago

I have spent a week deep-diving into blockchain technologies (you all should, it's a very elegant idea). http://etherium.org is the best bet if we want to use an open-sourced solution, but it seems to be quite in flux at the moment.

I met the Chromaway-people at a bitcoin-meetup the other day and they'll grant us early access to their sync-db-using-blockchain-tech - hopefully this side of the year.

This is a wireframe of how I imagine the basic functionality of podchain.org

https://projects.invisionapp.com/share/YC9D2QL3S#/screens

Feedback appreciated!

cooganb commented 7 years ago

@empaempa very interested in helping with this / doing any grunt work research. Any developments since November?

rustyshelf commented 7 years ago

I like the idea of having a centralised place for an author to submit to, and apps to retrieve from, but personally I think the most complexity an author should be presented with is pasting a feed URL into a website. Blockchains require knowledge of private keys, validation nodes, etc., which could be overkill for what we need?

I think what we want to aim for is:

There is also the issue that most apps like ours will want to keep supporting a user just being able to paste in a feed URL to a feed they can't find in that database, so there's the tricky issue of what to do when the author hasn't submitted their feed already.

inorganik commented 7 years ago

Link above is misspelled ^^^ https://www.ethereum.org/

empaempa commented 7 years ago

@cooganb No development has happened. The idea is still something we'd like to pursue at Acast, and we'll most likely start working on it once I get back to work (I've been away for a while, hence the late reply).

@rustyshelf Totally agree, please have a look at the wires I put together:

https://projects.invisionapp.com/share/YC9D2QL3S#/screens

...the last wire is if you want to download your keys.

The idea is to create a service at podchain.org where you manage your RSS link. There are many reasons for using a blockchain to propagate data to all interested parties - it's decentralized, it's secure, it's publicly inspectable etc. Please read up on blockchains, it sure is a fascinating technology!

empaempa commented 7 years ago

This is a fantastic video about the basis of blockchains - give it 20 minutes, you won't regret a second:

https://anders.com/blockchain/

evoterra commented 7 years ago

I watched the video and now can better appreciate why blockchain makes for an excellent way to store the lookup tables for public podcast feeds. Thanks for sharing that.

I also looked over the wireframes, and one thing struck me: it's too much. There's really no need for the creator to "own" the listing. An RSS feed is either public or not available, so anyone should be able to add feeds. If there's a problem with the feed, display that information anytime someone asks (or tries to submit). If you want to send suggested fixes/inclusions, cool. Do that on screen, and give the option to email the feed owner (via the address in the feed). Any "fixes" must be done on the feed anyhow, so perhaps the project moves quicker if user-to-podcast registration is taken out?

(Please note: I manage some 700+ feeds, so I'm more than just a casual user. Keep that in mind when you berate me for my oversimplification of the issue!)

empaempa commented 7 years ago

@evoterra Good point, reads and additions (CR in CRUD) of RSS-links should be open to anyone.

However, it's important that Updates (redirects) and Deletes (UD in CRUD) can only be made by the owner. I've been thinking of using the email in the feed to identify the owner. So, when a new link is added, a claim-your-block mail is sent to the feed email with instructions on how to claim ownership of the block. At this point, I think the wires reflect a pretty good flow.
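As a rough illustration of the claim flow described above, the sketch below pulls an owner address out of a feed and pairs it with a one-time claim token. It assumes the address lives in `<itunes:owner><itunes:email>` or, failing that, in the RSS `<managingEditor>` element. The function names and the shape of the claim record are hypothetical, and actually sending the mail is left out.

```python
import secrets
import xml.etree.ElementTree as ET
from typing import Optional

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def owner_email(feed_xml: str) -> Optional[str]:
    """Return the owner email declared in the feed, or None if there is none."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return None
    # Prefer <itunes:owner><itunes:email>.
    email = channel.findtext(f"{{{ITUNES_NS}}}owner/{{{ITUNES_NS}}}email")
    if not email or not email.strip():
        # Fall back to <managingEditor>, which may look like
        # "geo@example.com (George)"; keep only the address part.
        parts = (channel.findtext("managingEditor") or "").split()
        email = parts[0] if parts else None
    return email.strip() if email else None


def new_claim(feed_url: str, feed_xml: str) -> Optional[dict]:
    """Build a hypothetical claim record to be mailed to the feed owner."""
    email = owner_email(feed_xml)
    if email is None:
        return None  # nobody to send the claim mail to
    return {
        "feed_url": feed_url,
        "email": email,
        "token": secrets.token_urlsafe(32),  # one-time secret for the claim link
    }
```

A feed without any owner address yields `None`, which matches the case where a link can be added by anyone but not yet claimed.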

I have a kickoff meeting with Chromaway on the 15th of February between 10-12 Swedish time. If anyone is interested, I can set up a hangout so you can join. The goal is to design the blockchain part of the puzzle.

The UI and server part will most likely be Node+React to keep things simple.

evoterra commented 7 years ago

@empaempa I had to look up the meaning of CRUD. :) All of which are already built into the existing RSS spec (or at least current implementation). Specific to the points you raise:

empaempa commented 7 years ago

Yes, true, they are included, but I think it should be possible to register a redirect in the podchain, meaning that the federated partners will go directly to the new link instead of relying on the built-in redirect. This will also help everyone keep track of how things change over time.

For deletes, same thing, I think you should be able to remove your feed from the podchain and the federated partners' databases.

These two actions will happen VERY seldom so I don't think it's a super hard UX-problem to solve. And, naturally, you don't have to update/delete anything in the podchain if you don't want to. But if you do, there'll be benefits, like not having to keep the old link alive and having the feed removed from all apps at once.

theDanielJLewis commented 7 years ago

Talking about updating the URLs, I think it's more important to appropriately follow and update based on 30x redirects. The information about the podcast should be pulled entirely from the feed contents.
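One way to read "follow and update based on 30x redirects" is sketched below. It assumes that only permanent redirects (301, 308) should change the URL a directory stores, while temporary ones (302, 303, 307) only change where the feed is fetched right now. The `fetch` callable is a hypothetical stand-in for a real HTTP request that returns the status code and the `Location` header without following redirects itself.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def resolve_feed_url(
    url: str,
    fetch: Callable[[str], Tuple[int, Optional[str]]],
    max_hops: int = 5,
) -> Tuple[str, str]:
    """Follow a redirect chain and return (url_to_store, url_to_fetch)."""
    stored = url
    current = url
    seen_temporary = False
    for _ in range(max_hops):
        status, location = fetch(current)
        if location and status in PERMANENT | TEMPORARY:
            # Location may be relative, so resolve it against the current URL.
            target = urljoin(current, location)
            if status in TEMPORARY:
                seen_temporary = True
            elif not seen_temporary:
                # Only an unbroken run of permanent redirects moves the stored URL.
                stored = target
            current = target
        else:
            return stored, current
    raise RuntimeError(f"more than {max_hops} redirects for {url}")
```

Injecting `fetch` keeps the policy separate from the HTTP client, so the same logic can be exercised with a fake lookup table.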

I think the only thing a podcaster would have to manage is allowing or blocking their podcast (either globally or individual destinations or destination types).

timpritlove commented 7 years ago

A blockchain only makes sense if there is a sustained interest in its survival that is shared by most of its participants. It requires active participation and foremost contribution in terms of computing power on a constant basis. If this does not happen a blockchain is vulnerable to takeovers and provides no security for the information stored.

Podcasters who struggle to keep up their own sites and feeds, and who are usually not very technical, are not a good choice to keep such a critical infrastructure alive when the only service provided is a mere listing of a URL. Why should I be running dedicated software on machines facing the public internet just to keep a single piece of information stored while most people and apps turn to big established databases anyway? Also, people will lose their "wallets" and with them the ability to change their entries later on.

Given that control over your domain (or the domain of your podcast hoster) is enough to provide proper validation, a blockchain might not present enough added benefit here. I see the point in having an open directory, but I am not convinced a dedicated blockchain is the solution to this.

lexicon2600 commented 7 years ago

That makes sense. If the majority of podcasters were serious engineers, that might be one thing, but we're not. Most podcasters are hobbyists in both media and technology.

empaempa commented 7 years ago

@timpritlove @lexicon2600 It comes down to the type of blockchain used. The idea is not to use a bitcoin-type of blockchain with miners and proof-of-work, but to use a consortium-private type of blockchain. I'm soon done with a project description where all this is explained. Stay tuned!

GluedToTheScreen commented 7 years ago

Looking forward to details as still not clear in my mind. Big picture, though, @empaempa:

Your podchain vision references a collection of podcast CHANNELS (only), their associated attributes, and how to manage them... but not ITEMS (episodes) or their details.

Is that right or not?

empaempa commented 7 years ago

That is correct. No episodes or content will be stored in the podchain, only URLs to the podcast feeds. Maybe we will store its name, too, and possibly a rating (mature etc) but no metadata. This is to keep things as simple and open as possible.

theDanielJLewis commented 7 years ago

I thought I understood this concept, until I watched the videos. :/

So here are the two questions I think must be answered in order to validate this (or most any other) idea:

  1. What is the benefit to the podcast-consumer (podsumer, as I like to call them)?
  2. What is the benefit to the podcaster?

empaempa commented 7 years ago

This document sure doesn't answer all the detailed questions, but I hope it answers most of the general ones:

https://docs.google.com/a/acast.com/document/d/1HgN2gxmXQmyTvRKZ-wMxntWiy-baijVAi9S6l_j0V6E/edit?usp=sharing

The exact details on private and public keys and how they're going to be generated, stored, revoked and what not are something Chromaway will help us figure out. It's very doable.

GluedToTheScreen commented 7 years ago

The write-up helps, thanks. A chain of podcast RSS feed links will be collected, each owner can claim their feed (or "link") in the chain... and update it upon confirmation by managing nodes in the chain.

How is podfade handled?

If the owner updates, fine... but fading is sometimes disappearing. With or without good media left behind.

How does an abandoned feed get updated? (assuming someone learns about it and wants to correct the data)

empaempa commented 7 years ago

Please elaborate on what you mean by "podfade" (I think I understand but not 99% sure). Give some examples.

EDIT: Found this, haha: http://www.urbandictionary.com/define.php?term=podfade I should have heard the phrase before but honestly it was a first :)

I don't think it's the podchain's responsibility to handle this and I'm not sure it's even possible given that the keys to the blocks are in the hands of the podcast publisher. There might be ways to flag blocks as faded...

GluedToTheScreen commented 7 years ago

Yes, @empaempa, key access was my point. I guess I have been looking at the podchain as an attempt to have an independent and authoritative source that somehow also maintained the current status... but I do see that is NOT in your spec overview.

SO, the podchain is a list of all podcasts... active and "retired"... the URL of which will be as current as the last time the owner updated it. We really know nothing else*.

*I did see reference to "no content or metadata" but also saw conflicting inclusion of name/title and explicit rating (which are metadata). Either way, the actual STATUS of the feed is unknown via the podchain, right?

empaempa commented 7 years ago

Title is just for being able to search the podchain a little more easily than by using the full URL, but it's not needed for it to function. Rating is just an idea, to flag weird content (like hate speech and racism). But it's up for discussion; it's really not the podchain's responsibility, it's something each podcast service should handle.

I think building in features like status (active/faded) and rating is risky business as it turns a fully open library into something opinionated.

If we decide to add these features, which I think is doable, there needs to be a lot of discussion on their exact implementation. Updates to the Podchain are, in the end, something the consortium nodes decide on - no matter who sits on the keys. I think, at least :)

GluedToTheScreen commented 7 years ago

Thanks for expanding on that... and I'm not necessarily suggesting the podchain should contain the additional info, it was just mentioned previously.

I'm just trying to understand the "value proposition" of creating and maintaining the podchain... to see WHO gets WHAT out of it.

empaempa commented 7 years ago

From the doc: "Listeners will have more content, platforms and apps will have more data and publishers will submit their podcast once and be available everywhere." In my eyes it's a win-win-win.

Keeping it alive requires very little compute power and it's not an either-or-solution, it can live in parallel with current solutions.

GluedToTheScreen commented 7 years ago

If non-exclusive and min. footprint, why not? As a developer focused on organizing and presenting RSS data downstream (after publication), it becomes another source of hopefully complete and accurate feed data.

  1. Listeners will only benefit indirectly... this is not something they will use.
  2. Publishers will ultimately benefit by exposure via an independent data warehouse... a single, reliable, public place where their current feed URL can be found. This is more valuable for a new podcast (achieve wider, faster distribution) than a podcast that is already established.
  3. Developers will have an authoritative source to obtain a complete list of podcast feed URLs.

I can tell you #3 would have saved me a significant amount of ramp up time/effort. A startup might find this VERY attractive "seed data". It appears to me that, initially, devs are the primary beneficiaries of this effort.

IF a developer already HAS a database, though, then the VALUE ADD is reduced to (a) filling in gaps in their database, (b) maybe knowing when URLs change, and (c) learning about new ones when they come online. I'm not sure how that will work but, presumably, I can monitor the blockchain OR get notified when it has changed, and therefore update my app automatically... in near-real-time. Sounds good to me.

Tell me where this is wrong or what I'm not understanding correctly.

empaempa commented 7 years ago

You're right on the money. Personally I also think it's important to have a shared library that isn't owned by one single, private company. After all, podcasting is an open eco-system and we should work towards making sure it stays that way.

For already established podcasts it can still be beneficial. We can see that many apps out there don't have all of our and other Swedish content available. I doubt there are many apps out there with more than 25% of all podcasts in their search. If we can get this together and get close to a 100% library, it can be a huge win for all three parties - listeners, podcasters and platforms/apps.

GluedToTheScreen commented 7 years ago

Which begs the question of how the chain gets populated initially...

Slurp from the iTunes API?

And with whatever start, how to backfill to that 100% target (which is the point of this endeavor)?

cooganb commented 7 years ago

Along with the question of initial-population, could someone talk a little more about the consortium? If I'm understanding this correctly, it will be the backbone of Podchain...any more thoughts about how this will be sustainable?

empaempa commented 7 years ago

@GluedToTheScreen Scraping iTunes is a start and we, at Acast, will put in everything we have. I hope other companies and organisations will help out, too. Personally I see this as a long-term project. Maybe we reach 100% in a couple of years.

@cooganb As written in the document, I'd like each consortium partner to be a legal entity with a vested interest in the podcast industry - but that's totally open for discussion.

Actually, we at Acast can keep this alive by ourselves and others can simply benefit from the openness of the technology, but I do think there are some other organisations out there willing to help out. In the end, all you have to put in is a small server running a docker image :)

geeknews commented 7 years ago

No single company should control the blockchain. The organization is going to need to find a way to ensure it is 100% independent. The best way is to make syndicated.media a non-profit supported by the participating podcast companies. There should be no way any company is seen to profit or gain influence from a group effort.

Building the list is pretty elementary; keeping it maintained with active and podfaded shows is another issue, and rules for that are going to be needed.

empaempa commented 7 years ago

Absolutely, it's a consortium that controls it. I'm not sure any central, legal entity is actually needed (read: SM as a non-profit). I have a meeting with Chromaway tomorrow morning and one question is what identifies a consortium node. Most likely it comes down to a certificate, and how these are generated is something to debate - can any partner generate a new one and give it to a new partner, must a majority of the partners agree etc.

cooganb commented 7 years ago

Great, looking forward to hearing what comes of that conversation. I'm interested in what the tech specs for a node like this would be.

It's sounding a bit like the consortium will mimic a large professional body, with members having similar interests in a comprehensive, up-to-date ledger?

empaempa commented 7 years ago

Just had a great, two-hour startup meeting with Chromaway. The consortium network is basically a list of public keys. Let's say I have key "ABC123"; the other nodes are then configured to trust any message sent to them that is signed with key "ABC123". To become a partner of the consortium network, all partners need to add the new partner's key to their list of trusted keys.
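The trusted-key list described above can be modelled in a few lines. This is only a sketch of the bookkeeping: the `Node` class and the helper names are hypothetical, keys are plain strings, and real signature verification (which Postchain would have to do) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """A consortium node with its own key and the keys it is configured to trust."""
    own_key: str
    trusted_keys: Set[str] = field(default_factory=set)

    def trusts(self, signer_key: str) -> bool:
        # A node accepts messages from itself and from any key on its trust list.
        return signer_key == self.own_key or signer_key in self.trusted_keys


def is_full_partner(candidate_key: str, nodes: List[Node]) -> bool:
    """A key counts as a partner only once every existing node trusts it."""
    return all(node.trusts(candidate_key) for node in nodes)


def pending_approvals(candidate_key: str, nodes: List[Node]) -> List[str]:
    """Keys of the nodes that have not yet added the candidate to their list."""
    return [node.own_key for node in nodes if not node.trusts(candidate_key)]
```

With two nodes "ABC123" and "DEF456" trusting each other, a newcomer only becomes a partner after both have added its key, which is the "all partners need to add the new partner's key" rule.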

Note that you don't have to be part of the consortium to run a "listening node".

Chromaway's blockchain product is called Postchain (too similar to podchain!) and runs on Node.js and PostgreSQL. We'll have to write a podcast-feed validator plugin for it, so no URLs that aren't podcasts are added.

You will be able to configure a webhook to listen for changes to the podchain and an API to submit new podcast feeds. We're working out the basics right now and will present our ideas along the way - all open to suggestions and improvements.
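Neither the webhook payload nor the API has been specified yet, so the following is purely a sketch of what a listening node might do with change notifications. The event shape (`type`, `url`, `new_url`, `title`) is an assumption made up for illustration, covering the add, redirect and delete operations discussed in this thread.

```python
from typing import Dict


def apply_event(feeds: Dict[str, dict], event: dict) -> Dict[str, dict]:
    """Apply one hypothetical podchain change event to a local feed table."""
    kind = event["type"]
    url = event["url"]
    if kind == "add":
        # Adding an already known URL is a no-op.
        feeds.setdefault(url, {"title": event.get("title")})
    elif kind == "redirect":
        # Move the existing entry to its new URL, keeping its data.
        entry = feeds.pop(url, None)
        if entry is not None:
            feeds[event["new_url"]] = entry
    elif kind == "delete":
        feeds.pop(url, None)
    else:
        raise ValueError(f"unknown event type: {kind!r}")
    return feeds
```

A webhook handler would decode the request body and call `apply_event` once per notification, so a local database stays in step with the chain without polling.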

Our goal is to present a proof of concept in May.

geeknews commented 7 years ago

I'm going to be honest here. I really do not understand why this has to be done with a blockchain. Isn't OPML much simpler, well supported, and not reliant on a third party?

Syndicated.Media has not yet addressed where this will reside, or whether they will move forward with making Syndicated.Media a non-profit. I have a real issue with the ownership questions surrounding this whole project.

farski commented 7 years ago

@geeknews The group has addressed this several times, on the monthly calls and on Slack. So far there has not yet been a strong desire to make that happen, and the majority of folks involved don't feel it's necessary to start enacting the changes that are being discussed. It's also been stated that this will be a main topic at the symposium being held in Boston. If it's something you feel strongly about, and want to make some headway on it before then, would you be up for leading the initiative?

empaempa commented 7 years ago

@geeknews This has really nothing to do with OPML. It's a library of where all podcasts are in the world. Nobody will own it, anyone can securely download and inspect the library.

The only "ownership" in the setup is that only a consortium of organisations (and perhaps individuals) with a vested interest in the eco-system is allowed to vote on what goes into the library and what doesn't. Personally I think that's a good balance but certainly open for discussion.

empaempa commented 7 years ago

Forgot: the individual blocks that count the feed URL will be owned by the owner of the feed URL, given they claim it after it has been added. So ownership is where it should be.

empaempa commented 7 years ago

"Count" should have been "holds" - can't edit on mobile :/

cooganb commented 7 years ago

I get the sense this issue of "why the blockchain?" will come up repeatedly in this discussion. There are a lot of misconceptions around blockchain technology and distributed ledgers, predominantly based on unrelated crypto-currency blockchain projects like Bitcoin.

It might be a good exercise and a worthwhile tool to draw up a very simple statement about why we think a blockchain would address the issues raised in this conversation. A few of them that have stuck out to me are below:

Though basic, I do think this would help make sure everyone is on the same page. Particularly if this conversation reaches the stage of general discussion.

Do people think this is worthwhile? If so, what other questions should be answered?

CharlesWiltgen commented 7 years ago

@empaempa wrote: "This has really nothing to do with OPML."

I think @geeknews is using OPML as an example, since it's been used for this kind of application before. For example, this is how the BBC makes its master list of podcasts available to podcast catalogs.
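For readers who have not worked with OPML directories, here is a minimal sketch of how a catalog could read feed URLs out of one. It assumes the common convention of `<outline>` elements carrying an `xmlUrl` attribute, possibly nested inside category outlines. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import List


def feed_urls_from_opml(opml_text: str) -> List[str]:
    """Collect the distinct xmlUrl values from an OPML document, in order."""
    root = ET.fromstring(opml_text)
    urls: List[str] = []
    # iter() walks nested outlines too, so category groupings are handled.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url and url not in urls:
            urls.append(url)
    return urls
```

This covers the read side only. Keeping several copies of such a file in sync is the part the rest of the thread argues about.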

geeknews commented 7 years ago

We have private OPML feeds, broken down by category, that contain the entire 368k shows we have in our podcast directory, and we have had them for 10 years. Hence my question.

Whatever is created has to be simple and have protections built in so folks do not submit their iTunes feed or website URL etc., because whatever they can submit, they will.

empaempa commented 7 years ago

The basic idea behind any blockchain technology is beautifully explained here: http://anders.com/blockchain. Please view the video and play with the live examples.

Blockchain is a fancy word for a database. There are some important features, though, that sets it apart from an ordinary database:

I think you can make the same comparison with OPML.

The two alternatives to proof-of-work (which Bitcoin uses) that I've read about are proof-of-stake and proof-of-burn. After deep talks with blockchain nerds, I'm convinced that a consortium blockchain with voting for consensus is the right choice for the Podchain. So no proof-of-work is needed. There are some scary attack vectors if we use a public blockchain and proof-of-work.

A consortium with partners that have a vested interest in the podcast eco-system can afford running a node. They usually use AWS, Azure or similar cloud services that have great uptime. That said, the blockchain technology handles nodes going offline and coming back, and keeps them in sync. There are some attack vectors, for example how trusted keys are sent between the partners. We need to discuss a method for this.

The benefits are, as stated above, more content for listeners, more data for services, and more control and reach (add once, be everywhere) for the publishers.

There will be a validation of everything that is submitted. Right now I'm thinking that you need to have certain fields in your feed and at least one episode that is longer than X seconds. Or you'll be rejected. The idea right now is that the validation is done during the voting process, so that each node in the consortium performs this validation before it casts its vote.
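To make the validate-then-vote idea concrete, here is a small sketch. Everything specific in it is an assumption: the required channel fields, the 60-second stand-in for "X seconds", reading episode length from `<itunes:duration>`, and a simple majority as the acceptance rule. None of these have been decided in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Optional

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"
REQUIRED_CHANNEL_FIELDS = ("title", "link", "description")  # assumed set
MIN_EPISODE_SECONDS = 60  # assumed value for "X seconds"


def duration_seconds(text: Optional[str]) -> int:
    """Parse an itunes:duration given as seconds, MM:SS or HH:MM:SS."""
    try:
        parts = [int(p) for p in text.strip().split(":")]
    except (ValueError, AttributeError):
        return 0  # missing or malformed duration
    if not 1 <= len(parts) <= 3:
        return 0
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds


def validate_feed(feed_xml: str) -> bool:
    """Check required channel fields and at least one long-enough episode."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError:
        return False
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        return False
    for name in REQUIRED_CHANNEL_FIELDS:
        if not (channel.findtext(name) or "").strip():
            return False
    for item in channel.iter("item"):
        has_media = item.find("enclosure") is not None
        length = duration_seconds(item.findtext(f"{{{ITUNES_NS}}}duration"))
        if has_media and length > MIN_EPISODE_SECONDS:
            return True
    return False


def consortium_accepts(
    feed_xml: str, validators: Iterable[Callable[[str], bool]]
) -> bool:
    """Each node runs its own validation and votes; a strict majority wins."""
    votes = [bool(validate(feed_xml)) for validate in validators]
    return sum(votes) * 2 > len(votes)
```

Passing one validator per node keeps the sketch honest about the fact that nodes vote independently, so a misconfigured or malicious node can be outvoted.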

Phew. Long answer! :D

CharlesWiltgen commented 7 years ago

@empaempa, that's a great overview of the "What?". What are your answers for the "Why"?

For example, one reason blockchains exist is to deal with the absence of trust (i.e. the Byzantine Generals problem). But it sounds like there will be a consortium that serves as gatekeeper, in which case a blockchain is just a difficult-to-access database.

Even blockchain gods will tell you that if you can use a relational database you absolutely should, so that's what I'm thinking of while pondering what a blockchain-based system buys over a boring-but-dead-simple download/API combo.

empaempa commented 7 years ago

I think the technical "why" is answered above - no single point of failure, data is kept in sync, data is publicly inspectable and secure, publishers own and manage their feed URL records. The data should be available through an ordinary REST API, too.
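The "publicly inspectable and secure" part rests on blocks being linked by hashes. The sketch below shows only that property, with no consensus, signatures or networking. The block layout and function names are invented for illustration and are not Postchain's format.

```python
import hashlib
import json
from typing import List

GENESIS_PREV_HASH = "0" * 64


def block_hash(index: int, feed_url: str, prev_hash: str) -> str:
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps([index, feed_url, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[dict], feed_url: str) -> List[dict]:
    """Add a feed URL as a new block linked to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "feed_url": feed_url,
        "prev_hash": prev_hash,
        "hash": block_hash(index, feed_url, prev_hash),
    })
    return chain


def chain_is_valid(chain: List[dict]) -> bool:
    """Recompute every hash; any edited block breaks the links after it."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["feed_url"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True
```

Anyone holding a copy can run `chain_is_valid` themselves, which is what makes silent edits to old entries detectable.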

The philosophical and political whys (my personal opinions):

...I might come up with some more during the day :)

geeknews commented 7 years ago

Please quit using the word consortium until such time as it is one. The organization is an open working group; there is no organization or set of rules that would define an actual consortium.

As I said, OPML accomplishes the same thing without the complexities of a blockchain. 99.9% of podcasters are going to hear that and their eyes will roll back in their heads.

FWIW, Apple is not the gatekeeper and never has been. Sadly, most devs have been lazy and used their API, which is pure insanity. We have had a master list of shows and episodes for years, because we invested the time and could never bank our business on the gratitude of Apple.

empaempa commented 7 years ago

I use the word consortium because that's the word the blockchain technology uses. In this domain it's not meant as a legal entity, it's just blockchain nodes that trust each other. Feel free to come up with another word and I'll start using that :)

Good for you (which company do you work for?) that you have invested in this, we at Acast have, too. But we see that on this specific issue a collaboration between organisations in the podcast eco-system would be a huge win for everyone (listeners, publishers and services) and make it more stable.

OPML in itself doesn't have a syncing mechanism, it's centralized (it depends on your service being available), and podcasters have no say about their entry in the OPML - except maybe by emailing you. If they're added to your OPML they're not automagically available in other services minutes later. These are all things a blockchain solves. How we present it to the world - via a docker image, a site, an API, or a downloadable OPML file - is an implementation detail.