livepeer / catalyst

Livepeer's Decentralized Media Server

IPFS support #154

Open cyberj0g opened 1 year ago

cyberj0g commented 1 year ago

General considerations

victorges commented 1 year ago

Agree with almost everything! Some comments:

To ensure the cloud provider didn't tamper with the contents of the file, we probably want to add an extra step of calculating the content hash locally and comparing it with the address returned after upload.

I don't think this is a super strict requirement. We upload things to Google Cloud Storage and S3 and don't re-download the file to check that it is actually the file we uploaded. Pinata is a much smaller provider, but if we are building our service on top of them we should probably trust them as a service provider.

We could still do the pre-calculation of the CID for other reasons though, like giving a CID to every asset even if it's not saved on IPFS, which would allow for a more homogeneous use of CIDs as identifiers. I still wouldn't put this as a requirement for this first integration though.
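
As a rough illustration of what that pre-calculation could look like, here is a minimal Go sketch using go-cid and go-multihash. Note this is only a sketch: it reproduces the provider's CID only for content pinned as a single raw block, since real IPFS adds chunk larger files into a UnixFS DAG whose CID depends on the chunker settings.

```go
// Minimal sketch of computing a CIDv1 locally before upload, so it can later be
// compared against whatever the pinning provider returns. This only matches the
// provider's CID if the file was added as a single raw block; larger files are
// chunked into a UnixFS DAG, so the chunker settings would have to match.
package main

import (
	"fmt"
	"os"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func rawCIDv1(data []byte) (cid.Cid, error) {
	// sha2-256 multihash over the raw bytes.
	h, err := mh.Sum(data, mh.SHA2_256, -1)
	if err != nil {
		return cid.Undef, err
	}
	// CIDv1 with the "raw" codec.
	return cid.NewCidV1(cid.Raw, h), nil
}

func main() {
	data, err := os.ReadFile("asset.mp4")
	if err != nil {
		panic(err)
	}
	c, err := rawCIDv1(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("expected CID:", c.String()) // compare with the CID returned after upload
}
```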

VOD Input

VOD Output

ipfs:// URL support should be added to https://github.com/livepeer/go-tools/issues/3 and catalyst-uploader.

I don't think catalyst-uploader needs to support ipfs:// URLs. go-tools might in theory, but in practice it won't really be a requirement since it will only be uploading files, not downloading them (especially if we only download them with gateway URLs).

Here I just want to make a clear distinction: a "livepeer-defined URL to represent an IPFS pinning service as an Object Store" should not use the ipfs:// scheme, but something else like pinata://. That's because ipfs:// is part of the official IPFS protocol, used to reference and read files through their content hash, so we should not mix up the two.

So IMO go-tools and catalyst-uploader will need support for an "IPFS-based Object Store", but they won't really need support for ipfs:// URLs.

It will use the Pinata API to pin the file and return a Pinata IPFS gateway URL.

It would be more useful to get the IPFS CID or ipfs:// URL back, so we don't need to parse gateway URLs to get the CID. For the playlists it does make sense to use a gateway URL though, so they are supported in regular browsers. If we need to return the gateway URLs to Studio it's not a huge issue either; it's OK to parse the gateway URLs. It's just a soft preference to get the raw IPFS URL or CID instead.
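
Since both directions come up here, a minimal sketch of converting between a CID and a path-style gateway URL, assuming the conventional https://<gateway-host>/ipfs/<cid>[/<name>] layout; the helper names are hypothetical, not part of catalyst-api or go-tools.

```go
// Sketch of going between a raw CID and a path-style gateway URL. Names here
// are illustrative only.
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// gatewayURL builds a browser-playable URL for a CID on a given gateway host.
func gatewayURL(host, cidStr, filename string) string {
	u := fmt.Sprintf("https://%s/ipfs/%s", host, cidStr)
	if filename != "" {
		u += "/" + filename
	}
	return u
}

// cidFromGatewayURL extracts the CID from a path-style gateway URL.
func cidFromGatewayURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) < 2 || parts[0] != "ipfs" {
		return "", fmt.Errorf("not a path-style IPFS gateway URL: %s", raw)
	}
	return parts[1], nil
}

func main() {
	fmt.Println(gatewayURL("ipfs.livepeer.com", "bafy...", "video.mp4"))
	cid, _ := cidFromGatewayURL("https://cloudflare-ipfs.com/ipfs/bafy.../video.mp4")
	fmt.Println(cid)
}
```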

Btw we also have our own branded IPFS gateway through Pinata, under ipfs.livepeer.com (and maybe .studio as well, not sure). But that will only work through our "built-in" object store, so we shouldn't necessarily use it for every file saved on IPFS, and I'm also not sure how we would pass that to catalyst. Maybe we could use the Object Store hostname for the host that should be used for the gateway? Feels a little weird, but it would look like pinata://key:pwd@ipfs.livepeer.com, or pinata://key:pwd@cloudflare-ipfs.com if we just want to use Cloudflare's gateway.
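
For illustration only, a sketch of how such a hypothetical pinata:// Object Store URL could carry both the API credentials and the preferred gateway host. This scheme is not an existing go-tools driver; it is just meant to show the shape of the idea.

```go
// Sketch of parsing a hypothetical pinata:// Object Store URL of the form
// pinata://key:secret@gateway-host, keeping ipfs:// reserved for content addresses.
package main

import (
	"fmt"
	"net/url"
)

type pinataStore struct {
	APIKey      string
	APISecret   string
	GatewayHost string // host to use when building playback gateway URLs
}

func parsePinataURL(raw string) (*pinataStore, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "pinata" {
		return nil, fmt.Errorf("unsupported scheme %q (ipfs:// is reserved for content addresses)", u.Scheme)
	}
	secret, _ := u.User.Password()
	return &pinataStore{
		APIKey:      u.User.Username(),
		APISecret:   secret,
		GatewayHost: u.Host,
	}, nil
}

func main() {
	s, err := parsePinataURL("pinata://key:pwd@ipfs.livepeer.com")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```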

TBD: what should the playlist behavior be? It seems that IPNS is not available on Pinata. It does seem to support directory wrapping, with a pretty obscure API, which may allow addressing the file by name.

For VOD I think it's fine if all the files aren't in the same IPFS directory. We can just have a playlist file pointing to other independent files on IPFS, which is possible because we can store the playlist file after everything else.
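
A minimal sketch of that "playlist last" idea, assuming path-style gateway URLs; the segment CIDs and durations are placeholders, not real Catalyst output.

```go
// Each segment is pinned independently; the playlist simply references them by
// gateway URL, so nothing needs to live in one IPFS directory, and the playlist
// itself is written (and pinned) only after all segments exist.
package main

import (
	"fmt"
	"strings"
)

type segment struct {
	CID      string
	Duration float64
}

func mediaPlaylist(gatewayHost string, segs []segment) string {
	var b strings.Builder
	b.WriteString("#EXTM3U\n#EXT-X-VERSION:3\n#EXT-X-TARGETDURATION:10\n")
	for _, s := range segs {
		fmt.Fprintf(&b, "#EXTINF:%.3f,\nhttps://%s/ipfs/%s\n", s.Duration, gatewayHost, s.CID)
	}
	b.WriteString("#EXT-X-ENDLIST\n")
	return b.String()
}

func main() {
	segs := []segment{
		{CID: "bafy...seg0", Duration: 10.0},
		{CID: "bafy...seg1", Duration: 8.5},
	}
	fmt.Print(mediaPlaylist("gateway.pinata.cloud", segs))
}
```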

Much trickier for livestreams indeed, but I'd argue it doesn't make sense for livestreams anyway, since we'd have a "content address" that is not permanent and changes all the time. Might as well have dynamic playlists in that case and store only the segments on IPFS (if we ever do want IPFS-based playback).

I'd also say not to spend a lot of time on this. IPFS playback is not practical right now, and even though it is getting better, we should focus on what works today. So IMO starting with only the original "MP4" files on IPFS is enough (and that's all we have on Studio today as well, apart from the NFT metadata, which won't be handled by Catalyst anyway).

victorges commented 1 year ago

More concrete examples for input/output:

Also a side note, we might need to rethink this outputs schema, since there's no reference there to which output_location from the request each entry refers to. If we have multiple object_store outputs then they're indistinguishable there. It could be as simple as including the original output_location URL in there, though that disallows multiple exports to the same OS, which we might need. Perhaps having the contract of always listing them in the same order? Not sure.
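
For example, a hypothetical shape for the outputs list that echoes the originating output_location back so multiple object_store outputs stay distinguishable; the field names are illustrative, not the current catalyst-api schema.

```go
// Hypothetical callback outputs schema, carrying the original output_location
// alongside any IPFS-specific fields.
package main

import (
	"encoding/json"
	"fmt"
)

type outputResult struct {
	Type           string `json:"type"`                      // e.g. "object_store" or "ipfs"
	OutputLocation string `json:"output_location,omitempty"` // URL from the original request
	CID            string `json:"cid,omitempty"`             // set for IPFS outputs
	GatewayURL     string `json:"gateway_url,omitempty"`
}

func main() {
	outs := []outputResult{
		{Type: "object_store", OutputLocation: "s3+https://key:secret@host/bucket/path"},
		{
			Type:           "ipfs",
			OutputLocation: "pinata://key:pwd@ipfs.livepeer.com",
			CID:            "bafy...",
			GatewayURL:     "https://ipfs.livepeer.com/ipfs/bafy...",
		},
	}
	b, _ := json.MarshalIndent(outs, "", "  ")
	fmt.Println(string(b))
}
```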

cyberj0g commented 1 year ago

Thanks for the very useful input @victorges. I finally have some idea of how it should work on the catalyst-api side. It makes sense to focus on single-file VOD first; if we later need to implement HLS and live streaming, we already have the initial research documented here.

Let's return both the CID and the full gateway URL from catalyst-api, and maybe only the CID from go-tools, so as not to suggest a specific gateway.

On naming, folder wrapping should work fine for immutable content, so that the gateway URL ends with the file name. I'll implement that.
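
A small sketch of the result that would give us, assuming directory wrapping and a path-style gateway; the types and names are illustrative only, not an existing catalyst-api type.

```go
// Result shape for an IPFS upload: the file is wrapped in a directory, so the
// gateway URL ends with the original file name, and both the CID and the full
// gateway URL are available.
package main

import "fmt"

type ipfsUploadResult struct {
	CID        string // what go-tools alone might return, to stay gateway-agnostic
	GatewayURL string // what catalyst-api can return for direct playback
}

func wrappedResult(gatewayHost, dirCID, filename string) ipfsUploadResult {
	return ipfsUploadResult{
		CID:        dirCID,
		GatewayURL: fmt.Sprintf("https://%s/ipfs/%s/%s", gatewayHost, dirCID, filename),
	}
}

func main() {
	r := wrappedResult("gateway.pinata.cloud", "bafy...dir", "video.mp4")
	fmt.Printf("%+v\n", r)
}
```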

if we are building our service on top of them we should probably trust them as a service provider

You are probably right, we can trust the provider at this stage. However, I believe the ultimate goal is to provide a fully verifiable, trustless flow for users who need it. Also, when low-latency streaming is implemented on B-O-T, we'll open the path to per-video-packet verification. It will likely require streaming verification on the storage side as well. Maybe @yondonfu could chime in on that.

yondonfu commented 1 year ago

Chiming in here.

I'd also say not to spend a lot of time on this. IPFS playback is not practical right now, and even though it is getting better, we should focus on what works today. So IMO starting with only the original "MP4" files on IPFS is enough (and that's all we have on Studio today as well, apart from the NFT metadata, which won't be handled by Catalyst anyway).

Focusing first on using IPFS to persist source mp4 assets, to match the status quo functionality in Studio, makes sense to me. As long as we have access to the source assets, we can always generate derived assets as needed (e.g. a source HLS playlist, transcoded renditions, etc.).

You are probably right, we can trust the provider at this stage. However, I believe the ultimate goal is to provide a fully verifiable, trustless flow for users who need it. Also, when low-latency streaming is implemented on B-O-T, we'll open the path to per-video-packet verification. It will likely require streaming verification on the storage side as well.

In this case, I see two trust relationships:

  1. The trust relationship b/w Catalyst and the IPFS gateway provider
  2. The trust relationship b/w Catalyst and its user

For 1, in the short term, we should be able to trust reputable gateway providers. Later on, we may want more flexibility to use gateway providers that are not all trusted, in which case we could look into verifiable retrieval from gateways in the Catalyst integration.
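
As a very rough sketch of what verifiable retrieval could look like: fetch the block from the gateway and recompute its hash locally against the expected CID. This only works as written for single raw blocks and for gateways that honor the trustless ?format=raw response; DAG-structured files would need incremental verification (e.g. via CAR responses), which is out of scope here.

```go
// Illustrative only: fetch a raw block from a gateway and verify it against the
// expected CID before trusting the bytes.
package main

import (
	"fmt"
	"io"
	"net/http"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func fetchAndVerifyRawBlock(gatewayHost, expectedCID string) error {
	resp, err := http.Get(fmt.Sprintf("https://%s/ipfs/%s?format=raw", gatewayHost, expectedCID))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}

	// Recompute the CID of the returned bytes and compare with what we asked for.
	h, err := mh.Sum(data, mh.SHA2_256, -1)
	if err != nil {
		return err
	}
	got := cid.NewCidV1(cid.Raw, h)

	want, err := cid.Decode(expectedCID)
	if err != nil {
		return err
	}
	if !got.Equals(want) {
		return fmt.Errorf("gateway returned content that does not match %s", expectedCID)
	}
	return nil
}

func main() {
	if err := fetchAndVerifyRawBlock("cloudflare-ipfs.com", "bafy..."); err != nil {
		fmt.Println("verification failed:", err)
	}
}
```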

For 2, I think we can address this with the verifiable video/RMID work that we've been investigating. The basic idea is that the user calculates a unique hash ID for the raw media of an asset (i.e. video, audio, and metadata tracks plus relative timestamps), agnostic to the container; checks that this ID matches the one calculated by Studio/Catalyst; and uses the ID to check the content returned for a request, with the ability to verify that the raw media is correct even if the response is a transmuxed version. For the case where a transcoded rendition is returned, there would be a signed attestation. This is being fleshed out for Q4!