yuvipanda closed this issue 1 day ago
/cc @d70-t, @rabernat (and others? idk). This would be the first step of letting a binder pull in content from IPFS directly.
If a content provider is mainly about getting a folder, then maybe

```
ipfs get <CID> -o <target_folder>
```

would do the trick (here's the doc). The catch is that you'd need a running IPFS node on the machine executing the `ipfs get`. But probably that's not too hard a requirement 🤷.
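For illustration, here's roughly what that could look like from Python. This is only a sketch: the function names are made up, and it assumes the `ipfs` binary plus a running node; when the binary is missing it reports failure so a caller could fall back to a gateway.

```python
import shutil
import subprocess

def ipfs_get_command(cid, target_folder):
    # Build the `ipfs get` invocation described above.
    return ["ipfs", "get", cid, "-o", target_folder]

def fetch_with_local_node(cid, target_folder):
    """Run `ipfs get` if the binary is on PATH; return False otherwise,
    so the caller can fall back to a gateway instead."""
    if shutil.which("ipfs") is None:
        return False
    subprocess.run(ipfs_get_command(cid, target_folder), check=True)
    return True
```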
`ipfsspec` could also be used to retrieve content via gateways on other machines. However, `ipfsspec` is more about getting the content into Python than about writing it back out to disk (though I assume that would be easy to build on top).
Another option might be the upcoming `?format=car` option (and probably also `tar` etc.), which would make gateways support streaming out the entire graph behind a CID instead of just single files.
I guess this would require the `ipfs` binary to be present. I do think that most users of repo2docker will not have a daemon running locally, so falling back to gateways is quite important. `?format=car` sounds like a good option - do you know if it is actually being implemented right now? I can't quite tell from that issue what the status is.
So I think it is being developed, but it's more of a refactoring step that requires quite a bit of coordination; it is scheduled for the release after next. But the functionality seems to be there already. You can use:

- `http(s)://<gateway>/api/v0/dag/export?arg=<CID>` for CAR export (since v0.10, the current release)
- `http(s)://<gateway>/api/v0/get?arg=<CID>` for TAR export

E.g. you could use
```
curl -v -L "https://ipfs.io/api/v0/get?arg=QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG" > quickstart.tar
```

to obtain the quickstart folder from the tutorial using the public gateway at ipfs.io.
Likewise you'd obtain the same from your local gateway using:

```
curl -v -L "http://127.0.0.1:8080/api/v0/get?arg=QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG" > quickstart.tar
```
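The same TAR-export fallback could be done from Python. A rough sketch (the helper names are hypothetical, and it buffers the whole archive in memory, so it is only suitable for small repos):

```python
import io
import tarfile
import urllib.request

def get_tar_url(gateway, cid):
    # Same /api/v0/get endpoint as the curl examples above.
    return f"{gateway}/api/v0/get?arg={cid}"

def fetch_via_gateway(cid, target_folder, gateway="https://ipfs.io"):
    """Download the TAR stream for a CID from a gateway and unpack it
    into target_folder. Reads the whole response into memory first."""
    with urllib.request.urlopen(get_tar_url(gateway, cid)) as resp:
        with tarfile.open(fileobj=io.BytesIO(resp.read()), mode="r") as tar:
            tar.extractall(target_folder)
```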
I'm also wondering what other benefits an IPFS content provider could bring. One thing that might be interesting would be to use the CID of the `binder` or `.binder` folder as a cache key for built Docker images. That way, it might be possible to rebuild images less often.
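Because CIDs are immutable, the cache key could be as simple as embedding the CID in the image tag. A tiny sketch (the `r2d-ipfs` repository name is made up for illustration):

```python
def image_tag_for_cid(cid):
    """Derive a Docker image name from a CID. Docker repository names must
    be lowercase, but tags may contain [A-Za-z0-9_.-], which covers base58
    and base32 CIDs. If an image with this tag already exists, the build
    could be skipped entirely."""
    return f"r2d-ipfs:{cid}"
```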
Proposed change
IPFS is a content-addressable global 'file system' that can share directories using immutable, globally unique content IDs. It can be used to store code as well as data. There are experiments in the pydata/zarr ecosystem on using it for some datasets as well: https://github.com/pangeo-forge/roadmap/issues/40.
I'd like us to add an IPFS content provider, perhaps using https://github.com/fsspec/ipfsspec as the provider backend. When given an IPFS content ID, we can just download that directory and let repo2docker do its normal thing. As content IDs are immutable, this fits pretty well with what we want to do.
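As a sketch of what such a provider might look like, assuming repo2docker's usual `detect`/`fetch` content-provider interface (the class here is standalone and all names are illustrative, not a real implementation):

```python
import re

# CIDv0: "Qm" + 44 base58 chars; CIDv1 (base32): "b" + base32 chars.
CID_RE = re.compile(r"^(Qm[1-9A-HJ-NP-Za-km-z]{44}|b[a-z2-7]{20,})$")

class IPFSContentProvider:
    """Standalone sketch; a real version would subclass repo2docker's
    ContentProvider and do the actual download in fetch()."""

    def detect(self, source, ref=None, extra_args=None):
        # Claim the spec only if it looks like a CID, so other
        # providers (git, zenodo, ...) still get a chance.
        if CID_RE.match(source):
            return {"cid": source}
        return None

    def fetch(self, spec, output_dir, yield_output=False):
        self.cid = spec["cid"]
        yield f"Fetching {self.cid} into {output_dir}\n"
        # The download itself could go through ipfsspec, e.g.
        #   fsspec.filesystem("ipfs").get(self.cid, output_dir, recursive=True)
        # or through the gateway TAR export discussed above.

    @property
    def content_id(self):
        # CIDs are immutable, so the CID itself is a natural cache key.
        return self.cid
```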
Who would use this feature?
Ideally, this would eventually end up on mybinder.org and other BinderHubs. IPFS can be a distributed alternative for storing code and content, versus something centralized like GitHub.
How much effort will adding it take?
I'd say most of the work would happen in https://github.com/fsspec/ipfsspec, and might already be done. Otherwise, I suspect it'll be minimal effort.
Who can do this work?
Some IPFS enthusiast, maybe :) Some basic understanding of IPFS concepts may be necessary to fully implement this.