peer2peer data hosting

This issue is more philosophical/discussion orientated than actionable (I think).

People want to use "data" on mybinder.org (or any Binder deployment). Currently we offer no particular integration (the advice is "use postBuild to fetch it" or "use your data's API from a notebook"), and we block things like FTP. If people started using mybinder.org with seriously large datasets (fetching a whole dataset and then subsampling it on mybinder.org), it would increase our bandwidth costs.

All this made me think: is there a good peer2peer system that can be exposed as a POSIX filesystem and made available to each binder? Ideally one where data is only fetched when you actually read a file, not when you merely list directories. For popular datasets, chances are we would already have a copy on an active binder and could transfer it internally. It would also allow people to access data that is otherwise only reachable via FTP, by somehow adding it to this distributed filesystem.
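To make the requirements concrete, here is a minimal sketch of the behaviour I have in mind. It is a toy model, not an existing system: `LazyStore`, `manifest`, `peer_cache` and `fetch_origin` are all hypothetical names, the "peers" are just a shared dictionary, and a real solution would presumably sit behind something like a FUSE mount rather than a Python class.

```python
from typing import Callable, Dict, List


class LazyStore:
    """Toy model of a lazily fetched, peer-cached dataset (hypothetical)."""

    def __init__(
        self,
        manifest: Dict[str, int],
        fetch_origin: Callable[[str], bytes],
        peer_cache: Dict[str, bytes],
    ) -> None:
        self.manifest = manifest          # path -> size in bytes (metadata only)
        self.fetch_origin = fetch_origin  # expensive: leaves the cluster
        self.peer_cache = peer_cache      # stand-in for copies held by other binders
        self.origin_fetches = 0           # counts transfers that cost bandwidth

    def listdir(self, prefix: str = "") -> List[str]:
        # Listing only consults the manifest, so no file content is transferred.
        return sorted(p for p in self.manifest if p.startswith(prefix))

    def read(self, path: str) -> bytes:
        if path not in self.manifest:
            raise FileNotFoundError(path)
        if path not in self.peer_cache:
            # Nobody in the cluster has it yet: fetch once from the origin.
            self.peer_cache[path] = self.fetch_origin(path)
            self.origin_fetches += 1
        return self.peer_cache[path]


# Assumed example data standing in for a remote dataset.
origin = {"climate/2019.csv": b"a,b\n1,2\n", "climate/2020.csv": b"a,b\n3,4\n"}
manifest = {path: len(data) for path, data in origin.items()}
shared: Dict[str, bytes] = {}

binder_a = LazyStore(manifest, origin.__getitem__, shared)
binder_b = LazyStore(manifest, origin.__getitem__, shared)

listing = binder_a.listdir("climate/")          # no content fetched
first = binder_a.read("climate/2019.csv")       # fetched from the origin
second = binder_b.read("climate/2019.csv")      # served from the shared copy
```

In this sketch the second binder reads the file without touching the origin, which is the bandwidth saving I am after; the file that nobody read is never transferred at all.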

My questions:

Are there technical solutions that would fit the requirements?
What political/technical arguments are there against doing this?

jupyterhub / mybinder.org-deploy

peer2peer data hosting #625