ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Potential solution to unknowingly automatic download of illegal content through bitswap #4083

Closed alphaCTzo7G closed 7 years ago

alphaCTzo7G commented 7 years ago

As I understand it, Bitswap automatically downloads content from peers to make sure nodes don't become leeches and to distribute the network load (Ref: https://news.ycombinator.com/item?id=12809259, https://github.com/ipfs/papers/blob/master/ipfs-cap2pfs/ipfs-p2p-file-system.pdf (sec. 3.4)).

However, automatic download of unknown blocks from unknown peers could be a security threat as well as a legal threat.

Instead of downloading unknown files from unknown peers, would it be possible to change the protocol so that nodes download and distribute files only from a whitelist of peers?

I'm not part of NIH/NCBI, but big-data storage is a problem for genomic data sets, and NIH/NCBI is already struggling with this issue (https://ncbiinsights.ncbi.nlm.nih.gov/2017/05/09/phasing-out-support-for-non-human-genome-organism-data-in-dbsnp-and-dbvar/). I know of other open data platforms that are struggling with the same problem. Could Bitswap be changed to use a whitelist, so that a node automatically downloads and seeds files only from known, reliable peer IDs (ones that don't serve CP etc.)?
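The whitelist proposal can be pictured as a simple filter a node consults before automatically fetching and re-seeding content. This is a minimal sketch of the idea only, not an existing Bitswap feature: the `Whitelist` type and `ShouldAutoSeed` function are hypothetical, and peer IDs are plain placeholder strings rather than real libp2p peer IDs.

```go
package main

import "fmt"

// Whitelist is a hypothetical set of trusted peer IDs (plain strings here;
// a real node would use libp2p peer ID values).
type Whitelist map[string]struct{}

// NewWhitelist builds a Whitelist from a list of peer IDs.
func NewWhitelist(ids ...string) Whitelist {
	w := make(Whitelist, len(ids))
	for _, id := range ids {
		w[id] = struct{}{}
	}
	return w
}

// ShouldAutoSeed reports whether content announced by provider may be
// fetched and re-seeded automatically: only if provider is whitelisted.
func (w Whitelist) ShouldAutoSeed(provider string) bool {
	_, ok := w[provider]
	return ok
}

func main() {
	// Placeholder IDs standing in for vetted archive nodes.
	trusted := NewWhitelist("QmTrustedArchive", "QmTrustedMirror")
	for _, p := range []string{"QmTrustedArchive", "QmUnknownPeer"} {
		fmt.Printf("%s -> auto-seed: %v\n", p, trusted.ShouldAutoSeed(p))
	}
}
```

A "global" or web-of-trust whitelist would only change how the set is populated; the per-request check stays a constant-time set lookup.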

Potential advantages (not necessarily technical, but they would drive adoption of IPFS and solve a real problem):

  1. mutually beneficial to IPFS, NCBI, NIH
  2. advertises IPFS to the NCBI community and open data science communities, making IPFS more popular and a standard in open data science
  3. reduces legal/security threats to peers downloading files, and as a result consumers are more likely to allow Bitswap to run on their nodes
  4. society benefits overall: data sharing in science improves (which in turn will lead more private nodes to join IPFS)
  5. sharing already-vetted non-human genomes carries no privacy risk
  6. it prioritizes backup, storage, and distribution of important bits (not all bits are created equal, IMO)
Kubuxu commented 7 years ago

Currently Bitswap does not download any content that is not requested by the node operator/user.
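A simplified way to model this behavior: the node keeps a wantlist of CIDs the local user asked for, and any block a peer sends that is not on that list is discarded rather than stored. This is an illustrative sketch under that assumption, not kubo's actual code; `Node`, `Want`, and `ReceiveBlock` are hypothetical names, and CIDs are plain strings.

```go
package main

import "fmt"

// Node is a simplified, hypothetical model of a Bitswap client: it only
// keeps blocks whose CID (a plain string here) it explicitly asked for.
type Node struct {
	wantlist map[string]bool   // CIDs requested by the local user
	store    map[string][]byte // blocks actually stored
}

func NewNode() *Node {
	return &Node{wantlist: map[string]bool{}, store: map[string][]byte{}}
}

// Want records a user request, e.g. one triggered by `ipfs get <cid>`.
func (n *Node) Want(cid string) { n.wantlist[cid] = true }

// ReceiveBlock accepts a block from a peer only if it is on the wantlist;
// unsolicited blocks are dropped. It reports whether the block was kept.
func (n *Node) ReceiveBlock(cid string, data []byte) bool {
	if !n.wantlist[cid] {
		return false
	}
	delete(n.wantlist, cid) // request satisfied, stop wanting it
	n.store[cid] = data
	return true
}

func main() {
	n := NewNode()
	n.Want("cidA")
	fmt.Println(n.ReceiveBlock("cidA", []byte("requested")))   // true
	fmt.Println(n.ReceiveBlock("cidB", []byte("unsolicited"))) // false
	fmt.Println(len(n.store))                                  // 1
}
```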

If that resolves this issue, please close it.

We are building a higher-layer solution that might help you: https://github.com/ipfs/ipfs-cluster/

alphaCTzo7G commented 7 years ago

Oh, OK. Thanks for the great work!

Is there a writeup on how bitswap works currently and how it reduces leeching?

I noticed this write-up, which seems newer than the paper I linked before: https://github.com/ipfs/go-ipfs/tree/73cd8b3e98aba252f0eadcc625472103a2dd1d53/exchange/bitswap

Perhaps some notes should be added to the papers section mentioning which sections are no longer valid, because people might read them and misunderstand the current implementation.

It mentions that "The number of task workers is limited by a constant factor."

Does that mean the length of the wantlist (the list containing requests to nodeA from other peers) and of the client request queue (the queue containing requests from nodeA to other peers) is constant? Is that how leeching is reduced?

Stebalien commented 7 years ago

IPFS as implemented doesn't currently prevent leeching. We keep track of how much we've received and sent to specific peers but we don't yet use this information. In the future, we'll probably have some form of mixed model that uses:

Basically, peers that upload as much as they download shouldn't have to pay anyone, but new peers and peers that download more than they upload will.
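The per-peer accounting and the payment rule can be sketched roughly as follows. This is a hypothetical illustration, not kubo's ledger code: `Ledger`, `Ledgers`, and `MustPay` are invented names, peer IDs are strings, and what "paying" means in practice is left unspecified, exactly as in the comment.

```go
package main

import "fmt"

// Ledger is a hypothetical per-peer traffic record from our node's view.
type Ledger struct {
	BytesSent uint64 // bytes we uploaded to the peer
	BytesRecv uint64 // bytes the peer uploaded to us
}

// Ledgers maps peer IDs (plain strings here) to their traffic records.
type Ledgers map[string]*Ledger

// get returns the ledger for peer, creating an empty one if needed.
func (ls Ledgers) get(peer string) *Ledger {
	l, ok := ls[peer]
	if !ok {
		l = &Ledger{}
		ls[peer] = l
	}
	return l
}

// RecordSent and RecordRecv update a peer's ledger after a transfer.
func (ls Ledgers) RecordSent(peer string, n uint64) { ls.get(peer).BytesSent += n }
func (ls Ledgers) RecordRecv(peer string, n uint64) { ls.get(peer).BytesRecv += n }

// MustPay applies the rule described in the thread: new peers (no ledger
// yet) and peers that have downloaded more from us than they uploaded must
// pay; balanced peers do not.
func (ls Ledgers) MustPay(peer string) bool {
	l, ok := ls[peer]
	if !ok {
		return true
	}
	return l.BytesSent > l.BytesRecv
}

func main() {
	ls := Ledgers{}
	ls.RecordSent("balanced", 100)
	ls.RecordRecv("balanced", 100)
	ls.RecordSent("leech", 500)
	ls.RecordRecv("leech", 10)
	for _, p := range []string{"balanced", "leech", "newcomer"} {
		fmt.Printf("%s must pay: %v\n", p, ls.MustPay(p))
	}
}
```

A real policy would likely tolerate some imbalance (for example, a ratio threshold) rather than the strict comparison used here for clarity.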

"The number of task workers is limited by a constant factor."

That means we service at most N (where N is reasonably small) peer requests at a time. Basically, we try not to DoS ourselves by trying to be too helpful.
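Capping concurrent work at N is the classic bounded worker pool pattern. The sketch below shows that pattern in Go under simple assumptions: `Request` and `serve` are hypothetical, the "work" is a short sleep, and N is whatever value the caller passes, not Bitswap's actual constant. It also measures peak concurrency to show the cap holds.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Request is a hypothetical peer request for a block.
type Request struct {
	Peer string
	CID  string
}

// serve handles requests with exactly `workers` goroutines, so at most that
// many requests are in progress at once. It returns the peak concurrency.
func serve(requests []Request, workers int) int64 {
	queue := make(chan Request)
	var active, peak int64
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range queue {
				cur := atomic.AddInt64(&active, 1)
				// Record the highest number of simultaneous requests seen.
				for {
					old := atomic.LoadInt64(&peak)
					if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
						break
					}
				}
				_ = req                          // a real node would look up and send req.CID here
				time.Sleep(5 * time.Millisecond) // simulate the cost of serving a block
				atomic.AddInt64(&active, -1)
			}
		}()
	}

	for _, r := range requests {
		queue <- r // blocks while all workers are busy, applying backpressure
	}
	close(queue)
	wg.Wait()
	return peak
}

func main() {
	var reqs []Request
	for i := 0; i < 20; i++ {
		reqs = append(reqs, Request{Peer: fmt.Sprintf("peer%d", i), CID: fmt.Sprintf("cid%d", i)})
	}
	fmt.Println("peak concurrent requests:", serve(reqs, 4))
}
```

Extra requests simply wait in line instead of spawning more goroutines, which is what keeps a busy node from overwhelming itself.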

alphaCTzo7G commented 7 years ago

Thanks for the explanation.

These ideas look great. Potentially, what I mentioned could be implemented somewhat trivially through a web of trust using global whitelists.