qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0

Next steps for p2p #1475

Open · b5 opened 4 years ago

b5 commented 4 years ago

We've been talking a bunch lately about revising and improving our p2p story, to figure out the role of the p2p package, and of peer-2-peer in our stack more broadly. I think it's easier to start with a user story that should exist in the near future:

### Oversized version push

1. I build a dataset bigger than 1 gig, say a mirror of NYC taxi data called `nyc-transit-data/yellow_taxi_trips_2018`.
2. I run `qri push nyc-transit-data/yellow_taxi_trips_2018` to push this dataset to the registry.
   1. Qri does a logsync to send the history, which puts the dataset history on the registry.
   2. Pushing the version is rejected with a new canonical error, `ErrVersionNotAccepted`. The error is wrapped with a reason: `dataset version not accepted: 15Gig version size exceeds max 250MB version size`.
   3. My local node recognizes this error and responds by creating a preview locally, then calls `remote.PushVersionPreview` with the preview. The preview is signed with my key pair and sent to the remote (in this case, the registry). The remote accepts the preview.
3. The CLI prints an error message:

       We couldn't push version data to https://registry.qri.io:

       15Gig version size exceeds max 250MB version size

       However, the remote did accept log data and a preview of the dataset, and you can provide this dataset to others via the distributed web by leaving 'qri connect' running, or pushing this dataset to a peer remote. For more info on peer-2-peer datasets, see: https://qri.io/docs/peer-2-peer-datasets

4. `qri.cloud/nyc-transit-data/yellow_taxi_trips_2018` shows the preview, but the page has a warning: "this dataset is only available via the distributed web". Users can see the dataset preview and create issues.

On the other side, if I'm online, users can pull this dataset directly from me via IPFS using a _p2p-backed pull_:

### p2p-backed pull
Another user `zeehan` wants to pull `nyc-transit-data/yellow_taxi_trips_2018`. 
1. zeehan runs `qri pull nyc-transit-data/yellow_taxi_trips_2018`
2. In `lib.PullDataset`, qri resolves the reference against the registry, the default network resolver.
3. `lib.PullDataset` calls `remote.client.PullDataset`, passing the registry as the source.
   1. Logsync from the registry succeeds.
   2. `remoteClient.PullVersion` gets an `ErrNotFound` response from the registry.
4. `lib.PullDataset` prints a message:

       qri has pulled log data from https://registry.qri.io, but this remote doesn't have version QmFoo... connecting to the decentralized web to search for dataset version providers...

5. `lib.PullDataset` calls `node.GoOnline`, then calls `node.inst.Filesystem.Fetch("/ipfs/QmFoo...")` to start looking for providers.
6. After some time the dataset is found and fetched via IPFS. The terminal prints:

       dataset successfully pulled from the d.web!



To do both of these things, we're going to need to improve the p2p package and a number of pieces of internal logic.

## What the `p2p` package should be able to do
Long term, I think this is the _complete_ list, ordered from most to least important:
* successfully connect and stay connected to other peers that support the qri protocol
* get profile details about a peer for a given multiaddr / PeerID
* push dataset previews to a peer at a given multiaddr / PeerID
* resolve references via connected qri peers (AKA implement `dsref.RefResolver`)
* pull dataset previews from other qri peers
* return a list of dataset feeds provided by a given peer
* fetch a dataset feed from a given peer

At a high level, we have `dsync` and `logsync`. I _don't_ think we want a `previewSync`, `peerSync`, or `feedSync`; all of those should be part of the base qri protocol.

ramfox commented 4 years ago

Notes for a potential `QriIDService` that would successfully connect and stay connected to other peers that support the qri protocol:

Set up `QriIdentityService`