ssbc / ssb-blobs

blob gossiping ssb-subprotocol
MIT License

streaming blobs from peers #31

Open regular opened 4 years ago

regular commented 4 years ago

Currently, for a peer to get a blob, it has to

  1. call blobs.want(id, cb) to cause the machinery to download a blob from a peer to the local file system (blob store)
  2. after the cb has fired, call blobs.get(id) to get a stream of the blob data from the filesystem
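
For reference, the two steps above could be wrapped roughly like this. It's only a sketch: `getBlobWhenComplete` is a made-up helper name, `ssb` is assumed to be a client exposing `blobs.want` and `blobs.get` as described above, and the error-first callback shape is an assumption.

```javascript
// Hypothetical helper (not part of ssb-blobs): hands back a source stream
// only after the whole blob has been downloaded to the local blob store.
function getBlobWhenComplete (ssb, id, cb) {
  // Step 1: ask the blob machinery to fetch the blob from a peer.
  ssb.blobs.want(id, (err) => {
    if (err) return cb(err)
    // Step 2: the blob is now in the local store; stream it from there.
    cb(null, ssb.blobs.get(id))
  })
}
```

Nothing reaches the caller until `want` calls back, which is exactly the delay described below.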

The problem here is that the first byte of data will be available to the application only after the blob is transferred from another peer in its entirety. This becomes an issue when we are dealing with very large blobs (I have blobs of multiple GB) because the UI cannot show any feedback about sync progress of a particular blob.

I'd like to add an API that has the same net effect as the two steps above, but streams data to the application while it arrives from another peer and also informs its caller about the total blob size to expect ahead of time, so that a progress bar can be rendered.

Usage would be something like:

const pull = require('pull-stream')

let meta
let total = 0
pull(
  ssb.blobs.getLive(id), // proposed API, does not exist yet
  pull.filter(data => {
    // first item is the meta object
    if (meta === undefined) {
      meta = data
      return false
    }
    return true
  }),
  pull.through(data => {
    total += data.length
    console.log(`progress: ${total} of ${meta.size}`)
    // can also calculate transfer speed etc.
  }),
  pull.drain(data => {
    // do something with the data, progressively
  })
)

meta could be {size, peer}

I'll probably implement this with an instance of pull-notify per blob id that is live streamed

Any thoughts?

regular commented 4 years ago

Update: I now think it's a bad idea to stream data to the client before ssb can check the blob data's integrity. Only once we have all the data can we know whether the sha is correct. Presenting data to a client before that would potentially expose it to malicious data. A bad actor can run a pub that responds to all blob requests with made-up data, hoping clients render it progressively and only find out about the fake after it is too late (a virus embedded in a PDF, a fake video stream, ...).
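
To illustrate the point, here is a rough sketch of "verify first, release afterwards". It assumes blob ids have the form `&<base64 sha256>.sha256`; `blobIdOf` and `verifyThenRelease` are hypothetical names, not ssb-blobs functions.

```javascript
const crypto = require('crypto')

// Computes an ssb-style blob id for a list of Buffer chunks.
// Assumed id format: "&" + base64(sha256(data)) + ".sha256"
function blobIdOf (chunks) {
  const hash = crypto.createHash('sha256')
  for (const chunk of chunks) hash.update(chunk)
  return '&' + hash.digest('base64') + '.sha256'
}

// Hypothetical gate: data is handed to the caller only if the hash of
// everything received matches the id that was requested.
function verifyThenRelease (expectedId, chunks) {
  if (blobIdOf(chunks) !== expectedId) {
    throw new Error('blob hash mismatch, discarding data')
  }
  return Buffer.concat(chunks)
}
```

The check can only run once the last chunk has arrived, so anything rendered before that point is unverified.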

Instead, I can solve the above use case with a more detailed changes stream. Such a stream could include pending transfers with progress information that is updated periodically.
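
A possible shape for that progress information, as a sketch only. The tracker, its method names and the `{type, id, received, size}` event shape are all assumptions for discussion, not an existing ssb-blobs API.

```javascript
// Hypothetical bookkeeping for pending transfers. Only byte counts are
// exposed, never blob data, so nothing unverified reaches the client.
function createProgressTracker () {
  const pending = new Map() // blob id -> { size, received }
  return {
    start (id, size) { pending.set(id, { size, received: 0 }) },
    received (id, byteCount) {
      const transfer = pending.get(id)
      if (transfer) transfer.received += byteCount
    },
    done (id) { pending.delete(id) },
    // Snapshot that could be emitted periodically on the changes stream.
    snapshot () {
      return Array.from(pending, ([id, t]) => ({
        type: 'progress', id, received: t.received, size: t.size
      }))
    }
  }
}
```

A UI could render one progress bar per entry in the snapshot and drop the bar once the blob id shows up as a regular change.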

mixmix commented 3 years ago

I like your train of thought @regular

Heads up: I've been refactoring this module to make it easier to read and maintain.

Also, just checking you know about ssb.blobs.push

You could mess with that and sympathy to force blobs out to all peers (I imagine eager loading would be very useful in some kiosk-type setups).