rchain / rchip-proposals

Where RChain improvement proposals can be submitted

binary attachments on an enhanced DeployData message #39

Open dckc opened 3 years ago

dckc commented 3 years ago

Motivation

RChain aspires to "content delivery at the scale of Facebook". One of the pain points in RCat was hex-encoding assets as rholang strings and then decoding them on chain.

Several projects involve binary assets: the encrypted ID wallet, dappy, etc.

Design Sketch

new song1(`rho:attachment:1`) in {
  new stream in {
    contract stream(payment, ret) = {
      ...
      ret!(song1)
    }
  }
}

This would let the stream contract send a ByteArray to ret. The bytes of the ByteArray would come from the first binary attachment in the gRPC message.

Drawbacks

Non-trivial development time. More stuff for client devs to learn (even if only to ignore it).

Requires a hard-fork.

Alternatives

Do nothing.

Perhaps the 2x size price for hex-encoding and the compute cost of hexToBytes are not worth the bother? But that cost is ongoing, whereas the cost of this feature is mostly a one-time thing (modulo ongoing maintenance).
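To make the "2x size price" concrete, here is a minimal Python sketch of the client-side overhead. The function name `hex_overhead` and the sample payload are illustrative assumptions, not part of any RChain tooling.

```python
from typing import Tuple


def hex_overhead(data: bytes) -> Tuple[int, int]:
    """Return (raw_size, hex_size) in bytes for the given payload."""
    hex_text = data.hex()  # two ASCII characters per input byte
    return len(data), len(hex_text.encode("ascii"))


asset = bytes(range(256)) * 4  # 1024 bytes of stand-in binary data
raw_size, hex_size = hex_overhead(asset)
print(raw_size, hex_size)  # 1024 2048

# Decoding the hex text restores the original bytes exactly.
assert bytes.fromhex(asset.hex()) == asset
```

The doubling applies to every deploy that carries the asset, which is the ongoing cost referred to above.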

cc @SteveRossTalbot

jimscarver commented 3 years ago

In the tech gov discussion today we considered compressing hex and converting it to binary. The inevitable size limits suggest that chunking is a necessity, and that streaming will necessarily be done in rholang using linked lists of chunks of up to about 10 MB each, with each chunk deployed separately. Compressing the rholang deploys on chain is under consideration, in which case the hex compresses well; that was seen as a simpler, more general way to save space at this time.

The issue of retention was raised: perhaps a timestamp or block height could be used for expiration, and maybe a deployId or some unforgeable name could be used to extend the expiration.

tgrospic commented 3 years ago

As @jimscarver mentioned, chunking can already be done at the deploy level with hex-encoded strings. Putting a large amount of data into one deploy is difficult with gRPC because it has a maximum message size unless it is used in streaming mode. Validating such a large deploy also becomes more complex.
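A rough client-side sketch of deploy-level chunking, in Python, might look like the following. The helper names and the chunk size are hypothetical, and the sketch deliberately stops at producing hex strings: how each chunk is wrapped in a deploy and linked on chain is not shown.

```python
from typing import List


def to_hex_chunks(data: bytes, max_chunk_bytes: int) -> List[str]:
    """Split data into pieces of at most max_chunk_bytes and hex-encode each."""
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    return [
        data[i:i + max_chunk_bytes].hex()
        for i in range(0, len(data), max_chunk_bytes)
    ]


def from_hex_chunks(chunks: List[str]) -> bytes:
    """Reassemble the original bytes from hex chunks kept in order."""
    return b"".join(bytes.fromhex(chunk) for chunk in chunks)


payload = bytes(range(256)) * 10  # 2560 bytes of stand-in data
chunks = to_hex_chunks(payload, max_chunk_bytes=1000)  # size chosen arbitrarily
print(len(chunks), [len(c) // 2 for c in chunks])  # 3 [1000, 1000, 560]
assert from_hex_chunks(chunks) == payload
```

Each hex string would then go into its own deploy, keeping every gRPC message below whatever size limit the node enforces.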

I've measured the cost impact of storing binary data as a hex string versus converting it to bytes.

[1] Storing hex string on a channel

new return(`rho:rchain:deployId`), x in {
  x!("<bytes>") |
  for(@a <- x) {
    return!(a)
  }
}

[2] Storing binary on a channel

new return(`rho:rchain:deployId`), x in {
  x!("<bytes>".hexToBytes()) |
  for(@a <- x) {
    return!(a)
  }
}

Cost in phlogiston of storing hex string with or without conversion to binary.

| value \ cost | 10 bytes | 20 bytes | write/read per 10 bytes |
|--------------|----------|----------|-------------------------|
| [1] string   | 903      | 983      | 80                      |
| [2] bytes    | 919      | 989      | 70                      |

Calling hexToBytes has constant overhead but it has lower cost per byte.
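As a back-of-the-envelope reading of those numbers, the Python sketch below fits a straight line through the two measured points for each variant. Treating the cost as linear in the data size is an assumption (only two sizes were measured), so the break-even figure is an extrapolation, not a measurement.

```python
from typing import Tuple


def fit_line(n1: int, c1: int, n2: int, c2: int) -> Tuple[float, float]:
    """Return (fixed_cost, cost_per_byte) for the line through two points."""
    per_byte = (c2 - c1) / (n2 - n1)
    fixed = c1 - per_byte * n1
    return fixed, per_byte


# Measured costs in phlogiston at 10 and 20 bytes.
string_fixed, string_per_byte = fit_line(10, 903, 20, 983)
bytes_fixed, bytes_per_byte = fit_line(10, 919, 20, 989)

# Size at which the lower per-byte cost offsets the extra constant overhead,
# under the linearity assumption.
break_even = (bytes_fixed - string_fixed) / (string_per_byte - bytes_per_byte)

print(string_fixed, string_per_byte)  # 823.0 8.0
print(bytes_fixed, bytes_per_byte)    # 849.0 7.0
print(break_even)                     # 26.0
```

If the linear model holds, converting with hexToBytes costs 26 phlogiston more up front and saves 1 phlogiston per byte stored, so it would pay off beyond roughly 26 bytes.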

tgrospic commented 3 years ago

As part of the REV vault changes, @Isaac-DeFrain created this example of a linked list, which can be used to store chunks of binary data.

new
  empty,
  cons,
  print,
  stdout(`rho:io:stdout`)
in {
  // adds an element to the head of an existing linked list
  contract cons(@value, pointer, ret) = {
    new elem in {
      elem!(value, *pointer) |
      ret!(*elem)
    }
  } |
  // prints all elements in the list from head to tail
  contract print(elem, ret) = {
    for (@value, @next_elem <- elem) {
      if (value != Nil) {
        ret!(value) |
        print!(next_elem, *ret)
      }
    }
  } |
  // build a linked list and print the elements from head to tail
  new tmp in {
    empty!(Nil, Nil) |
    cons!(2, *empty, *tmp) |
    for (@elem <- tmp) {
      cons!(1, elem, *tmp) |
      for (@elem <- tmp) {
        cons!(0, elem, *tmp) |
        for (@elem <- tmp) {
          print!(elem, *stdout)
        }
      }
    }
  }
}

dckc commented 2 years ago

So clearly large amounts of data have to be split between deploys.

But still, for chunks of moderate size, the cost of hex-encoding seems fairly high. Thanks for the measurements of the evaluation cost, @tgrospic. We also have a charge for parsing the rholang source code containing the hex string, before interpreting it, right?

dckc commented 2 years ago

Meanwhile, I gather there's a BitTorrent connection in progress; surely that would render this moot. Anybody have a pointer handy?

tgrospic commented 2 years ago

> So clearly large amounts of data have to be split between deploys.
>
> But still, for chunks of moderate size, the cost of hex-encoding seems fairly high. Thanks for the measurements of the evaluation cost, @tgrospic. We also have a charge for parsing the rholang source code containing the hex string, before interpreting it, right?

My measurements showed that the hexToBytes conversion has a constant overhead, which suggests that by the time parsing is done all binary data is already converted, so the conversion from hex does not depend on the size of the data.

Cost of parsing is 1 phlogiston per byte of source code.
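To see what that rate implies for hex literals, the Python sketch below estimates the parsing charge for the "[2] Storing binary on a channel" term from earlier in this thread. It assumes the charge is exactly 1 phlogiston per UTF-8 byte of the deploy term; `TEMPLATE`, `PHLO_PER_SOURCE_BYTE`, and `parse_cost` are names made up for this illustration.

```python
# Deploy term from example [2], with %s standing in for the hex payload.
TEMPLATE = (
    "new return(`rho:rchain:deployId`), x in {\n"
    '  x!("%s".hexToBytes()) |\n'
    "  for(@a <- x) {\n"
    "    return!(a)\n"
    "  }\n"
    "}\n"
)

PHLO_PER_SOURCE_BYTE = 1  # rate quoted above; assumed to apply to the whole term


def parse_cost(payload: bytes) -> int:
    """Estimate the parsing charge for a deploy embedding payload as hex."""
    source = TEMPLATE % payload.hex()
    return PHLO_PER_SOURCE_BYTE * len(source.encode("utf-8"))


base = parse_cost(b"")  # cost of the term with an empty literal
for size in (10, 20, 1000):
    # Extra parsing charge attributable to the hex literal alone.
    print(size, parse_cost(bytes(size)) - base)
# prints:
# 10 20
# 20 40
# 1000 2000
```

Under that assumption, every payload byte adds 2 phlogiston of parsing cost, because it appears as two hex characters in the source.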