guidiaz opened this issue 1 year ago
Overall I think this is a good idea. Here are my thoughts after discussing it with @guidiaz.
For example, instead of SUM(i) to sum i elements, use a new ArraySplit(i) operator and then ArrayReduce(Sum). The behavior of those new operators is described in the examples below, and some of them are implemented in this branch:
https://github.com/tmpolaczyk/witnet-rust/tree/radon-alter-aggr
Probably not possible to have arbitrary scripts, as that would interfere with the reputation system (slashing of liars).
The following examples try to implement the source and aggregation stages of the example use cases:
Sources: to separate btc/usd from eth/usd sources, add a new operator as the last operator of the source scripts:
12345.67
>>> [AsMapWithKey, "btc/usd"]
{"btc/usd": 12345.67}
Then, the output of each source will look like either one of:
{"btc/usd": 12345.67}
{"eth/usd": 1234.67}
And the input of the aggregation stage will be an array of maps, like:
[{"btc/usd": 12345.67}, {"eth/usd": 1234.67}, {"btc/usd", 13333.67}]
The aggregation script can then start with an ArrayOfMapsIntoMapOfArrays operator:
{"btc/usd": [12345.67, 13333.67], "eth/usd": [1234.67]}
And by using the MapAlter operator, each map key can be filtered and reduced independently:
{"btc/usd": 13000.0, "eth/usd": 1234.67 }
In this example we assume that each source returns the price of one asset, but that's not mandatory. As long as it is possible to transform the response JSON into an object with one key per asset, it should work just as well with the ArrayOfMapsIntoMapOfArrays operator as if they were separate sources.
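To pin down the intended semantics, here is a rough Rust sketch of ArrayOfMapsIntoMapOfArrays (again with f64 values instead of the real Radon types; purely illustrative):

```rust
use std::collections::BTreeMap;

/// Sketch of ArrayOfMapsIntoMapOfArrays: merge an array of maps into a single
/// map of arrays, grouping all values reported under the same key.
fn array_of_maps_into_map_of_arrays(
    inputs: Vec<BTreeMap<String, f64>>,
) -> BTreeMap<String, Vec<f64>> {
    let mut out: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for map in inputs {
        for (key, value) in map {
            out.entry(key).or_default().push(value);
        }
    }
    out
}

fn main() {
    let sources = vec![
        BTreeMap::from([("btc/usd".to_string(), 12345.67)]),
        BTreeMap::from([("eth/usd".to_string(), 1234.67)]),
        BTreeMap::from([("btc/usd".to_string(), 13333.67)]),
    ];
    // {"btc/usd": [12345.67, 13333.67], "eth/usd": [1234.67]}
    println!("{:?}", array_of_maps_into_map_of_arrays(sources));
}
```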
Similar to Example 1, but now the aggregation stage needs an additional operator to convert
{"btc/usd": 16000, "eth/usd": 1600}
into 10.0 by calculating 16000 / 1600.
One way to do it is to convert 1600 into (1600^-1), and multiply both numbers. We can achieve that using the MapAlter operator, combined with FloatPower(-1) to calculate 1/1600. Then, convert the map into an array, and use ArrayReduce(Product) to multiply all the values of the array:
{"btc/usd": 16000, "eth/usd": 1600}
>>> [MapAlter, "eth/usd", [ [ArrayAlter, 1, [ [FloatPower, -1] ]] ]]
{"btc/usd": 16000, "eth/usd": 0.000625}
>>> MapValues
[16000, 0.000625]
>>> [ArrayReduce, Product]
10.0
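The same pipeline written as a quick Rust sketch, just to show the math (invert via a power of -1, then a product over the map values; the operator names above are the RADON ones, the code below is only illustrative):

```rust
use std::collections::BTreeMap;

fn main() {
    let mut prices = BTreeMap::from([
        ("btc/usd".to_string(), 16000.0_f64),
        ("eth/usd".to_string(), 1600.0_f64),
    ]);

    // [MapAlter, "eth/usd", ... [FloatPower, -1]]: invert the eth/usd price
    if let Some(v) = prices.get_mut("eth/usd") {
        *v = v.powi(-1); // 1 / 1600 = 0.000625
    }

    // MapValues followed by [ArrayReduce, Product]
    let btc_eth: f64 = prices.values().product();
    println!("{}", btc_eth); // ~10.0, i.e. btc/eth
}
```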
Instead of making each source return just the price, we now also want them to return the volume:
[1234.67, 111111111.0]
This array of [price, volume] can be created using the MapAlter, ArrayAlter, MapFilter, and StringMatch operators.
Then the input of the aggregation stage will be an array of arrays, like:
[[ 2.091, 9012233.0 ], [ 1.011, 12345.7 ], [ 2.045, 555323 ], [ 2.12, 1234.56 ]]
To be able to apply filters independently to price and volume, we need a new operator, ArrayFilterBy. We also need a new reducer, WeightedAverage, which will convert this array of [price, volume] into the weighted average, and the total volume:
[[ 2.091, 9012233.0], [2.045, 555323 ]]
>>> [ArrayReduce, WeightedAverage]
[ 2.0883300539, 9567556.0 ]
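A possible shape for that WeightedAverage reducer, sketched in Rust over plain (value, weight) tuples (the real thing would of course have to deal with Radon types and error cases):

```rust
/// Sketch of a WeightedAverage reducer: given [price, volume] pairs, return
/// the volume-weighted average price together with the total volume.
fn weighted_average(pairs: &[(f64, f64)]) -> (f64, f64) {
    let total_volume: f64 = pairs.iter().map(|&(_, w)| w).sum();
    let weighted_sum: f64 = pairs.iter().map(|&(v, w)| v * w).sum();
    (weighted_sum / total_volume, total_volume)
}

fn main() {
    let filtered = [(2.091, 9_012_233.0), (2.045, 555_323.0)];
    let (price, volume) = weighted_average(&filtered);
    println!("[{}, {}]", price, volume); // roughly [2.08833, 9567556]
}
```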
In case the volume data is fetched from a different source than the price data, the sources will need to use the AsMapWithKey operator to tag which exchange they correspond to. For example, the first "exchange1" is the price, while the second "exchange1" is the volume, so the input of the aggregation stage may look like:
[{"exchange1": 2.091}, {"exchange2": 1.011}, {"exchange1": 9012233.0}, {"exchange2": 12345.7}]
>>> ArrayOfMapsIntoMapOfArrays
{"exchange1": [2.091, 9012233.0], "exchange2": [1.011, 12345.7]}
>>> MapValues
[[ 2.091, 9012233.0 ], [ 1.011, 12345.7 ]]
>>> [ArrayReduce, WeightedAverage]
[ 2.08952, 9024578.7 ]
>>> [ArrayGet, 0]
2.08952
@tmpolaczyk brilliant job there! It really looks like the logical next evolution for RADON, and the materialization of what we always wanted it to be.
From a technical standpoint, I want to see this landing sooner or later. Then, we could slowly rework our current feeds so that they leverage this new construct (in the cases where it's a clear advantage).
From a strategic standpoint, however, I'm a bit divided. I believe that at this point we should optimize for adoption. So while I want this implemented asap for the sake of completeness and some theoretical operating cost cuts, I can't see it having a significant impact on adoption and success :thinking:
Precisely, the origin of the whole idea was to be able to support composable price feeds (use case #2), as they are in high demand. Solving composable price feeds at the smart contract level is both expensive and unreliable.
RADON scripting at the retrieval phase has proven to be highly effective at crawling and extracting concrete data from both HTTP/GET and HTTP/POST sources. It's this kind of RADON scripting, plus the current reducing mechanism at the aggregation phase, that fundamentally enables the implementation of "basic" price feeds.
However, there are some interesting use cases that just cannot be tackled at the moment:
#1: Solving multiple price feeds within one single data request.
#2: Price feed composition (e.g. btc/usd * usd/eth = btc/eth).
#3: Price averaging using trade volume data as weight (e.g. <[10000, 1e6], [15000, 1e3]> = [10005, 1e6+1e3]).

Currently, upon start of the aggregation phase, an array of values is composed with every primitive value returned from every source during the retrieval phase, over which the following math is applied (assuming that all elements within the aggregation array are operable with each other):
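Roughly speaking, and assuming the current behavior is simply "apply the configured filters, then the reducer over whatever survives", that amounts to something like this sketch:

```rust
/// Assumed sketch of the current aggregation step: one flat array of source
/// values, a chain of filters, and a single reducer over the survivors.
fn aggregate(
    values: Vec<f64>,
    filters: &[fn(Vec<f64>) -> Vec<f64>],
    reducer: fn(&[f64]) -> f64,
) -> f64 {
    let filtered = filters.iter().fold(values, |acc, f| f(acc));
    reducer(&filtered)
}

fn main() {
    let keep_all: fn(Vec<f64>) -> Vec<f64> = |v| v; // stand-in for a real filter
    let mean: fn(&[f64]) -> f64 = |xs| xs.iter().sum::<f64>() / xs.len() as f64;
    println!("{}", aggregate(vec![1.0, 2.0, 3.0], &[keep_all], mean)); // 2
}
```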
In practice, the RADAggregate struct is currently defined at the protobuf level as:
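From memory, it boils down to a filters list plus a reducer opcode, i.e. roughly this Rust-side shape (field names recalled, not verified against witnet.proto):

```rust
// Approximate Rust-side mirror of the current protobuf definition, recalled
// from memory; the authoritative source is witnet.proto in witnet-rust.
pub struct RADFilter {
    pub op: u32,       // filter opcode
    pub args: Vec<u8>, // CBOR-encoded filter arguments
}

pub struct RADAggregate {
    pub filters: Vec<RADFilter>, // filters applied over the aggregation array
    pub reducer: u32,            // reducer opcode (e.g. average mean)
}
```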
All this said, here comes a possible proposal for enriching the aggregation scripting mechanism:
Changes at the protobuf level:
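Presumably the change amounts to adding an optional, CBOR-encoded script field next to the existing ones, something along these lines (hypothetical field; name and encoding to be decided):

```rust
// Hypothetical shape of the proposed aggregator: if `script` is non-empty,
// it overrides `filters` and `reducer`; otherwise behavior stays as today.
pub struct RADAggregate {
    pub filters: Vec<RADFilter>, // as today
    pub reducer: u32,            // as today
    pub script: Vec<u8>,         // new: CBOR-encoded aggregation script (optional)
}
```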
New aggregation workflow:
At the end of the retrieval phase, check type compatibility of values, allowing source values to be either: a single string, a single byte buffer, a single numeric value, a single boolean, or an array of any combination of the aforementioned primitive values (anything but nested arrays or maps).
For variable-size elements within the aggregation array, check whether there are any (new, but optional) min/max result rank parameters specified within the corresponding RADRetrieve struct, cutting the string, buffer, or array if oversized, or making the DR revert at the aggregation level if undersized.
Compose the so-called aggregation array, being either: an array of strings, an array of buffers, an array of booleans, an array of numbers, or an array of arrays of primitives.
If no optional script is set within RADAggregator, just follow the same workflow as now. Currently supported filters should be adapted to work, or fail, depending on the nature of the input aggregation array being processed. Moreover, a new set of "weighted" reducers could well be added so as to support #3 use cases (e.g. "AverageMeanWeighted", requiring each aggregation array element to be a vector of two numbers).
Should a script be set within the RADAggregator part of a DR, both filters and reducer fields should be ignored, and the aggregation array be converted into a "stack" of values that will feed the newly proposed aggregation scripting sub-phase.
The aggregation script would basically implement a simple stack-based calculator, in which every operator could either "use" or "pop" a selection of values from the stack, and eventually "replace" or "push" new values into the aggregation stack (see the sketch after the operator list below).
A Radon aggregation script would be serialized as an array of reducing operations, each one containing either one single opcode or an opcode and an array of arguments. By way of example, the following (non-exhaustive) list of aggregation scripting operators is foreseen:
SWAP(i) | 0<i<S => swap the stack top with the i'th element; fails if i is greater than the stack size minus 1.
PIVOT(i) | 0<=i<S => consider the i'th element on the stack to be an array (fails otherwise), remove the i'th element from the stack, and then push every single one of its elements onto the stack.
APPEND(i) | 0<i<S => pop the first i+1 elements from the stack, and push them back onto the stack as an array of elements; fails if any of the popped elements is an array of arrays.
SUM(i) | 0<i<S => pop the first i+1 elements from the stack, add them together, and then push the result onto the stack; fails if the popped values are not summable among each other.
CSUM(v) => pop the stack top, add the provided value, and then push the result back onto the stack; fails if v's type is not "summable" to the popped value.
MUL(i) | 0<i<S => ...
CMUL(v) => ...
CPOWER(v) => ...
DIV => pop the first two values from the stack, use the first as the numerator and the second as the divisor, and push the result onto the stack; fails if the values are not divisible.
CDIV(v) => pop the top value as the numerator and use v as the divisor, and push the result onto the stack; fails if the values are not divisible.
MOD => ...
CMOD(v) => ...
REDUCE(i, opcode[, filter[, v]]) | 0<=i<S => compose an array from the first i+1 elements popped from the stack, apply the array filter if any (optionally parameterized with v), and then push the result of applying the specified reducer opcode over the input (filtered) array back onto the stack.

As a practical way to limit complexity, the aggregation stack would be limited to 256 elements. Thus, a DR could eventually throw an "aggregation stack overflow" exception at the aggregation phase.
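To make the stack-calculator idea a bit more tangible, here is a minimal Rust sketch covering just a few of the opcodes above, with numeric values only (everything is simplified and only meant to illustrate the push/pop semantics):

```rust
/// Minimal sketch of the proposed stack-based aggregation calculator,
/// limited to numeric values and a handful of the opcodes listed above.
enum Op {
    Sum(usize), // SUM(i): pop i+1 values, push their sum
    CMul(f64),  // CMUL(v): pop the top value, push top * v
    Div,        // DIV: pop numerator, pop divisor, push numerator / divisor
}

fn run(mut stack: Vec<f64>, script: &[Op]) -> Result<Vec<f64>, String> {
    const MAX_STACK: usize = 256; // "aggregation stack overflow" limit
    for op in script {
        match op {
            Op::Sum(i) => {
                if stack.len() <= *i {
                    return Err("SUM: not enough elements on the stack".into());
                }
                let tail = stack.split_off(stack.len() - (i + 1));
                stack.push(tail.iter().sum());
            }
            Op::CMul(v) => {
                let top = stack.pop().ok_or("CMUL: empty stack")?;
                stack.push(top * v);
            }
            Op::Div => {
                let numerator = stack.pop().ok_or("DIV: empty stack")?;
                let divisor = stack.pop().ok_or("DIV: empty stack")?;
                stack.push(numerator / divisor);
            }
        }
        if stack.len() > MAX_STACK {
            return Err("aggregation stack overflow".into());
        }
    }
    Ok(stack)
}

fn main() {
    // SUM(1) adds the two top values, CMUL(10) then scales the result.
    println!("{:?}", run(vec![1.0, 2.0, 3.0], &[Op::Sum(1), Op::CMul(10.0)])); // Ok([1.0, 50.0])
    // DIV pops 16000 as numerator and 1600 as divisor: use case #2 in one op.
    println!("{:?}", run(vec![1600.0, 16000.0], &[Op::Div])); // Ok([10.0])
}
```

In practice the script itself would arrive CBOR-encoded as the array of [opcode, args] pairs described above; the enum here is only a stand-in for that.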
The resulting value at the end of the aggregation/scripting phase would correspond to the final stack state:
If a (new, but optional) result_max_size is defined within the RADRequest struct, and the size of the CBOR buffer containing the serialization of the aggregation result is greater than result_max_size, the DR would throw an "aggregation result oversized" exception at the aggregation phase.
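For completeness, the oversize check itself is trivial once the result has been CBOR-serialized; something like this (hypothetical names, the actual plumbing would live wherever the aggregation result is built):

```rust
/// Hypothetical sketch of the proposed check: reject the aggregation result
/// if its CBOR serialization exceeds the (optional) result_max_size.
fn check_result_size(cbor_result: &[u8], result_max_size: Option<usize>) -> Result<(), String> {
    match result_max_size {
        Some(max) if cbor_result.len() > max => {
            Err("aggregation result oversized".to_string())
        }
        _ => Ok(()),
    }
}

fn main() {
    let cbor = vec![0u8; 40]; // pretend this is the CBOR-encoded result
    assert!(check_result_size(&cbor, Some(64)).is_ok());
    assert!(check_result_size(&cbor, Some(32)).is_err());
    assert!(check_result_size(&cbor, None).is_ok()); // no limit configured
}
```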