guidiaz opened this issue 1 year ago
Overall I think this is a good idea. Here are my thoughts after discussing it with @guidiaz.
For example, instead of SUM(i) to sum i elements, use a new ArraySplit(i) operator and then ArrayReduce(Sum). The behavior of those new operators is described in the examples below, and some of them are implemented in this branch:
https://github.com/tmpolaczyk/witnet-rust/tree/radon-alter-aggr
Probably not possible to have arbitrary scripts, as that would interfere with the reputation system (slashing of liars).
The following examples try to implement the source and aggregation stages of the example use cases:
Sources: to separate btc/usd from eth/usd sources, add a new operator as the last operator of the source scripts:
12345.67
>>> [AsMapWithKey, "btc/usd"]
{"btc/usd": 12345.67}
Then, the output of each source will look like either one of:
{"btc/usd": 12345.67}
{"eth/usd": 1234.67}
And the input of the aggregation stage will be an array of maps, like:
[{"btc/usd": 12345.67}, {"eth/usd": 1234.67}, {"btc/usd", 13333.67}]
The aggregation script can then start with an ArrayOfMapsIntoMapOfArrays operator:
{"btc/usd": [12345.67, 13333.67], "eth/usd": [1234.67]}
And by using the MapAlter operator, each map key can be filtered and reduced independently:
{"btc/usd": 13000.0, "eth/usd": 1234.67 }
In this example we assume that each source returns the price of one asset, but that's not mandatory. As long as it is possible to transform the response JSON into an object with one key per asset, it should work just as well with the ArrayOfMapsIntoMapOfArrays operator as if they were separate sources.
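To pin down the intended semantics, here is a rough Rust sketch of ArrayOfMapsIntoMapOfArrays (again with f64 values instead of the real Radon types; purely illustrative):

```rust
use std::collections::BTreeMap;

/// Sketch of ArrayOfMapsIntoMapOfArrays: merge an array of maps into a single
/// map of arrays, grouping all values reported under the same key.
fn array_of_maps_into_map_of_arrays(
    inputs: Vec<BTreeMap<String, f64>>,
) -> BTreeMap<String, Vec<f64>> {
    let mut out: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for map in inputs {
        for (key, value) in map {
            out.entry(key).or_default().push(value);
        }
    }
    out
}

fn main() {
    let sources = vec![
        BTreeMap::from([("btc/usd".to_string(), 12345.67)]),
        BTreeMap::from([("eth/usd".to_string(), 1234.67)]),
        BTreeMap::from([("btc/usd".to_string(), 13333.67)]),
    ];
    // {"btc/usd": [12345.67, 13333.67], "eth/usd": [1234.67]}
    println!("{:?}", array_of_maps_into_map_of_arrays(sources));
}
```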
Similar to Example 1, but now the aggregation stage needs an additional operator to convert
{"btc/usd": 16000, "eth/usd": 1600}
into 10.0 by calculating 16000 / 1600.
One way to do it is to convert 1600 into (1600^-1), and multiply both numbers. We can achieve that using the MapAlter operator, combined with FloatPower(-1) to calculate 1/1600. Then, convert the map into an array, and use ArrayReduce(Product) to multiply all the values of the array:
{"btc/usd": 16000, "eth/usd": 1600}
>>> [MapAlter, "eth/usd", [ [ArrayAlter, 1, [ [FloatPower, -1] ]] ]]
{"btc/usd": 16000, "eth/usd": 0.000625}
>>> MapValues
[16000, 0.000625]
>>> [ArrayReduce, Product]
10.0
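The same pipeline written as a quick Rust sketch, just to show the math (invert via a power of -1, then a product over the map values; the operator names above are the RADON ones, the code below is only illustrative):

```rust
use std::collections::BTreeMap;

fn main() {
    let mut prices = BTreeMap::from([
        ("btc/usd".to_string(), 16000.0_f64),
        ("eth/usd".to_string(), 1600.0_f64),
    ]);

    // [MapAlter, "eth/usd", ... [FloatPower, -1]]: invert the eth/usd price
    if let Some(v) = prices.get_mut("eth/usd") {
        *v = v.powi(-1); // 1 / 1600 = 0.000625
    }

    // MapValues followed by [ArrayReduce, Product]
    let btc_eth: f64 = prices.values().product();
    println!("{}", btc_eth); // ~10.0, i.e. btc/eth
}
```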
Instead of making each source return just the price, we now also want them to return the volume:
[1234.67, 111111111.0]
This array of [price, volume] can be created using the MapAlter, ArrayAlter, MapFilter, and StringMatch operators.
Then the input of the aggregation stage will be an array of arrays, like:
[[ 2.091, 9012233.0 ], [ 1.011, 12345.7 ], [ 2.045, 555323 ], [ 2.12, 1234.56 ]]
To be able to apply filters independently to price and volume, we need a new operator, ArrayFilterBy. We also need a new reducer, WeightedAverage, which will convert this array of [price, volume] into the weighted average, and the total volume:
[[ 2.091, 9012233.0], [2.045, 555323 ]]
>>> [ArrayReduce, WeightedAverage]
[ 2.0883300539, 9567556.0 ]
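A possible shape for that WeightedAverage reducer, sketched in Rust over plain (value, weight) tuples (the real thing would of course have to deal with Radon types and error cases):

```rust
/// Sketch of a WeightedAverage reducer: given [price, volume] pairs, return
/// the volume-weighted average price together with the total volume.
fn weighted_average(pairs: &[(f64, f64)]) -> (f64, f64) {
    let total_volume: f64 = pairs.iter().map(|&(_, w)| w).sum();
    let weighted_sum: f64 = pairs.iter().map(|&(v, w)| v * w).sum();
    (weighted_sum / total_volume, total_volume)
}

fn main() {
    let filtered = [(2.091, 9_012_233.0), (2.045, 555_323.0)];
    let (price, volume) = weighted_average(&filtered);
    println!("[{}, {}]", price, volume); // roughly [2.08833, 9567556]
}
```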
In case the volume data is fetched from a different source than the price data, the sources will need to use the AsMapWithKey operator to tag which exchange they correspond to. For example, the first "exchange1" is the price, while the second "exchange1" is the volume, so the input of the aggregation stage may look like:
[{"exchange1": 2.091}, {"exchange2": 1.011}, {"exchange1": 9012233.0}, {"exchange2": 12345.7}]
>>> ArrayOfMapsIntoMapOfArrays
{"exchange1": [2.091, 9012233.0], "exchange2": [1.011, 12345.7]}
>>> MapValues
[[ 2.091, 9012233.0 ], [ 1.011, 12345.7 ]]
>>> [ArrayReduce, WeightedAverage]
[ 2.08952, 9024578.7 ]
>>> [ArrayGet, 0]
2.08952
@tmpolaczyk brilliant job there! It really looks like the logical next evolution for RADON, and the materialization of what we always wanted it to be.
From a technical standpoint, I want to see this landing sooner or later. Then, we could slowly rework our current feeds so that they leverage this new construct (in the cases where it's a clear advantage).
From a strategic standpoint, however, I'm a bit divided. I believe that at this point we should optimize for adoption. So while I want this implemented asap for the sake of completeness and some theoretical operating cost cuts, I can't see it having a significant impact on adoption and success :thinking:
Precisely, the origin of the whole idea was to be able to support composable price feeds (use case #2), as they are in high demand. Solving composable price feeds at the smart contract level is both expensive and unreliable.
RADON scripting at the retrieval phase has proven to be highly effective at crawling and extracting concrete data from both HTTP/GET and HTTP/POST sources. It's this kind of RADON scripting, plus the current reducing mechanism at the aggregation phase, that fundamentally enables the implementation of "basic" price feeds.
However, there are some interesting use cases that just cannot be tackled at the moment:
#1: Solving multiple price feeds within one single data request.
#2: Price feed composition (e.g. btc/usd * usd/eth = btc/eth).
#3: Price averaging using trade volume data as weight (e.g. <[10000, 1e6], [15000, 1e3]> = [10005, 1e6+1e3]).

Currently, upon start of the aggregation phase, an array of values is composed with every primitive value returned from every source during the retrieval phase, over which the following math is applied (assuming that all elements within the aggregation array are operable with each other):
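Roughly speaking, and assuming the current behavior is simply "apply the configured filters, then the reducer over whatever survives", that amounts to something like this sketch:

```rust
/// Assumed sketch of the current aggregation step: one flat array of source
/// values, a chain of filters, and a single reducer over the survivors.
fn aggregate(
    values: Vec<f64>,
    filters: &[fn(Vec<f64>) -> Vec<f64>],
    reducer: fn(&[f64]) -> f64,
) -> f64 {
    let filtered = filters.iter().fold(values, |acc, f| f(acc));
    reducer(&filtered)
}

fn main() {
    let keep_all: fn(Vec<f64>) -> Vec<f64> = |v| v; // stand-in for a real filter
    let mean: fn(&[f64]) -> f64 = |xs| xs.iter().sum::<f64>() / xs.len() as f64;
    println!("{}", aggregate(vec![1.0, 2.0, 3.0], &[keep_all], mean)); // 2
}
```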
In practice, the RADAggregate struct is currently defined at the protobuf level as:
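From memory, it boils down to a filters list plus a reducer opcode, i.e. roughly this Rust-side shape (field names recalled, not verified against witnet.proto):

```rust
// Approximate Rust-side mirror of the current protobuf definition, recalled
// from memory; the authoritative source is witnet.proto in witnet-rust.
pub struct RADFilter {
    pub op: u32,       // filter opcode
    pub args: Vec<u8>, // CBOR-encoded filter arguments
}

pub struct RADAggregate {
    pub filters: Vec<RADFilter>, // filters applied over the aggregation array
    pub reducer: u32,            // reducer opcode (e.g. average mean)
}
```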
All this said, here comes a possible proposal for enriching the aggregation scripting mechanism:
Changes at the protobuf level:
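Presumably the change amounts to adding an optional, CBOR-encoded script field next to the existing ones, something along these lines (hypothetical field; name and encoding to be decided):

```rust
// Hypothetical shape of the proposed aggregator: if `script` is non-empty,
// it overrides `filters` and `reducer`; otherwise behavior stays as today.
pub struct RADAggregate {
    pub filters: Vec<RADFilter>, // as today
    pub reducer: u32,            // as today
    pub script: Vec<u8>,         // new: CBOR-encoded aggregation script (optional)
}
```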
New aggregation workflow:
At the end of the retrieval phase, check type compatibility of values, allowing source values to be either: a single string, a single byte buffer, a single numeric value, a single boolean, or an array of any combination of the aforementioned primitive values (anything but nested arrays or maps).
For variable-size elements within the aggregation array, check whether there are any (new, but optional) min/max result rank parameters specified within the corresponding RADRetrieve struct, cutting the string, buffer, or array if oversized, or making the DR revert at the aggregation level if undersized.
Compose the so-called aggregation array, being either: an array of strings, an array of buffers, an array of booleans, an array of numbers, or an array of arrays of primitives.
If no optional script is set within RADAggregator, just follow the same workflow as now. Currently supported filters should be adapted to work, or fail, depending on the nature of the input aggregation array being processed. Moreover, a new set of "weighted" reducers could well be added so as to support #3 use cases (e.g. "AverageMeanWeighted", requiring each aggregation array element to be a vector of two numbers).
Should a script be set within the RADAggregator part of a DR, both filters and reducer fields should be ignored, and the aggregation array be converted into a "stack" of values that will feed the newly proposed aggregation scripting sub-phase.
The aggregation script would basically implement a simple stack-based calculator, in which every operator could either "use" or "pop" a selection of values from the stack, and eventually "replace" or "push" new values into the aggregation stack (see the sketch after the operator list below).
A Radon aggregation script would be serialized as an array of reducing operations, each one containing either one single opcode or an opcode and an array of arguments. By way of example, the following (non-exhaustive) list of aggregation scripting operators is foreseen:
SWAP(i) | 0<i<S => swap the stack top with the i'th element; fails if i is greater than the stack size minus 1.
PIVOT(i) | 0<=i<S => consider the i'th element on the stack to be an array (fails otherwise), remove the i'th element from the stack, and then push every single one of its elements onto the stack.
APPEND(i) | 0<i<S => pop the first i+1 elements from the stack, and push them back onto the stack as an array of elements; fails if any of the popped elements is an array of arrays.
SUM(i) | 0<i<S => pop the first i+1 elements from the stack, add them together, and then push the result onto the stack; fails if the popped values are not summable among each other.
CSUM(v) => pop the stack top, add the provided value, and then push the result back onto the stack; fails if v's type is not "summable" to the popped value.
MUL(i) | 0<i<S => ...
CMUL(v) => ...
CPOWER(v) => ...
DIV => pop the first two values from the stack, use the first as the numerator and the second as the divisor, and push the result onto the stack; fails if the values are not divisible.
CDIV(v) => pop the top value as the numerator and use v as the divisor, and push the result onto the stack; fails if the values are not divisible.
MOD => ...
CMOD(v) => ...
REDUCE(i, opcode[, filter[, v]]) | 0<=i<S => compose an array from the first i+1 elements popped from the stack, apply the array filter if any (optionally parameterized with v), and then push the result of applying the specified reducer opcode over the input (filtered) array back onto the stack.

As a practical way to limit complexity, the aggregation stack would be limited to 256 elements. Thus, a DR could eventually throw an "aggregation stack overflow" exception at the aggregation phase.
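To make the stack-calculator idea a bit more tangible, here is a minimal Rust sketch covering just a few of the opcodes above, with numeric values only (everything is simplified and only meant to illustrate the push/pop semantics):

```rust
/// Minimal sketch of the proposed stack-based aggregation calculator,
/// limited to numeric values and a handful of the opcodes listed above.
enum Op {
    Sum(usize), // SUM(i): pop i+1 values, push their sum
    CMul(f64),  // CMUL(v): pop the top value, push top * v
    Div,        // DIV: pop numerator, pop divisor, push numerator / divisor
}

fn run(mut stack: Vec<f64>, script: &[Op]) -> Result<Vec<f64>, String> {
    const MAX_STACK: usize = 256; // "aggregation stack overflow" limit
    for op in script {
        match op {
            Op::Sum(i) => {
                if stack.len() <= *i {
                    return Err("SUM: not enough elements on the stack".into());
                }
                let tail = stack.split_off(stack.len() - (i + 1));
                stack.push(tail.iter().sum());
            }
            Op::CMul(v) => {
                let top = stack.pop().ok_or("CMUL: empty stack")?;
                stack.push(top * v);
            }
            Op::Div => {
                let numerator = stack.pop().ok_or("DIV: empty stack")?;
                let divisor = stack.pop().ok_or("DIV: empty stack")?;
                stack.push(numerator / divisor);
            }
        }
        if stack.len() > MAX_STACK {
            return Err("aggregation stack overflow".into());
        }
    }
    Ok(stack)
}

fn main() {
    // SUM(1) adds the two top values, CMUL(10) then scales the result.
    println!("{:?}", run(vec![1.0, 2.0, 3.0], &[Op::Sum(1), Op::CMul(10.0)])); // Ok([1.0, 50.0])
    // DIV pops 16000 as numerator and 1600 as divisor: use case #2 in one op.
    println!("{:?}", run(vec![1600.0, 16000.0], &[Op::Div])); // Ok([10.0])
}
```

In practice the script itself would arrive CBOR-encoded as the array of [opcode, args] pairs described above; the enum here is only a stand-in for that.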
The resulting value at the end of the aggregation/scripting phase would correspond to the final stack state:
If a (new, but optional) result_max_size is defined within the RADRequest struct, and the size of the CBOR buffer containing the serialization of the aggregation result is greater than result_max_size, the DR would throw an "aggregation result oversized" exception at the aggregation phase.
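For completeness, the oversize check itself is trivial once the result has been CBOR-serialized; something like this (hypothetical names, the actual plumbing would live wherever the aggregation result is built):

```rust
/// Hypothetical sketch of the proposed check: reject the aggregation result
/// if its CBOR serialization exceeds the (optional) result_max_size.
fn check_result_size(cbor_result: &[u8], result_max_size: Option<usize>) -> Result<(), String> {
    match result_max_size {
        Some(max) if cbor_result.len() > max => {
            Err("aggregation result oversized".to_string())
        }
        _ => Ok(()),
    }
}

fn main() {
    let cbor = vec![0u8; 40]; // pretend this is the CBOR-encoded result
    assert!(check_result_size(&cbor, Some(64)).is_ok());
    assert!(check_result_size(&cbor, Some(32)).is_err());
    assert!(check_result_size(&cbor, None).is_ok()); // no limit configured
}
```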