This PR implements server-side ref expansion in the calls query. When `expand_columns` are passed (dot-notation strings corresponding to columns), all refs in those exact columns will be expanded.
In the case of nested ref expansion, where a field is requested inside an object that is itself a ref, multiple expand column strings must be provided. For example, to fully de-ref:
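As an illustrative sketch of the nested case (the call shape, object names, and ref URIs here are all hypothetical, not taken from a real project):

```python
# Hypothetical call row before expansion; the "weave:///..." string is a ref.
call = {
    "inputs": {"model": "weave:///entity/project/object/Model:v1"},
    "output": "final answer",
}

# expand_columns=["inputs.model"] would replace the ref at that exact column
# with the resolved object. If that object itself contains a ref (here at
# "prompt"), a second expand column "inputs.model.prompt" is needed to
# fully de-ref:
expand_columns = ["inputs.model", "inputs.model.prompt"]

# Shape of the fully expanded call (values are made up for illustration):
expanded = {
    "inputs": {
        "model": {
            "name": "gpt-helper",
            "prompt": {"template": "Answer: {question}"},
        }
    },
    "output": "final answer",
}
```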
To do the above ref expansion performantly:
- Batches start out at size 10, so we can quickly return something to the stream, and batch size doubles with each successive batch. The number of refs we request at any given time is capped at an arbitrary limit of 1000.
- Refs are sent off for resolution in batches as well.
- Batches consist of refs at the same depth within the call objects. The first batch consists of all refs at depth 0 (e.g. `["output", "inputs"]`), increasing from there. The second batch might be the columns `["output.object", "inputs.param1", "inputs.param2"]`.
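The batching scheme above can be sketched as follows. All names here are hypothetical and the real server implementation may differ; this only illustrates the depth grouping and the doubling, capped batch sizes:

```python
INITIAL_BATCH_SIZE = 10
MAX_BATCH_SIZE = 1000  # arbitrary cap on refs requested at once


def group_columns_by_depth(columns):
    """Group expand columns by their depth (number of dots), shallowest first."""
    groups = {}
    for col in columns:
        groups.setdefault(col.count("."), []).append(col)
    return [groups[depth] for depth in sorted(groups)]


def batch_refs(refs):
    """Yield refs in batches of 10, 20, 40, ... capped at MAX_BATCH_SIZE."""
    batch_size = INITIAL_BATCH_SIZE
    start = 0
    while start < len(refs):
        yield refs[start : start + batch_size]
        start += batch_size
        batch_size = min(batch_size * 2, MAX_BATCH_SIZE)


# group_columns_by_depth(["output", "inputs", "output.object", "inputs.param1"])
# -> [["output", "inputs"], ["output.object", "inputs.param1"]]
```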
Known limitations:
This does not handle the case where the expand column refers to a list of refs, e.g. `inputs.self.scorers` when there are two scorers. There are a couple of ways this could be supported:
- Inspect and detect when an expand column points to a list, then add special handling to iterate through the values and de-ref them. We might consider returning a dict with the original list indexes as keys and the expanded refs as values. Either way, this will require a refactor of the ClickHouse batch handling.
- Require specific references to list indices when expanding. Example: `inputs.self.scorers[0]`, `inputs.self.scorers[1]`.
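If the index-based option were chosen, parsing such columns might look like the following sketch (the function name and return shape are hypothetical, not part of this PR):

```python
import re

# Matches a single dot-separated segment with an optional trailing [N] index,
# e.g. "scorers[0]" -> field "scorers", index 0.
_INDEX_RE = re.compile(r"^(?P<field>[^\[\]]+)(?:\[(?P<idx>\d+)\])?$")


def parse_expand_column(col):
    """Split a dot-notation expand column into (field, optional index) pairs."""
    parts = []
    for segment in col.split("."):
        m = _INDEX_RE.match(segment)
        if m is None:
            raise ValueError(f"malformed expand column segment: {segment!r}")
        idx = m.group("idx")
        parts.append((m.group("field"), int(idx) if idx is not None else None))
    return parts


# parse_expand_column("inputs.self.scorers[0]")
# -> [("inputs", None), ("self", None), ("scorers", 0)]
```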
Server PR: