mindrones opened 3 years ago
Another kind of constraint arises when the type of an aggregation parameter depends on the type of the selected field:
fieldType: number,
label: 'Histogram',
lastChecked: '7.9',
request: {
...
missing: optional(number),
...
},
Here we want to express that if the field is an integer (e.g. for years), then the `missing` input in the UI can only accept whole numbers.
This could be expressed like:
request: {
params: {
...
missing: optional(number),
...
},
constraints: [
{kind: 'fieldType', params: ['missing']},
]
},
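A minimal sketch of how a validator might apply the `fieldType` constraint above. It assumes, hypothetically, that the selected field's Elasticsearch mapping type is available as a string and that integer mapping types restrict the listed params to whole numbers. `checkFieldTypeConstraint` and `INTEGER_FIELD_TYPES` are illustrative names, not an existing API.

```javascript
// Assumption: these mapping types only hold whole numbers.
const INTEGER_FIELD_TYPES = new Set(["byte", "short", "integer", "long"]);

// Hypothetical validator for {kind: 'fieldType', params: [...]}.
// Returns error messages; an empty array means the params are valid.
function checkFieldTypeConstraint(constraint, fieldType, params) {
  const errors = [];
  if (!INTEGER_FIELD_TYPES.has(fieldType)) {
    return errors; // non-integer fields add no restriction in this sketch
  }
  for (const name of constraint.params) {
    const value = params[name];
    // Unset optional params are always accepted.
    if (value !== undefined && !Number.isInteger(value)) {
      errors.push(`${name} must be a whole number when the field type is ${fieldType}`);
    }
  }
  return errors;
}

const histogramConstraint = { kind: "fieldType", params: ["missing"] };
console.log(checkFieldTypeConstraint(histogramConstraint, "integer", { missing: 2020 }));   // []
console.log(checkFieldTypeConstraint(histogramConstraint, "integer", { missing: 2020.5 })); // one error
console.log(checkFieldTypeConstraint(histogramConstraint, "double", { missing: 2020.5 }));  // []
```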
Some aggregations can only be used as a child of another specific aggregation: for example, `rate` can only be used inside a `date_histogram`.
This could be expressed as a `constraints` key on the aggregation object:
export default {
id: 'rate',
...
constraints: [
{kind: 'parent', aggs: ['date_histogram']}
],
...
}
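One way the `parent` constraint could be checked is by walking the aggregation tree and comparing each node with its direct parent. This is a sketch assuming a hypothetical `{type, children}` tree shape and a hypothetical `definitions` registry keyed by aggregation id.

```javascript
// Hypothetical registry of aggregation definitions keyed by id.
const definitions = {
  date_histogram: { id: "date_histogram", constraints: [] },
  terms: { id: "terms", constraints: [] },
  rate: { id: "rate", constraints: [{ kind: "parent", aggs: ["date_histogram"] }] },
};

// Walks a tree of {type, children} nodes and collects an error for every node
// whose 'parent' constraint is not met by its direct parent (null at the top level).
function checkParentConstraints(node, parentType = null, errors = []) {
  const constraints = definitions[node.type]?.constraints ?? [];
  for (const c of constraints) {
    if (c.kind === "parent" && !c.aggs.includes(parentType)) {
      errors.push(`${node.type} must be a child of: ${c.aggs.join(", ")}`);
    }
  }
  for (const child of node.children ?? []) {
    checkParentConstraints(child, node.type, errors);
  }
  return errors;
}

const valid = { type: "date_histogram", children: [{ type: "rate" }] };
const invalid = { type: "terms", children: [{ type: "rate" }] };
console.log(checkParentConstraints(valid).length);   // 0
console.log(checkParentConstraints(invalid).length); // 1
```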
For some aggregations, the allowed values of a parameter depend on the values of another parameter, potentially one in the parent aggregation.
For example, `rate`'s `unit` has a specific relationship with the interval used by the parent aggregation.
export default {
id: 'rate',
...
request: {
params: {
...,
unit: optional(string),
},
constraints: [
{
kind: 'value-sets',
filters: [
{
if: [{
agg: 'parent',
param: 'calendar_interval',
values: calendarIntervals
}],
then: [{
param: 'unit',
values: rateIntervalsToWeek
}],
},
{
if: [{
agg: 'parent',
param: 'calendar_interval',
values: calendarIntervalsFromMonth
}],
then: [{
param: 'unit',
values: rateIntervalsFromMonth
}],
}
],
}
]
}
}
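To make the `value-sets` semantics concrete, here is a sketch of an evaluator that returns the allowed values of one parameter given the params of the aggregation and of its parent. The set names follow the definition above, but their contents are assumptions (in particular, `calendarIntervals` is taken to stop at `week` so the two filters do not overlap), and `allowedValues` with its `ctx` shape is hypothetical.

```javascript
// Illustrative value sets; names follow the definition above, contents are assumptions.
const calendarIntervals = ["minute", "hour", "day", "week"];
const calendarIntervalsFromMonth = ["month", "quarter", "year"];
const rateIntervalsToWeek = ["second", "minute", "hour", "day", "week"];
const rateIntervalsFromMonth = ["month", "quarter", "year"];

const rateUnitConstraint = {
  kind: "value-sets",
  filters: [
    {
      if: [{ agg: "parent", param: "calendar_interval", values: calendarIntervals }],
      then: [{ param: "unit", values: rateIntervalsToWeek }],
    },
    {
      if: [{ agg: "parent", param: "calendar_interval", values: calendarIntervalsFromMonth }],
      then: [{ param: "unit", values: rateIntervalsFromMonth }],
    },
  ],
};

// Hypothetical evaluator. `ctx` maps agg references ("parent", "self") to params;
// a condition without `agg` reads the aggregation's own params.
// Returns the allowed values for `param`, or null if no matching filter restricts it.
function allowedValues(constraint, param, ctx) {
  let allowed = null;
  for (const filter of constraint.filters) {
    const matches = filter.if.every(cond => {
      const params = ctx[cond.agg ?? "self"] ?? {};
      return cond.values.includes(params[cond.param]);
    });
    if (!matches) continue;
    for (const target of filter.then) {
      if (target.param !== param) continue;
      // If several filters match, keep only values allowed by all of them.
      allowed = allowed === null
        ? [...target.values]
        : allowed.filter(v => target.values.includes(v));
    }
  }
  return allowed;
}

console.log(allowedValues(rateUnitConstraint, "unit", { parent: { calendar_interval: "month" } }));
// [ 'month', 'quarter', 'year' ]
```

Because a condition without `agg` falls back to `ctx.self`, the same evaluator would also handle constraints among parameters of the same aggregation, and a `then` list with several entries simply restricts several parameters at once.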
Should we need a value constraint among parameters of the same aggregation, we might express it by simply omitting the `agg` key in `filters`:
request: {
params: {
...,
foo: string,
bar: string,
},
constraints: [
{
kind: 'value-sets',
filters: [
{
if: [{
param: 'foo',
values: fooSet1
}],
then: [{
param: 'bar',
values: barSet1
}],
},
{
if: [{
param: 'foo',
values: fooSet2
}],
then: [{
param: 'bar',
values: barSet2
}],
},
],
}
]
}
Note that with this syntax it would probably be possible to constrain more than two parameters:
request: {
params: {
...,
foo: string,
bar: string,
baz: string,
},
constraints: [
{
kind: 'value-sets',
filters: [
{
if: [{
param: 'foo',
values: fooSet1
}],
then: [{
param: 'bar',
values: barSet1
}],
},
{
if: [{
param: 'foo',
values: fooSet2
}],
then: [
{
param: 'bar',
values: barSet2
},
{
param: 'baz',
values: bazSet2
},
]
},
],
}
]
}
This syntax expresses directionality: values in certain parameters control values in other parameters, not the other way around, so whoever writes the constraints would have to avoid conflicts. TBD
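Avoiding conflicts by hand could be assisted by a check like the following sketch. It flags pairs of filters that may fire for the same input (every parameter they both test has overlapping value sets) yet demand disjoint values for the same target parameter. The helper names are hypothetical, and the check only looks at the declared sets, so it may report pairs that never co-occur in practice.

```javascript
// Key a condition or target by the aggregation it refers to and its param.
const keyOf = c => `${c.agg ?? "self"}.${c.param}`;
const overlaps = (a, b) => a.some(v => b.includes(v));

// Two filters may fire together unless a param tested by both has disjoint value sets.
function mayMatchTogether(f1, f2) {
  return f1.if.every(c1 =>
    f2.if.every(c2 => keyOf(c1) !== keyOf(c2) || overlaps(c1.values, c2.values))
  );
}

// Reports filter pairs that may fire together yet leave a target param with no allowed value.
function findConflicts(constraint) {
  const conflicts = [];
  const { filters } = constraint;
  for (let i = 0; i < filters.length; i++) {
    for (let j = i + 1; j < filters.length; j++) {
      if (!mayMatchTogether(filters[i], filters[j])) continue;
      for (const t1 of filters[i].then) {
        for (const t2 of filters[j].then) {
          if (keyOf(t1) === keyOf(t2) && !overlaps(t1.values, t2.values)) {
            conflicts.push({ filters: [i, j], param: t1.param });
          }
        }
      }
    }
  }
  return conflicts;
}

// foo = "b" satisfies both filters, which then demand disjoint values for bar.
const conflicting = {
  kind: "value-sets",
  filters: [
    { if: [{ param: "foo", values: ["a", "b"] }], then: [{ param: "bar", values: ["x"] }] },
    { if: [{ param: "foo", values: ["b", "c"] }], then: [{ param: "bar", values: ["y"] }] },
  ],
};
console.log(findConflicts(conflicting)); // one conflict on bar, between filters 0 and 1
```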
Aggregations operate in either `breadth_first` or `depth_first` collect mode.
Sub-aggregations that require scores are incompatible with `breadth_first`, since that mode discards scores [1].
This can probably be an implicit constraint once all aggregations have `collect_mode` set, but we might want to consider making it explicit using a constraint at the aggregation level:
export default {
id: 'rare_terms',
...
collect_mode: 'breadth_first',
constraints: [
{kind: 'collect_mode', value: 'breadth_first'}
],
...
}
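A simplified sketch of how the explicit `collect_mode` constraint could be enforced: an aggregation runs `breadth_first` if its definition fixes that mode or its params select it, and then no descendant may need scores. The `needsScores` flag and the `defs` registry are hypothetical, and the real Elasticsearch behaviour (e.g. `nested` forcing `depth_first`) is more nuanced than this rule.

```javascript
// Hypothetical definitions: `needsScores` is an assumed flag, not an existing property.
const defs = {
  rare_terms: { constraints: [{ kind: "collect_mode", value: "breadth_first" }] },
  terms: { constraints: [] },
  top_hits: { constraints: [], needsScores: true },
  avg: { constraints: [] },
};

// An agg runs breadth_first if its definition requires it or its params select it.
function isBreadthFirst(node) {
  const fixed = (defs[node.type]?.constraints ?? []).find(c => c.kind === "collect_mode");
  return (fixed?.value ?? node.params?.collect_mode) === "breadth_first";
}

// True if any descendant of `node` needs document scores.
function subtreeNeedsScores(node) {
  return (node.children ?? []).some(
    child => defs[child.type]?.needsScores || subtreeNeedsScores(child)
  );
}

// Simplified rule: breadth_first discards scores, so no descendant may need them.
function checkCollectMode(node, errors = []) {
  if (isBreadthFirst(node) && subtreeNeedsScores(node)) {
    errors.push(`${node.type} runs breadth_first but a sub-aggregation needs scores`);
  }
  for (const child of node.children ?? []) checkCollectMode(child, errors);
  return errors;
}

const scoringTree = { type: "rare_terms", children: [{ type: "top_hits" }] };
console.log(checkCollectMode(scoringTree).length); // 1
```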
[1] Examples:
The `RareTerms` aggregation has to operate in `breadth_first` mode, since it needs to prune terms as doc count thresholds are breached. This requirement means the `RareTerms` aggregation is incompatible with certain combinations of aggregations that require `depth_first`. In particular, scoring sub-aggregations that are inside a `nested` force the entire aggregation tree to run in `depth_first` mode. This will throw an exception since `RareTerms` is unable to process `depth_first`. As a concrete example, if `rare_terms` aggregation is the child of a `nested` aggregation, and one of the child aggregations of `rare_terms` needs document scores (like a `top_hits` aggregation), this will throw an exception.
Being a quality-based filter the sampler aggregation needs access to the relevance score produced for each document. It therefore cannot be nested under a terms aggregation which has the collect_mode switched from the default depth_first mode to breadth_first as this discards scores. In this situation an error will be thrown.
Some aggregations don't support child aggregations. [1]
This could be expressed as:
export default {
id: 'significant_text',
...
constraints: [
{kind: 'no-children'}
],
...
}
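In a UI, the `no-children` constraint might simply decide whether to offer an "add sub-aggregation" action. A minimal sketch, with a hypothetical `aggDefinitions` list and `canHaveChildren` helper:

```javascript
// Hypothetical definitions; only significant_text declares 'no-children' here.
const aggDefinitions = [
  { id: "significant_text", constraints: [{ kind: "no-children" }] },
  { id: "terms", constraints: [] },
  { id: "date_histogram" }, // no constraints key at all
];

// True if the UI should offer an "add sub-aggregation" action for this definition.
function canHaveChildren(definition) {
  return !(definition.constraints ?? []).some(c => c.kind === "no-children");
}

console.log(aggDefinitions.filter(canHaveChildren).map(d => d.id)); // [ 'terms', 'date_histogram' ]
```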
Some aggregations cannot be used with text fields in nested objects. [1]
This could be expressed like this:
export default {
id: 'significant_text',
...
constraints: [
{kind: 'no-nested-objects'}
],
...
}
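The `no-nested-objects` constraint depends on the index mapping, so a field picker would need to know which fields live inside `nested` objects. This sketch assumes hypothetical field descriptors with a `nested` flag and only applies the nested-text restriction, ignoring any other field-type rules.

```javascript
// Hypothetical field descriptors derived from an index mapping;
// `nested` is true when the field lives inside a `nested` object.
const fields = [
  { path: "title", type: "text", nested: false },
  { path: "comments.body", type: "text", nested: true },
  { path: "comments.author", type: "keyword", nested: true },
];

const significantTextDef = {
  id: "significant_text",
  constraints: [{ kind: "no-nested-objects" }],
};

// Returns the fields a field picker could offer for this aggregation.
function selectableFields(definition, candidates) {
  const noNested = (definition.constraints ?? []).some(c => c.kind === "no-nested-objects");
  return candidates.filter(f => !(noNested && f.type === "text" && f.nested));
}

console.log(selectableFields(significantTextDef, fields).map(f => f.path));
// [ 'title', 'comments.author' ]
```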
Some parameters have to be greater than or equal to others, e.g. `shard_size` cannot be smaller than `size` (as it doesn’t make much sense). When it is, Elasticsearch will override it and reset it to be equal to `size`.
In this case we might use:
export default {
id: 'significant_text',
...
request: {
params: {
...,
shard_size: optional(integerD(-1)),
size: optional(integerD(10, true)),
},
constraints: [
{kind: 'gte', params: ['shard_size', 'size']}
]
}
}
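A sketch of a checker for two-parameter comparison constraints. It supports both a strict `gt` and a non-strict `gte`, since "cannot be smaller than" allows equality. The `comparators` table and `checkComparison` are hypothetical names.

```javascript
// Hypothetical checker for constraints such as {kind: 'gte', params: ['shard_size', 'size']}.
const comparators = {
  gt: (a, b) => a > b,
  gte: (a, b) => a >= b,
};

function checkComparison(constraint, params) {
  const [left, right] = constraint.params;
  const a = params[left];
  const b = params[right];
  // In this sketch, an unset optional param is not compared at all.
  if (a === undefined || b === undefined) return true;
  return comparators[constraint.kind](a, b);
}

const shardSizeConstraint = { kind: "gte", params: ["shard_size", "size"] };
console.log(checkComparison(shardSizeConstraint, { shard_size: 10, size: 10 })); // true
console.log(checkComparison(shardSizeConstraint, { shard_size: 5, size: 10 }));  // false
console.log(checkComparison(shardSizeConstraint, { shard_size: -1, size: 10 })); // false
```

The last call shows that an explicit `shard_size` of -1 fails a plain comparison, which is exactly the exception discussed next.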
In this particular case, for example, the constraint should apply only if `shard_size` is positive:
If `shard_size` is set to -1 (the default) then `shard_size` will be automatically estimated based on the number of shards and the `size` parameter.
So we might need to think about how to express exceptions, as some kind of conditional constraint:
export default {
id: 'significant_text',
...
request: {
params: {
...,
shard_size: optional(integerD(-1)),
size: optional(integerD(10, true)),
},
constraints: [
{
if: [
{kind: 'gt-value', params: ['shard_size'], value: 0}
],
then: [
{kind: 'gte', params: ['shard_size', 'size']}
],
}
]
}
}
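The conditional form could be evaluated by enforcing the `then` list only when every `if` entry holds. This sketch assumes the defaults from the params above (`shard_size = -1`, `size = 10`) are merged in before checking, uses `gte` to match "cannot be smaller than", and introduces hypothetical `holds`, `satisfies` and `withDefaults` helpers.

```javascript
// Hypothetical evaluator for the single-constraint kinds used in this example.
function holds(constraint, params) {
  const [first, second] = constraint.params;
  switch (constraint.kind) {
    case "gt-value": return params[first] > constraint.value;
    case "gt": return params[first] > params[second];
    case "gte": return params[first] >= params[second];
    default: throw new Error(`unknown constraint kind: ${constraint.kind}`);
  }
}

// A conditional constraint enforces `then` only when every `if` entry holds.
function satisfies(constraint, params) {
  if (constraint.if) {
    const applies = constraint.if.every(c => holds(c, params));
    return !applies || constraint.then.every(c => holds(c, params));
  }
  return holds(constraint, params);
}

// Defaults taken from the params above: shard_size = -1, size = 10.
const withDefaults = userParams => ({ shard_size: -1, size: 10, ...userParams });
const shardSizeRule = {
  if: [{ kind: "gt-value", params: ["shard_size"], value: 0 }],
  then: [{ kind: "gte", params: ["shard_size", "size"] }],
};

console.log(satisfies(shardSizeRule, withDefaults({})));                 // true (-1 skips the check)
console.log(satisfies(shardSizeRule, withDefaults({ shard_size: 5 })));  // false
console.log(satisfies(shardSizeRule, withDefaults({ shard_size: 50 }))); // true
```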
Another case is `variable_width_histogram`, whose documentation states: "This aggregation cannot currently be nested under any aggregation that collects from more than a single bucket."
Once we give a name to the group of aggregations that collect from a single bucket (say `single-bucket`), we need to assign it to a property (say `foo`) on all aggregations, and can then express this constraint at the aggregation level with something like:
export default {
id: 'variable_width_histogram',
...
constraints: [
{kind: 'parent-type', key: 'foo', values: ['single-bucket']}
],
...
}
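A sketch of how `parent-type` could be checked once every aggregation carries the group property. `foo` is the placeholder name from the text; the `registry` and the group assigned to each aggregation are illustrative assumptions, and only the direct parent is inspected, as in the constraint above.

```javascript
// Hypothetical registry; the group stored under `foo` for each agg is an assumption.
const registry = {
  filter: { foo: "single-bucket", constraints: [] },
  terms: { foo: "multi-bucket", constraints: [] },
  variable_width_histogram: {
    foo: "multi-bucket",
    constraints: [{ kind: "parent-type", key: "foo", values: ["single-bucket"] }],
  },
};

// True if `childId` may sit directly under `parentId`.
// A top-level aggregation (parentId === null) has no parent to check.
function parentTypeAllows(childId, parentId) {
  if (parentId === null) return true;
  return (registry[childId].constraints ?? [])
    .filter(c => c.kind === "parent-type")
    .every(c => c.values.includes(registry[parentId]?.[c.key]));
}

console.log(parentTypeAllows("variable_width_histogram", null));     // true
console.log(parentTypeAllows("variable_width_histogram", "filter")); // true
console.log(parentTypeAllows("variable_width_histogram", "terms"));  // false
```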
Some parameters have a maximum:
Parameters `buckets`, `shard_size`, and `initial_buffer` are optional. By default, `buckets = 10`, `shard_size = buckets * 50`, and `initial_buffer = min(10 * shard_size, 50000)`.
This could be:
export default {
id: 'variable_width_histogram',
...
request: {
params: {
...,
initial_buffer: optional(integerD(5000)),
},
constraints: [
{kind: 'max-value', params: ['initial_buffer'], value: 50000}
]
}
}
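Working through the quoted defaults shows where `integerD(5000)` comes from: with 10 buckets, `shard_size` is 500 and `initial_buffer` is `min(5000, 50000) = 5000`, while 50000 caps the computed default for larger bucket counts. The helper below is hypothetical and assumes `shard_size` is left at its default.

```javascript
// Reproduces the documented default arithmetic:
// shard_size = buckets * 50, initial_buffer = min(10 * shard_size, 50000).
function variableWidthHistogramDefaults(buckets = 10) {
  const shardSize = buckets * 50;
  const initialBuffer = Math.min(10 * shardSize, 50000);
  return { buckets, shard_size: shardSize, initial_buffer: initialBuffer };
}

console.log(variableWidthHistogramDefaults());
// { buckets: 10, shard_size: 500, initial_buffer: 5000 }
console.log(variableWidthHistogramDefaults(200));
// { buckets: 200, shard_size: 10000, initial_buffer: 50000 }
```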
Likewise, parameters can have a minimum:
`sigma` can be any non-negative double, which would be expressed with:
export default {
id: 'extended_stats',
...
request: {
params: {
...,
sigma: optional(floatD(2)),
},
constraints: [
{kind: 'min-value', params: ['sigma'], value: 0}
]
}
}
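Both `max-value` and `min-value` could share one checker. This sketch treats both bounds as inclusive, so "non-negative" maps to `sigma >= 0`; `checkBounds` is a hypothetical name.

```javascript
// Hypothetical checker for 'min-value' and 'max-value' constraints (inclusive bounds).
function checkBounds(constraints, params) {
  const errors = [];
  for (const c of constraints) {
    if (c.kind !== "min-value" && c.kind !== "max-value") continue;
    for (const name of c.params) {
      const value = params[name];
      if (value === undefined) continue; // unset optional params keep their defaults
      if (c.kind === "min-value" && value < c.value) errors.push(`${name} must be >= ${c.value}`);
      if (c.kind === "max-value" && value > c.value) errors.push(`${name} must be <= ${c.value}`);
    }
  }
  return errors;
}

const sigmaConstraints = [{ kind: "min-value", params: ["sigma"], value: 0 }];
const bufferConstraints = [{ kind: "max-value", params: ["initial_buffer"], value: 50000 }];
console.log(checkBounds(sigmaConstraints, { sigma: 0 }));               // []
console.log(checkBounds(sigmaConstraints, { sigma: -1 }));              // [ 'sigma must be >= 0' ]
console.log(checkBounds(bufferConstraints, { initial_buffer: 60000 })); // [ 'initial_buffer must be <= 50000' ]
```

Inclusive bounds are a design choice here; a strictly positive parameter could instead reuse an exclusive kind such as the `gt-value` used for `shard_size`.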
So far we have:
- `some`: at least one parameter in a set has to be defined
- `xor`: exactly one parameter in a set has to be defined
These could be arrays expressed like: