029-Improving data queries

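Data queries need to be improved for SDMX 3.0. In a nutshell, it is proposed to add support for multiple keys, extend the context of data retrieval, add a “cube-based” data retrieval alongside the current “key-based” one, and harmonize how separators are used.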

Add support for multiple keys

Currently, the key parameter supports only one (possibly partial) key. The + operator can be used to supply more than one value for any dimension. This works well if the Cartesian product of the dimensions where the + operator has been used represents what you want. If not, supplying multiple partial keys would work better, but this is currently not supported.

For example, let’s imagine a fictitious inflation DSD made of the following dimensions: FREQ, REF_AREA, INFLATION_ITEM, SOURCE, TYPE. Let’s imagine that there are several values for TYPE, such as the index (INX), the weights (INW), the annual rate of change (ANR) and the contribution to growth (CTG). Let’s say that source A supplies data for all 4 types and source B only for 2 (ANR and CTG). You want INX and ANR from source A and CTG from source B.

The current API would allow a query like the following: M...A+B.ANR+INX+CTG.

However, this is not what we want, as it would also return CTG data from source A and ANR data from source B. It is therefore proposed to allow using a comma to separate (possibly partial) keys: M...A.ANR,M...A.INX,M...B.CTG
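
The difference can be sketched in Python. This is an illustration only. `expand_key` and `expand_keys` are hypothetical helpers. The sketch makes three assumptions: each key has five dot-separated positions in the order of the fictitious DSD, an empty position acts as a wildcard, and + lists alternative values for one dimension, as in the current API.

```python
from __future__ import annotations

from itertools import product


def expand_key(key: str) -> set[tuple[str, ...]]:
    """Enumerate the combinations selected by one key, e.g. 'M...A+B.ANR'.

    Each dot-separated position is one dimension; '' (empty) stands for
    "any value" and is kept as-is, and '+' separates alternative values.
    """
    positions = [part.split("+") for part in key.split(".")]
    return set(product(*positions))


def expand_keys(keys: str) -> set[tuple[str, ...]]:
    """Proposed behaviour: comma-separated keys select the union of their combinations."""
    selected: set[tuple[str, ...]] = set()
    for key in keys.split(","):
        selected |= expand_key(key)
    return selected


# Dimension order: FREQ, REF_AREA, INFLATION_ITEM, SOURCE, TYPE
cartesian = expand_key("M...A+B.ANR+INX+CTG")
wanted = expand_keys("M...A.ANR,M...A.INX,M...B.CTG")

# (SOURCE, TYPE) pairs requested by the Cartesian query but not wanted
extra = sorted((combo[3], combo[4]) for combo in cartesian - wanted)
print(extra)  # [('A', 'CTG'), ('B', 'ANR'), ('B', 'INX')]
```

The three leftover combinations are what the single Cartesian-product query requests on top of the wanted ones. B.INX has no data, so only A.CTG and B.ANR would actually be returned.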

Extend the context of data retrieval

The first path parameter of the current data queries holds a reference to the dataflow of the data to be returned. It must resolve to one single artefact.

It is proposed to modify this parameter to:

Accept 2 additional types of artefacts (datastructure and provisionagreement);
Allow multiple values (and wildcarding) of (at least) the version.

By doing this, we can:

Retrieve all data structured by the same DSD, regardless of whether or not they belong to the same dataflow;
Retrieve data across versions of structures;
Simplify the API by removing the 3rd path parameter (provider), as it is now covered by the provisionagreement context.

It is proposed to use the same path parameters as for the structure and schema queries, thereby aligning all query types.

To retrieve all data structured according to the latest version of the ECB_EXR1 DSD maintained by the ECB, the following query could be performed: https://ws-entry-point/data/datastructure/ECB/ECB_EXR1/latest
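
Here is a minimal sketch of how a client might assemble such a path. It assumes the segment order context/agency/id/version shown above. `data_path` and its `base` default are hypothetical. The accepted contexts are the three named in this proposal plus `all`, which is used in the “Querying across all structures” example.

```python
# Hypothetical helper; "ws-entry-point" is the placeholder host used in this proposal.
CONTEXTS = {"dataflow", "datastructure", "provisionagreement", "all"}


def data_path(context: str, agency: str, resource_id: str, version: str,
              base: str = "https://ws-entry-point") -> str:
    """Build a data query path of the form /data/{context}/{agency}/{id}/{version}."""
    if context not in CONTEXTS:
        raise ValueError(f"unsupported context: {context!r}")
    return f"{base}/data/{context}/{agency}/{resource_id}/{version}"


print(data_path("datastructure", "ECB", "ECB_EXR1", "latest"))
# https://ws-entry-point/data/datastructure/ECB/ECB_EXR1/latest
```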

Add a “cube-based” data retrieval, in addition to the current “key-based” one

The current API requires knowledge of the series key. This can be tedious, especially for DSDs with many dimensions. It is therefore proposed to add support for a cube-based filtering mechanism.

Reusing the fictitious DSD mentioned above, to retrieve all public inflation data about Switzerland (i.e. neither confidential nor restricted), the following would be enough: https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=CH&c[CONF_STATUS]=F

This mechanism could support multiple values. For example, to retrieve all public inflation data about Switzerland and Germany, the following would be enough: https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=CH,DE&c[CONF_STATUS]=F

Furthermore, support for operators could be introduced. For example, to retrieve all inflation data about Switzerland and Germany for reporting periods from 2018 onwards, the following would be enough: https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=CH,DE&c[TIME_PERIOD]=ge:2018
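
As a sketch, the filter parameters can be produced with Python's standard `urllib.parse.urlencode`. The `cube_query` helper is hypothetical. It joins multiple values with commas and passes `safe="[],:"` so that brackets, commas and operator colons stay readable instead of being percent-encoded. Whether a server would accept both the readable and the encoded form is an assumption.

```python
from __future__ import annotations

from urllib.parse import urlencode


def cube_query(filters: dict[str, str | list[str]]) -> str:
    """Encode {component: value(s)} as c[COMPONENT]=... query parameters."""
    params = {
        f"c[{component}]": ",".join(value) if isinstance(value, list) else value
        for component, value in filters.items()
    }
    # Keep brackets, commas and operator colons unescaped for readability.
    return urlencode(params, safe="[],:")


base = "https://ws-entry-point/data/dataflow/ESTAT/ICP"
print(base + "?" + cube_query({"REF_AREA": ["CH", "DE"], "TIME_PERIOD": "ge:2018"}))
# https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=CH,DE&c[TIME_PERIOD]=ge:2018
```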

When no operator is specified and there is only one value, it would default to “equal to”. When no operator is specified and there are multiple values, it would default to “or”.
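
These defaults can be expressed as a small parser. The proposal leaves the operator syntax open, so this sketch assumes an explicit operator is written as a two-letter lowercase code followed by a colon, as in ge:2018. `parse_filter` is a hypothetical name.

```python
from __future__ import annotations

import re

# Assumption: an explicit operator is two lowercase letters followed by a
# colon (e.g. "ge:2018"); the exact grammar is not fixed by the proposal.
_OPERATOR = re.compile(r"^([a-z]{2}):(.*)$")


def parse_filter(raw: str) -> tuple[str, list[str]]:
    """Split a c[...] value into (operator, values), applying the defaults."""
    match = _OPERATOR.match(raw)
    if match:
        return match.group(1), match.group(2).split(",")
    values = raw.split(",")
    # No operator: one value means "equal to", several values mean "or".
    return ("eq" if len(values) == 1 else "or"), values


print(parse_filter("CH"))       # ('eq', ['CH'])
print(parse_filter("CH,DE"))    # ('or', ['CH', 'DE'])
print(parse_filter("ge:2018"))  # ('ge', ['2018'])
```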

It is proposed to start with the following operators:

Operator	Meaning
eq	Equals
ne	Not equal to
lt	Less than
le	Less than or equal to
gt	Greater than
ge	Greater than or equal to
co	Contains
nc	Does not contain
sw	Starts with
ew	Ends with
nd	And
or	Or
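
The sketch below shows one possible way to evaluate the comparison and string operators against a single component value. The `matches` helper is hypothetical. Values are compared as plain strings, which is an assumption: it works for same-format periods such as 2017 and 2018 but is not a real time-period comparison. The logical operators nd and or are left out because the proposal does not say how they are written in a filter.

```python
import operator

# Comparison and string operators from the table above. The logical
# operators "nd" and "or" are deliberately not covered here.
# Assumption: values are compared as plain strings.
COMPARISONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "co": lambda actual, wanted: wanted in actual,
    "nc": lambda actual, wanted: wanted not in actual,
    "sw": lambda actual, wanted: actual.startswith(wanted),
    "ew": lambda actual, wanted: actual.endswith(wanted),
}


def matches(op: str, actual: str, wanted: str) -> bool:
    """Return True if `actual` satisfies `op` against `wanted`."""
    if op not in COMPARISONS:
        raise ValueError(f"unsupported operator: {op!r}")
    return COMPARISONS[op](actual, wanted)


# Hypothetical item code checked against sw:01, and a period against ge:2015
print(matches("sw", "0111", "01"))    # True
print(matches("ge", "2014", "2015"))  # False
```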

If cube-based filters are introduced, the startPeriod and endPeriod query parameters are no longer needed and may be removed from the API.

The current key-based query mechanism would continue to be supported through a new special query parameter, key: https://ws-entry-point/data/dataflow/ESTAT/ICP?key=M...A.ANR,M...A.INX,M...B.CTG

Obviously, it should be possible to combine both key-based and cube-based filtering mechanisms: https://ws-entry-point/data/dataflow/ESTAT/ICP?key=M...A.ANR,M...A.INX,M...B.CTG&c[OBS_STATUS]=N
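
Here is a sketch of the combined filtering. It assumes that combining both mechanisms means a row must satisfy the key filter and every cube filter. The helpers, the flat one-row-per-observation layout and the sample rows are all hypothetical. The cube part only implements the default “equal to”/“or” semantics.

```python
from __future__ import annotations

# Hypothetical dimension order of the fictitious inflation DSD.
DIMENSIONS = ["FREQ", "REF_AREA", "INFLATION_ITEM", "SOURCE", "TYPE"]


def key_matches(row: dict[str, str], keys: str) -> bool:
    """True if the row matches at least one comma-separated (partial) key."""
    for key in keys.split(","):
        parts = key.split(".")
        if all(part == "" or row[dim] in part.split("+")
               for dim, part in zip(DIMENSIONS, parts)):
            return True
    return False


def cube_matches(row: dict[str, str], filters: dict[str, str]) -> bool:
    """True if the row satisfies every c[...] filter (default eq/or only)."""
    return all(row.get(component) in value.split(",")
               for component, value in filters.items())


def select(rows: list[dict[str, str]], keys: str,
           filters: dict[str, str]) -> list[dict[str, str]]:
    # Assumption: a row must pass both the key filter and the cube filters.
    return [r for r in rows if key_matches(r, keys) and cube_matches(r, filters)]


def make_row(source: str, type_: str, obs_status: str) -> dict[str, str]:
    # Hypothetical sample observation: monthly, Switzerland, item 01.
    return {"FREQ": "M", "REF_AREA": "CH", "INFLATION_ITEM": "01",
            "SOURCE": source, "TYPE": type_, "OBS_STATUS": obs_status}


rows = [make_row("A", "ANR", "N"), make_row("A", "ANR", "A"),
        make_row("A", "CTG", "N"), make_row("B", "CTG", "N")]
result = select(rows, "M...A.ANR,M...A.INX,M...B.CTG", {"OBS_STATUS": "N"})
print([(r["SOURCE"], r["TYPE"], r["OBS_STATUS"]) for r in result])
# [('A', 'ANR', 'N'), ('B', 'CTG', 'N')]
```

The keys exclude A.CTG. The OBS_STATUS filter excludes the second A.ANR observation.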

Harmonizing separators

Although potentially painful, we propose to take the opportunity offered by SDMX 3.0 to harmonize how separators are used.

The proposal is to use:

A comma (,) for “or” statements;
A plus (+) for “and” statements.

For example, the filter c[OBS_STATUS]=B+M,A would mean that:

OBS_STATUS is an attribute that supports multiple values;
We want to retrieve data for which OBS_STATUS is either the single value A or the combination of B and M.
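
The proposed precedence can be sketched as follows: split on commas first, then on plus signs. The helper names are hypothetical. The sketch assumes that “the combination of B and M” means the attribute holds exactly B and M, not a superset of them.

```python
from __future__ import annotations


def parse_multi_value_filter(raw: str) -> list[frozenset[str]]:
    """Split on ',' (or) first, then on '+' (and): 'B+M,A' -> [{B, M}, {A}]."""
    return [frozenset(alternative.split("+")) for alternative in raw.split(",")]


def attribute_matches(actual_values: set[str], raw_filter: str) -> bool:
    """True if the attribute's values equal one of the or-alternatives exactly."""
    return any(frozenset(actual_values) == alternative
               for alternative in parse_multi_value_filter(raw_filter))


print(attribute_matches({"M", "B"}, "B+M,A"))  # True
print(attribute_matches({"A"}, "B+M,A"))       # True
print(attribute_matches({"B"}, "B+M,A"))       # False
print(attribute_matches({"A", "B"}, "B+M,A"))  # False
```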

Examples

Key-based vs. cube-based vs. combined queries:

https://ws-entry-point/data/dataflow/ECB/EXR/latest?key=M.CHF..,M.GBP..
https://ws-entry-point/data/dataflow/ECB/EXR/latest?c[FREQ]=M&c[CURRENCY]=CHF,GBP
https://ws-entry-point/data/dataflow/ECB/EXR/latest?key=M.CHF..,M.GBP..&c[OBS_STATUS]=N

Querying across all structures

https://ws-entry-point/data/all/all/all/latest?c[REF_AREA]=CH,GR,UK

Using operators

Retrieve all inflation data for Germany (DE) in the food and non-alcoholic beverages category (codes starting with 01), from 2015 onwards.

https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=DE&c[ICP_ITEM]=sw:01&c[TIME_PERIOD]=ge:2015
