sdmx-twg / sdmx-rest

This repository is used for maintaining the SDMX RESTful web services specification.

029-Improving data queries #117

Closed: sosna closed this issue 3 years ago

sosna commented 4 years ago

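Data queries need to be improved for SDMX 3.0. In a nutshell, it is proposed to add support for multiple keys, extend the context of data retrieval, add a “cube-based” data retrieval in addition to the current “key-based” one, and harmonize separators.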

Add support for multiple keys

Currently, the key parameter only supports one (possibly partial) key. The + operator can be used to supply more than one value for any dimension. This works well if the Cartesian product of the dimensions where the + operator has been used represents what you want. If not, supplying multiple partial keys would work better, but this is currently not supported.

For example, let’s imagine a fictitious inflation DSD made of the following dimensions: FREQ, REF_AREA, INFLATION_ITEM, SOURCE, TYPE. Let’s imagine that there are several values for TYPE, such as the index (INX), the weights (INW), the annual rate of change (ANR) and the contribution to growth (CTG). Let’s say that source A supplies data for all 4 types and source B only for 2 (ANR & CTG). You want INX and ANR from source A and CTG from source B.

The current API would allow a query like the following: M…A+B.ANR+INX+CTG.

However, this is not what we want, as it would return ANR and CTG data for both sources A & B. It is therefore proposed to allow using a comma to separate (possibly partial) keys: M…A.ANR,M…A.INX,M…B.CTG
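To make the difference concrete, here is a minimal Python sketch (not part of the proposal) that checks which SOURCE/TYPE combinations a key selects. It assumes that '.' separates dimension positions, an empty position is a wildcard, '+' lists alternative values within a position, and ',' separates keys as proposed. It also assumes that the “…” in the keys above stands for three dots (REF_AREA and INFLATION_ITEM left empty). The helper names and the placeholder item code "X" are hypothetical.

```python
from itertools import product

# Dimension order of the fictitious inflation DSD described above.
DIMENSIONS = ["FREQ", "REF_AREA", "INFLATION_ITEM", "SOURCE", "TYPE"]


def key_matches(key: str, series_key: tuple[str, ...]) -> bool:
    """True if one (possibly partial) key selects the given series key.

    '.' separates dimension positions, an empty position matches any value,
    and '+' lists alternative values for one position, so a key selects the
    Cartesian product of the values it lists.
    """
    positions = key.split(".")
    if len(positions) != len(DIMENSIONS):
        raise ValueError(f"{key!r} must have {len(DIMENSIONS)} positions")
    return all(pos == "" or value in pos.split("+")
               for pos, value in zip(positions, series_key))


def keys_match(keys: str, series_key: tuple[str, ...]) -> bool:
    """Proposed behaviour: ',' separates keys; a series matches if any key does."""
    return any(key_matches(key, series_key) for key in keys.split(","))


def select(keys: str) -> set[tuple[str, str]]:
    """(SOURCE, TYPE) pairs selected among all combinations, for one monthly CH series."""
    return {
        (source, type_)
        for source, type_ in product(["A", "B"], ["INX", "INW", "ANR", "CTG"])
        if keys_match(keys, ("M", "CH", "X", source, type_))  # "X": placeholder item code
    }


print(sorted(select("M...A+B.ANR+INX+CTG")))            # six pairs, incl. unwanted (A, CTG) and (B, ANR)
print(sorted(select("M...A.ANR,M...A.INX,M...B.CTG")))  # exactly the three wanted pairs
```

The single key is expanded position by position, which is why it cannot express “INX and ANR from A, but only CTG from B”; a list of keys is a union of such products, which can.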

Extend the context of data retrieval

The first path parameter of the current data queries holds a reference to the dataflow of the data to be returned. It must resolve to a single artefact.

It is proposed to modify this parameter to:

By doing this, we can:

It is proposed to use the same path parameters as for the structure and schema queries, thereby aligning all query types.

To retrieve all data structured according to the latest version of the ECB_EXR1 DSD maintained by the ECB, the following query could be performed: https://ws-entry-point/data/datastructure/ECB/ECB_EXR1/latest
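A minimal Python sketch of how a client might assemble such a path. It assumes the four path segments are, in order, the structure type (context), agency, ID and version, as in the example URL above, and that “all” and “latest” act as defaults, as in the cross-structure example further down. The helper name and the set of accepted contexts (limited to those appearing in this issue) are illustrative assumptions.

```python
from urllib.parse import quote

# Contexts that appear in this issue; the final list is still an open point.
KNOWN_CONTEXTS = {"dataflow", "datastructure", "all"}


def data_path(entry_point: str, context: str, agency: str = "all",
              resource_id: str = "all", version: str = "latest") -> str:
    """Build {entry_point}/data/{context}/{agency}/{id}/{version} (hypothetical helper)."""
    if context not in KNOWN_CONTEXTS:
        raise ValueError(f"unsupported context: {context}")
    segments = [context, agency, resource_id, version]
    # Percent-encode each segment so that no value can inject an extra '/'.
    return entry_point.rstrip("/") + "/data/" + "/".join(quote(s, safe="") for s in segments)


print(data_path("https://ws-entry-point", "datastructure", "ECB", "ECB_EXR1"))
# https://ws-entry-point/data/datastructure/ECB/ECB_EXR1/latest
print(data_path("https://ws-entry-point", "all"))
# https://ws-entry-point/data/all/all/all/latest
```

Because the segments mirror those of structure and schema queries, the same helper shape could be reused for all query types.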

Add a “cube-based” data retrieval, in addition to the current “key-based” one

The current API requires knowledge of the series key. This can be tedious, especially for DSDs with many dimensions. It is therefore proposed to add support for a cube-based filtering mechanism.

Using the fictitious DSD mentioned above, the following would be enough to retrieve all public inflation data about Switzerland (i.e., neither confidential nor restricted): https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=CH&c[CONF_STATUS]=F

This mechanism could support multiple values. For example, to retrieve all public inflation data about Switzerland and Germany, the following would be enough: https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=CH,DE&c[CONF_STATUS]=F

Furthermore, support for operators could be introduced. For example, to retrieve all inflation data about Switzerland and Germany for reporting periods from 2018 onwards, the following would be enough: https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=CH,DE&c[TIME_PERIOD]=ge:2018

If no operator is specified and there is only one value, the filter would default to “equals”. If no operator is specified and there are multiple values, it would default to “or”.
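A minimal Python sketch of how a server might read the c[...] parameters and apply the defaults just described. It assumes that an explicit operator is written as a known two-letter code followed by ':' (as in ge:2018) and that ',' separates multiple values; the operator codes are those proposed in the table in this issue. The function names are hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Operator codes from the table proposed in this issue.
OPERATORS = {"eq", "ne", "lt", "le", "gt", "ge", "co", "nc", "sw", "ew", "nd", "or"}
FILTER_PARAM = re.compile(r"c\[([^\]]+)\]")


def parse_value(value: str) -> tuple[str, list[str]]:
    """Split 'ge:2018' or 'CH,DE' into (operator, operands), applying the defaults."""
    prefix, sep, rest = value.partition(":")
    if sep and prefix in OPERATORS:
        return prefix, rest.split(",")
    operands = value.split(",")
    # No operator: one value means "equals", several values mean "or".
    return ("eq" if len(operands) == 1 else "or"), operands


def parse_cube_filters(url: str) -> dict[str, tuple[str, list[str]]]:
    """Collect c[COMPONENT]=... parameters; other parameters (e.g. key) are ignored."""
    filters = {}
    for name, value in parse_qsl(urlsplit(url).query):
        match = FILTER_PARAM.fullmatch(name)
        if match:
            filters[match.group(1)] = parse_value(value)
    return filters


print(parse_cube_filters(
    "https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=CH,DE&c[TIME_PERIOD]=ge:2018"))
# {'REF_AREA': ('or', ['CH', 'DE']), 'TIME_PERIOD': ('ge', ['2018'])}
```

Checking the prefix against the known operator codes keeps values that legitimately contain ':' (for example timestamps) from being misread as operators. Note also that parse_qsl, like most form decoders, turns an unencoded '+' into a space, so any '+' used as a separator in query parameters would have to be sent percent-encoded as %2B.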

It is proposed to start with the following operators:

Operator   Meaning
eq         Equals
ne         Not equal to
lt         Less than
le         Less than or equal to
gt         Greater than
ge         Greater than or equal to
co         Contains
nc         Does not contain
sw         Starts with
ew         Ends with
nd         And
or         Or
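One possible reading of these operators, as a minimal Python sketch: the ten comparison operators become predicates on a value, while “nd” and “or” combine several comparisons. How nd/or are actually written in a query is not settled here, so the evaluate() helper and its combine argument are hypothetical. Values are compared as plain strings, which orders same-format periods such as "2015" and "2018" correctly but is not a general period comparison.

```python
import operator
from typing import Callable

# Assumed mapping of the proposed comparison operators to Python predicates.
# Values are compared as plain strings: this orders same-format periods such
# as "2015" < "2018" correctly, but is not a general period comparison.
COMPARATORS: dict[str, Callable[[str, str], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "co": lambda value, operand: operand in value,
    "nc": lambda value, operand: operand not in value,
    "sw": lambda value, operand: value.startswith(operand),
    "ew": lambda value, operand: value.endswith(operand),
}


def evaluate(value: str, conditions: list[tuple[str, str]], combine: str = "nd") -> bool:
    """Apply (operator, operand) conditions to one value, combined with 'nd' or 'or'."""
    if combine not in ("nd", "or"):
        raise ValueError(f"unknown combinator: {combine}")
    results = (COMPARATORS[op](value, operand) for op, operand in conditions)
    return all(results) if combine == "nd" else any(results)


print(evaluate("2017", [("ge", "2015"), ("le", "2018")]))         # True
print(evaluate("DE", [("eq", "CH"), ("eq", "DE")], combine="or"))  # True
print(evaluate("021", [("sw", "01")]))                            # False
```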

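If cube-based filters are introduced, the startPeriod and endPeriod query parameters are no longer needed, since the same selection can be expressed as a filter on TIME_PERIOD (e.g. c[TIME_PERIOD]=ge:2018). They may therefore be removed from the API.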

The existing key-based query mechanism would continue to be supported through a new dedicated parameter for key-based queries: https://ws-entry-point/data/dataflow/ESTAT/ICP?key=M…A.ANR,M…A.INX,M…B.CTG

Obviously, it should be possible to combine both key-based and cube-based filtering mechanisms: https://ws-entry-point/data/dataflow/ESTAT/ICP?key=M…A.ANR,M…A.INX,M…B.CTG&c[OBS_STATUS]=N.
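A minimal Python sketch of a client-side helper that appends both the proposed key parameter and cube-based c[...] filters to a data path. The helper name is hypothetical; the parameter names key and c[COMPONENT] are taken from the examples in this issue.

```python
from typing import Optional
from urllib.parse import urlencode


def data_query(base_url: str, keys: Optional[list[str]] = None,
               filters: Optional[dict[str, str]] = None) -> str:
    """Append the proposed 'key' and 'c[...]' query parameters (hypothetical helper)."""
    params: list[tuple[str, str]] = []
    if keys:
        # Several (possibly partial) keys, separated by commas.
        params.append(("key", ",".join(keys)))
    for component, value in (filters or {}).items():
        params.append((f"c[{component}]", value))
    # urlencode percent-encodes '[', ']' and ',' (and a literal '+' as %2B);
    # a server decoding the query string gets the original characters back.
    return base_url + ("?" + urlencode(params) if params else "")


url = data_query("https://ws-entry-point/data/dataflow/ECB/EXR/latest",
                 keys=["M.CHF..", "M.GBP.."], filters={"OBS_STATUS": "N"})
print(url)
# https://ws-entry-point/data/dataflow/ECB/EXR/latest?key=M.CHF..%2CM.GBP..&c%5BOBS_STATUS%5D=N
```

Percent-encoding is not required for the commas and brackets shown in the examples, but it is the safe default for a generic client, and it matters for '+', which form decoders would otherwise read as a space.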

Harmonizing separators

Although this may be painful, we propose to take the opportunity offered by SDMX 3.0 to harmonize how separators are used.

The proposal is to use:

For example, the filter c[OBS_STATUS]=B+M,A would mean that:

Examples

Key-based vs. cube-based vs. combined queries:

Key-based: https://ws-entry-point/data/dataflow/ECB/EXR/latest?key=M.CHF..,M.GBP..

Cube-based: https://ws-entry-point/data/dataflow/ECB/EXR/latest?c[FREQ]=M&c[CURRENCY]=CHF,GBP

Combined: https://ws-entry-point/data/dataflow/ECB/EXR/latest?key=M.CHF..,M.GBP..&c[OBS_STATUS]=N

Querying across all structures

https://ws-entry-point/data/all/all/all/latest?c[REF_AREA]=CH,GR,UK

Using operators

Retrieve all inflation data for the food and non-alcoholic beverages category (codes starting with 01) for Germany (DE), from 2015 onwards.

https://ws-entry-point/data/dataflow/ESTAT/ICP?c[REF_AREA]=DE&c[ICP_ITEM]=sw:01&c[TIME_PERIOD]=ge:2015
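To illustrate what this query would select, here is a minimal Python sketch that applies its three conditions to a few made-up series descriptors (invented for illustration, not real data). It assumes, as the examples imply, that conditions on different components are combined with AND, and it compares TIME_PERIOD as plain strings, which is only adequate for same-format years.

```python
# Conditions from the example URL above, as (operator, operand) pairs.
CONDITIONS = {
    "REF_AREA": ("eq", "DE"),
    "ICP_ITEM": ("sw", "01"),
    "TIME_PERIOD": ("ge", "2015"),
}

# Only the operators used here; string comparison suits same-format years.
TESTS = {
    "eq": lambda value, operand: value == operand,
    "sw": lambda value, operand: value.startswith(operand),
    "ge": lambda value, operand: value >= operand,
}

# Made-up series descriptors, for illustration only.
OBSERVATIONS = [
    {"REF_AREA": "DE", "ICP_ITEM": "011", "TIME_PERIOD": "2016"},
    {"REF_AREA": "DE", "ICP_ITEM": "011", "TIME_PERIOD": "2014"},
    {"REF_AREA": "DE", "ICP_ITEM": "02", "TIME_PERIOD": "2016"},
    {"REF_AREA": "FR", "ICP_ITEM": "01", "TIME_PERIOD": "2016"},
    {"REF_AREA": "DE", "ICP_ITEM": "01", "TIME_PERIOD": "2015"},
]


def keep(obs: dict[str, str]) -> bool:
    """All cube conditions must hold (implicit AND across components)."""
    return all(TESTS[op](obs[component], operand)
               for component, (op, operand) in CONDITIONS.items())


selected = [obs for obs in OBSERVATIONS if keep(obs)]
for obs in selected:
    print(obs)
```

Only the first and last entries pass: the others fail on the period (2014), the item (02) or the country (FR).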

sosna commented 4 years ago

The impact on availability queries must still be assessed. Data and availability queries must be aligned.

We must support the possibility to check whether there are data for a particular code (say, a country code), regardless of the underlying DSD(s).