Open hadrienk opened 7 years ago
Notes for further reflection. On page 81 of the General description, the example given uses a block to transform a liftable expression:
D1.Total + size(D2)
/* equivalent to */
{
V := size(D2)
D1.Total + V /* liftable */
}
/* lifting */
{
V := size(D2)
[D1] {
filter D1.Total is not null,
Total := D1.Total + V /* should be $V ? */
}
}
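To make the lifting step concrete, here is a minimal Python sketch of what the lifted block appears to compute. It is only an illustration of the semantics, not VTL: the datasets are modelled as lists of dictionaries, and the names `D1`, `D2`, `Id` and `lift_total_plus_size` are assumptions made for the example.

```python
# Hypothetical stand-ins for the VTL datasets D1 and D2: lists of rows.
D1 = [
    {"Id": "a", "Total": 10},
    {"Id": "b", "Total": None},  # null measure, removed by the filter
    {"Id": "c", "Total": 5},
]
D2 = [{"Id": "x"}, {"Id": "y"}, {"Id": "z"}]


def lift_total_plus_size(d1, d2):
    """Mimic the lifted block: bind V once, then rewrite Total row by row."""
    v = len(d2)  # V := size(D2), evaluated once in the outer block
    return [
        {**row, "Total": row["Total"] + v}  # Total := D1.Total + V
        for row in d1
        if row["Total"] is not None  # filter D1.Total is not null
    ]


result = lift_total_plus_size(D1, D2)
print(result)
```

The point of the sketch is that `V` is a scalar bound in the outer scope, while `Total` is resolved per row inside the `[D1] { ... }` body, which is why the question of writing `V` or `$V` arises.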
This raises the following questions:
- Should `{}` constructs be treated as a simple statement?
- Could `[datasetExpr] { clause, clause* }` be expressed as `[datasetExpr] statement`?
- Is `datasetExpr, datasetExpr` the join operator?
- Could `sum(ds1.x), avg(ds1.y) group by z` be lifted the same way?
The syntax described by version 1.1 of the VTL specification lacks functionality that is essential to practical use of the language. The aggregate form is not very consistent with the join statement and does not play well with the concept of "expression lifting" that the user manual describes. Additionally, the impossibility of using more than one aggregation function per expression forces the use of workarounds such as:
The following proposals aim to solve those problems by extending the VTL syntax. The need to aggregate using several functions on several components could be met by reusing the join-with-body construct:
The consistent use of the `group` keyword makes it easier to recognize that the expression is an aggregation, whereas the prefixes `by comp1, comp2` and `along comp1, comp2` clearly express the intent and provide an easy way to select the components.

The `with hierarchy ds3 on comp1, comp2` form makes it possible to use another dataset as a graph representing the groups we want to aggregate upon. This construct might be hard to implement if the graph includes signs, as proposed by the VTL 1.1 specification, but it is easier for users of the language to grasp since it can be related to other forms of aggregation.

Another syntax could be `with hierarchy(ds3) on comp1, comp2`. This leaves open the possibility of using statically defined aggregation rules/hierarchies and makes it clear that we are transforming a dataset so that it becomes an aggregation rule/hierarchy.

Using an aggregate form of the join syntax would then mean that the variables accessible within the scope of the aggregation body are different. One possibility is that the components become lists instead of scalars. Provided that we implement some special list/set functions, the aggregation functions would become regular VTL functions:
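As a rough illustration of the "components become lists" idea, written in Python rather than in the proposed VTL syntax, the sketch below groups rows by a key and exposes every other component as a list inside the group scope. The dataset, the component names `x`, `y`, `z`, and the helper `group_as_lists` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical dataset: z is the grouping identifier, x and y are measures.
rows = [
    {"z": "A", "x": 1, "y": 10.0},
    {"z": "A", "x": 2, "y": 20.0},
    {"z": "B", "x": 5, "y": 30.0},
]


def group_as_lists(dataset, keys):
    """Group rows by `keys`; every other component becomes a list per group."""
    groups = {}
    for row in dataset:
        key = tuple(row[k] for k in keys)
        scope = groups.setdefault(key, defaultdict(list))
        for name, value in row.items():
            if name not in keys:
                scope[name].append(value)
    return groups


# Inside the aggregation body, x and y are plain lists, so ordinary
# list functions (sum, mean) play the role of aggregation functions.
aggregated = [
    {"z": key[0], "x_sum": sum(scope["x"]), "y_avg": mean(scope["y"])}
    for key, scope in group_as_lists(rows, ["z"]).items()
]
print(aggregated)
```

With this reading, several aggregation functions can be applied to several components in a single body, because each one is just a function call on a list.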
Another approach is to make each aggregated component a dataset with one measure, which allows deeper constructs. It has the benefit of building upon the normal aggregation functions defined by the specification and on the concept of lifting:
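A comparable Python sketch of this second reading, again only an approximation of the semantics and not the proposed syntax: within each group, a component is exposed as a small dataset holding its identifiers and a single measure, so dataset-level operations such as filtering can be applied before aggregating. The names `group_as_datasets`, `ds_sum`, `id`, `x` and `z` are hypothetical.

```python
# Hypothetical dataset: z is the grouping key, id an identifier, x a measure.
rows = [
    {"z": "A", "id": 1, "x": 1},
    {"z": "A", "id": 2, "x": 2},
    {"z": "B", "id": 3, "x": 5},
]


def ds_sum(dataset, measure):
    """Dataset-level sum over a single measure (stand-in for a VTL sum)."""
    return sum(row[measure] for row in dataset)


def group_as_datasets(dataset, group_keys, identifiers, measure):
    """Per group, expose `measure` as a dataset: identifiers plus one measure."""
    groups = {}
    for row in dataset:
        key = tuple(row[k] for k in group_keys)
        sub_row = {name: row[name] for name in [*identifiers, measure]}
        groups.setdefault(key, []).append(sub_row)
    return groups


totals = {}
for key, x_ds in group_as_datasets(rows, ["z"], ["id"], "x").items():
    # A "deeper construct": filter the per-group dataset, then aggregate it.
    filtered = [r for r in x_ds if r["x"] > 1]
    totals[key[0]] = ds_sum(filtered, "x")
print(totals)
```

Compared with the list-based reading, the identifiers stay available inside the group, at the cost of aggregation functions having to accept datasets rather than plain lists.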
Regular join and aggregation could be used together:
The other clauses would work the same way: