qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS
117 stars 37 forks source link

QEP 59: Aggregate functions for QGIS expressions #59

Open nyalldawson opened 8 years ago

nyalldawson commented 8 years ago

QGIS Enhancement 59 (was 40): Aggregate functions for QGIS expressions

Date 2016/04/29 Author Nyall Dawson (@nyalldawson) Contact nyall dot dawson at gmail dot com Version QGIS 2.16 Funding Kanton of Zug, Switzerland

Summary

This proposal covers implementing aggregate functions in the QGIS expression engine, allowing calculation of various statistical and summary values from the fields within a layer.

Aggregates will be implemented in two forms:

  1. Aggregate functions which operate on the values from the current layer, including the ability to calculate grouped and relational aggregates.
  2. A summary aggregate function which operates on all the values from a different layer.

    Form 1: Current layer aggregates

The first set of aggregate functions will operate within the current layer only. They will support both calculation of the aggregate for ALL features within the layer, as well as for matching grouped features, and features from one of the layer's child relations.

The aggregates will be implemented through the addition of multiple aggregate functions, each representing a single aggregate calculation. For instance, “sum”, “count”, “mean”, etc.

1A Calculation of statistics

Numeric aggregates will include all statistics currently calculated by QgsStatisticalSummary (ie, those which are available in the “statistical summary dock”), including:

New classes for calculating string aggregates (QgsStringStatisticalSummary) and date/time aggregates (QgsDateTimeStatisticalSummary) will be created. These will be designed in a reusable manner so that the calculations can be reused by other parts of QGIS code, for instance allowing string and date/time statistics in the statistical summary dock.

QgsStringStatisticalSummary will calculate:

QgsDateTimeStatisticalSummary will calculate:

Both QgsStringStatisticalSummary and QgsDateTimeStatisticalSummary will have 100% unit test coverage.

1B Aggregate function syntax

Each of the aggregate functions will accept the same parameters:

Aggregate calculation will be cached in short-term storage (within the passed QgsExpressionContext), so for instance the aggregate will only need to be calculated once for each unique group-by value for a single map render if used within data defined symbology, rather than once for every feature being rendered.

1E Relational aggregates

The aggregate functions will be accompanied by a specific aggregate_relation function, to simplify calculation of aggregates for related features. When using aggregate_relation the aggregate will be calculated using only the related child features from the matching layer relation.

Examples:

Note: when using aggregate_relation the calculation sub-expression will ONLY be allowed to reference fields from the child layer.

Form 2: Summary aggregate function

The second aggregate function operates on all the values from a different layer, returning a single summary value. This will be implemented by the addition of a new aggregate function. The syntax of the aggregate function will be:

aggregate('layer name or id', 'aggregate_type', expression to aggregate, optional filter expression)

Aggregate calculation will be cached in short-term storage (within the QgsExpressionContext), so for instance the aggregate will only need to be calculated once for each map render if used within data defined symbology rather than once for every feature being rendered. The calculation will be designed to utilised expression compilation where available, so that if possible the calculation of the aggregate will be delegated to the provider.

Relation to virtual layers

While it is acknowledged that there is also a need for calculation of aggregates within virtual layers, expressions are prevalent throughout all areas of QGIS (data defined symbology, labeling, field calculator, reporting using atlas based compositions, etc) and there is a defined need for calculation of aggregates within expressions.

Accordingly this QEP covers only the inclusion of aggregates within the expression engine and modifications to the virtual layer feature will be out of scope.

Backwards Compatibility

N/A

Votes

(required)

giohappy commented 8 years ago

Great work @nyalldawson ! That's what I suggested a couple of years ago, but it was too early ;)

mhugo commented 8 years ago

Hi, good to see some proposition to handle this need.

I am not sure to understand why the SQL syntax is not considered. Is it really too complex ? If I am correct, the two "forms" are needed because we have no way to select with join in expressions ? But then it introduces two distinct forms that seems to me less expressive than what is possible with SQL. So then we would have different dialects in QGIS to do very similar things ...

Just a thought, what about an evolution of the expression parser to tend to SQL. Something backward compatible with the current expression, but with an optional SELECT FROM, GROUP BY and with a new type of functions (aggregates) ?

modifications to the virtual layer feature will be out of scope.

Do you see modifications to the virtual layer feature that would be needed to handle aggregates ?

Actually, we could have a function sql("") that would call the virtual layer engine to compute aggregates in an expression ...

nyalldawson commented 8 years ago

@mhugo

Re SQL:

I've given long deliberation to this, but in the end reached the conclusion that it's a mistake to try to make the expression engine follow SQL syntax. Here's why:

That's why I fell back to the function-based approach. The entire expression engine has been designed around use of nested functions. And honestly, I think this approach makes sense for the use case of expressions, which is returning a single value result from an expression calculation. Trying to twist the expression engine too closely around SQL syntax is just going to result in a mess.

That's why I believe we've got 2 totally different use cases here:

  1. SQL based virtual tables (out of scope for this work)
  2. Use of aggregate calculation functions within expressions (what this QEP covers)

Hope that helps clarify!

nyalldawson commented 8 years ago

Updated, with the following changes based on @wonder-sk's feedback:

nyalldawson commented 8 years ago

Any further feedback on this? I'd like to get this work in for 2.16, and freeze is closing in. I've got an informal "looks good" from @wonder-sk (via hangouts), but am I good to go on this?

NathanW2 commented 8 years ago

+1 from me. I don't see anything bad with this and agree reinventing SQL syntax isn't for here. These can still be used in the virtual layers so good win I think.

nyalldawson commented 8 years ago

Implemented in https://github.com/qgis/QGIS/commit/307aabd66ac089cba9f2556028d0b81b328cc345

DelazJ commented 7 years ago

feature implemented so maybe, should we close this QEP

jmelomunicipia commented 7 years ago

I've been trying to use this feature but I'm not being able to make it work. You describe the feature as aggregate_relation but in QGIS 2.16 or 2.18 the feature is listed as relation_aggregate. I'm using this syntax: relation_aggregate(relation := 'tubo_et_face_et_inicial', aggregate := 'sum', expression := "compr_linear") to try to sum up all the values in the "compr_linear field of the child table, to a virtual field in the parent table. Is this syntax correct, or am I doing something wrong?

geodatup commented 7 years ago

good feature, but I am not be able to make it work on virtual layer. This work nicely on standard field.

nicolasmcfadden commented 6 years ago

Hello, I'm very sorry if this isn't the right place to post this but I think that my question could be a complimentary enhancement for the QEP so here I go... I've been using an aggregate function with a concatenate attribute (because I need access to the layer attribute that the normal concatenate function doesn't have) to build test sheets with an Atlas. However using the concatenate parameter of the doesn't allow use of a group_by function and the list i generate appears in a random order.

I had posted a ticket on gis stackexchange explaining my problem: https://gis.stackexchange.com/questions/268432/grouping-and-sorting-a-list-of-equipment-using-an-aggregate-function-in-the-prin

I haven't gotten an answer but thought that this could be an enhancement proposal to the aggregate function? I'm very sorry if this isn't the right place for posting this but I do love your work on QGIS3.

nyalldawson commented 6 years ago

@nicolasmcfadden

That enchantment is likely to come for 3.2

nicolasmcfadden commented 6 years ago

Ok great to know! Thx for all your work!

nicolasmcfadden commented 6 years ago

Hello guys, any news on this?

andreasneumann commented 6 years ago

I think we should really close this QEP. Any additional discussions and enhancements should be treated separately. The work of the original QEP has been implemented quite a while ago.

nicolasmcfadden commented 6 years ago

I definitely see your point but all other similar functions have a group by option, isn't this important for completing this QEP?

nicolasmcfadden commented 5 years ago

Hello everyone, So is this QEP closed or might a group_by functionality be added? Thx for your work

mauriciomarquezgoa commented 5 years ago

Hello, I'm just trying to make a group_by to this functionality and I can not find a way to do it. It would be interesting if they could add the possibility of doing the order by even. Thanks for the excellent work, greetings!

DelazJ commented 5 years ago

@mauriciomarquezgoa the order by is already implemented and available in nightly build if you want to test (https://github.com/qgis/QGIS/pull/9945). And I think group_by is already available, no? Anyway, I think discussion on features should be handled as feature request report and this QEP closed. @NathanW2 ?

M-Rick commented 4 years ago

@mauriciomarquezgoa I think group_by is already available, no?

Hi Well it doesn't seem to be. I have tried to use it and I get an error with it.

aggregate(
layer:='Flats',
aggregate:='sum',
group_by:="street",
expression:=IF("building_flats" IS NULL,0,to_real("building_flats"))
)

Erreurs de l'Analyseur: Invalid QgsExpressionFunction::Parameter name 'group_by' for aggregate

M-Rick commented 4 years ago

Any news for this?

Fertuesta25 commented 2 years ago

Is it possible to implement filtering in the relation_aggregate function?