Change working of V() command in GQL

kaskranenburgQ commented 1 month ago

The "V()" command is the most used command for querying data from the energy graph. It's documentation can be found here.

At the moment the V() argument does not give any feedback about whether the given node actually exists within the code base. For instance, if we try look at the query wet_biomass_to_export_in_biomass_sankey, which is currently used in the biomass sankey, we see the following:

`- unit = PJ

query = DIVIDE( V(energy_export_wet_biomass, demand), BILLIONS ) ` The node that is referenced does not exist anymore in our active codebase. If we were to query this we see the following:

The same happens while directly querying the query that is mentioned in the biomass sankey.

While I could argue that the demand of the given 'element' is indeed zero, I don't think this information is of interest to us. Therefore, I propose that we rewrite the V() command so that when an incorrect nodename is given to a query, the query returns 'nil', and not zero.

This will give direct feedback to the modellers when they are working on graph alterations.

The fallout of this will be large, because a lot of outdated query names are used in the output elements at the moment. These output elements will al break when we update the V() command in a direct way.

I would like to hear your opinion on this matter @kndehaan @mabijkerk @noracato

kndehaan commented 1 month ago

Sounds like very useful feedback to me if we rewrite the V() command, for future model improvements as well as for cleaning up outdated or malfunctioning queries. I don't know if it's easy to change the V() command? As you mentioned, the implications could be quite large so there should also be enough time budgeted to fix the fallout.

kaskranenburgQ commented 1 month ago

In this PR I changed the working of V().

However, there is no change if I query wet_biomass_to_export_in_biomass_sankey:

While without divide the query returns 'nil':

It seems that PRODUCT, DIVIDE, SUM, ignore queries that return nil, as is stated in the documentation

EDIT: While investigating the feasibility of changing the PRODUCT, DIVIDE and SUM functions, I found the following: The query final_demand_of_coal_and_coal_gas_in_other_non_specified_industry_energetic is setup in a robust manner:

SUM(
  V(
    FILTER(
      FILTER(
        INTERSECTION(
          EG(final_demand),EG(other_non_specified_industry)
        ),
        energetic?
      ),
      coal? 
    ),
    value
  ),
  V(
    FILTER(
      FILTER(
        INTERSECTION(
          EG(final_demand),EG(other_non_specified_industry)
        ),
        energetic?
      ),
      coal_gas?
    ),
    value
  ),
) / BILLIONS

However, there is no usage of coal_gas in the INTERSECTION that is made, so the second part of this query returns 'nil' at the moment. If we were to change the working of SUM(), DIVIDE() and PRODUCT() a lot of queries that are setup in a robust manner would fail. However, in the case of the query above, a shorter written version would work since the summation happens inside the V() function and not with a V() function:

  V(
    FILTER(
      FILTER(
        INTERSECTION(
          EG(final_demand),EG(other_non_specified_industry)
        ),
        energetic?
      ),
      coal? || coal_gas?
    ),
    value
  )/ BILLIONS

There are multiple ways to move forward, here is my proposal:

Re-install a INVALID_TO_ZERO function which can be used to write robust queries.
Rewrite the queries that come up to the shorter version.

mabijkerk commented 1 month ago

I get where you are coming from @kaskranenburgQ. I have a few thoughts though:

Will a value of "nil" help us to spot an error more quickly? Will charts/dashboard elements/other calculations break when "nil" is returned?
Can we imagine use cases where it is useful to have 0 returned? For example, if we want to query the sum of e.g. typical_input_capacity * number_of_units of a set of nodes. Then it might be fine that the nodes in that group that have no typical_input_capacity are processed and do not produce an error. There may be other use case @noracato?
I would then rather build another function that is explicitly different from V(), which would then be used in the specific cases that we want to make sure that all values are returned.

In the future, I would suggest that we finish the discussion on whether or not we want to implement a feature before working on the implementation. Especially for functions that are this fundamental, a lot of fallout can be expected: queries need to be changed, tests (re)written. We therefore really need to be sure that we want to implement it and how we want to implement it.

quintel / etengine

Change working of V() command in GQL #1433