quintel / refinery

Calculates node demands and edge shares for sparse energy graphs.

Where are refinery queries stored? #10

Closed · wmeyers closed this 11 years ago

wmeyers commented 11 years ago

One of the outputs from the Research Dataset Restructuring is the set of queries for the graph. Where do these get stored?

antw commented 11 years ago

I'll include this in the documentation I write for #8. It was my plan to do that today, but the ETFlex introduction will probably take a while, so it might be tomorrow before I get around to writing this stuff up.

Is that okay, or do you need something immediately?

The super-quick version is: in the query section of node and edge documents in the ETSource ./data directory. However, as these documents are frequently re-imported (deleted and recreated), we're placing them in CSV files temporarily – in ./data/import – and the import script will add them to the documents automatically.
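
To give you a rough idea of where they'll end up (the exact ActiveDocument syntax is still in flux, so treat this as a sketch rather than the final format), a node document might eventually carry its query inline, along these lines:

# agriculture_final_demand_network_gas.ad -- sketch only, syntax may change
- use    = energetic
- sector = agriculture

~ demand = EB("agriculture/forestry", natural_gas)

Until the switch happens, the CSV files in ./data/import are the place to edit.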

wmeyers commented 11 years ago

I don't need anything right now, so please finish ETFlex first. Actually, I think your really short explanation has already helped me a lot!

Can you update this issue when the documentation is there?

wmeyers commented 11 years ago

@ChaelKruip is starting on a naming scheme for these queries on Thursday. We want to make sure that everything that is queried has a logical and understandable name. @antw can you make sure that the documentation on these things is clear by then?

Important questions to be answered:

@ChaelKruip please add your questions

antw commented 11 years ago

I have expanded the documentation of the functions available in ETSource queries. I think that should answer your questions about EB, CHP, SHARE, and data.

I'm reluctant to describe in too much detail how to set queries in the ActiveDocument (".ad") files because:

  1. It will be subject to change in the coming days; and,
  2. Until we make the switch from InputExcel to the ETSource stack, the queries are actually defined in separate CSV files (at ./data/import), and are then added to the .ad files with an import script.

The process

(image: vlcsnap-2013-07-17-13h44m48s71)

Come on, be serious...

  1. The Atlas library (previously called "Tome") reads the ETSource files which define all the nodes, edges, etc.
  2. Atlas then builds a graph structure with Turbine.
  3. Any documents in ETSource which contain a query (for a demand or share) have this query executed. The resulting value is set on the graph created in step 2.
  4. The graph is then handed to Refinery to fill in the blanks.
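
If it helps, here's the same thing as (entirely hypothetical) Ruby; Turbine's Graph and Node are real, but the Atlas and Refinery calls below are stand-ins, not the actual API:

# Sketch of the four steps above; real class and method names differ.
documents = Atlas.load_documents("etsource/data")  # 1. read the ETSource documents

graph = Turbine::Graph.new                         # 2. build the graph structure
documents.each { |doc| graph.add(Turbine::Node.new(doc.key)) }

documents.select(&:query).each do |doc|            # 3. execute each demand/share query
  graph.node(doc.key).set(:demand, doc.query.run)  #    and store the result on the graph
end

Refinery.calculate(graph)                          # 4. Refinery fills in the blanks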

Setting a node demand

Let's look at how to set the demand of "agriculture_final_demand_network_gas" in the agriculture subgraph. Its query lives in "./data/import/agriculture_queries.csv"; that file contains all the queries for the nodes in the agriculture graph, with three columns: status, converter_key, and query.

The final result (in Excel) looks like this:

+-----------+--------------------------------------+-----------------------------------------+
| status    | converter_key                        | query                                   |
| necessary | agriculture_final_demand_network_gas | EB("agriculture/forestry", natural_gas) |
| ...       | ...                                  | ...                                     |
+-----------+--------------------------------------+-----------------------------------------+

... and that's it! When we next re-import the nodes into the ActiveDocument format, that query will be automatically found and added to the node document. The query for this node is very simple: it asks the energy balance data for the value of the cell in the "agriculture/forestry" row (use), and the "natural gas" column (carrier).

When we calculate the graph, this query is executed, and the resulting value is then given to Refinery.
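
To make that concrete: if the energy balance contained a row like this (layout simplified, numbers invented),

use,                  coal, natural_gas, electricity
agriculture/forestry,  0.0,        67.0,        31.2

then EB("agriculture/forestry", natural_gas) returns 67.0, and 67.0 becomes the demand of agriculture_final_demand_network_gas.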

Setting an edge share

Like nodes, edges also have queries. Unlike node queries, which set only demand, an edge query can set one of three attributes: parent_share (the default), aka "output_share"; child_share, aka "input_share"; or demand.

             +------------+   +------------+
(demand: 75) | SUPPLIER 1 |   | SUPPLIER 2 | (demand: 25)
             +------------+   +------------+                   |
(parent_share: 1.0)    \          /  (parent_share: 1.0)       |
                        \        /                             | energy flow
    (child_share: 0.75)  \      /  (child_share: 0.25)         |
                          v    v                               v
                       +----------+
         (demand: 100) | CONSUMER |
                       +----------+
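
To read the diagram: the edge from SUPPLIER 1 carries 75 units of energy. That is all of SUPPLIER 1's output, so its parent_share is 75 / 75 = 1.0, but only three quarters of CONSUMER's input, so its child_share is 75 / 100 = 0.75. You only need to set one of the three attributes on an edge; Refinery works out the rest when it fills in the blanks.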

CSVs containing queries for edges have these columns:

+-------------------------------------------------+--------------------------------------+-------------+--------------+-------------------------------+
| from                                            | to                                   | carrier     | attribute    | query                         |
| agriculture_heatpump_water_water_ts_electricity | agriculture_final_demand_electricity | electricity | parent_share | SHARE(electricity, heat_pump) |
| ...                                             | ...                                  | ...         | ...          | ...                           |
+-------------------------------------------------+--------------------------------------+-------------+--------------+-------------------------------+

This is another simple query; it uses the SHARE() function to take a value out of the electricity.csv file, without doing any additional calculations.
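
For illustration only (the real file has more rows, and these numbers are invented), electricity.csv holds simple key/value pairs along the lines of:

heat_pump,0.05
chp,0.12

so SHARE(electricity, heat_pump) would return 0.05.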

Setting a slot share (conversion)

Not yet supported. Right now we're using the conversions which InputExcel calculates (in the old YAML files). Once quintel/atlas#4 is done, we will be able to use queries to set conversions as well. Slot shares which are identical in every region are easy to enter by hand, but any values entered that way will be destroyed the next time we re-import the documents. So for now, I suggest you put them into a new CSV file in "./data/import", and I'll come up with something more permanent after quintel/atlas#4.

Adding queries for new sectors

You can add new CSV files to "./data/import" containing queries for other sectors. They will be picked up automatically by the import script; the files can be named whatever you want, but sticking to the current convention would be best.

Remember that CSVs containing node demands need the headers "status,converter_key,query", and CSVs for edges require "from,to,carrier,attribute,query".
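
For example, a (completely hypothetical) households file could start like this:

status,converter_key,query
necessary,households_final_demand_network_gas,"EB(households, natural_gas)"

Drop it into "./data/import" and the import script does the rest.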

Repeating myself

I just want to say again that putting queries in CSV files is temporary, and in the future you'll edit them directly in the document files.

@wmeyers If you have any questions, or something is unclear, let me know. @dennisschoenmakers Did I forget anything? :smiley:

wmeyers commented 11 years ago

@antw great explanation! Thanks :-) I hope I did everything right for the agriculture analysis.