ladybug-tools / honeybee

:honeybee: A python library to create, run and visualize radiance studies. Core library of Honeybee[+].
http://ladybug-tools.github.io/honeybee/docs
GNU General Public License v3.0

Dump the results from daylight analysis in a sqlite database #195

Open mostaphaRoudsari opened 6 years ago

mostaphaRoudsari commented 6 years ago

Why? (a.k.a. what is wrong with the current workflow?)

There are currently two mechanisms to import the results of a daylight analysis:

  1. In the initial design, which is currently used for most of the recipes other than AnnualDaylight, the results are imported for each AnalysisPoint from every window group, for each time step, separately for total and direct contributions. In other words, if there are 3 window groups in a scene for a daylight coefficient study, each point will have 3 (window groups) x 8760 (hours) x 2 (direct vs. total) integer values associated with it (see the quick calculation after this list). Now add a couple of dynamic blinds and a study with 2000 test points and you can imagine it will be a lot of data. Inside Python itself that's no problem, and it brings a great level of flexibility for post-processing and applications like blind controls, lighting controls, etc., but when you load this data from inside Grasshopper and/or Dynamo it slows down the whole interface, for reasons that are out of my control.

  2. The reality is that most Honeybee users don't really need all that data for post-processing; what they really care about is a quick calculation of annual metrics such as Daylight Autonomy. For this reason Honeybee has an alternative solution which reads the results from the results file, calculates the annual metric and keeps none of the initial data. This workflow addresses the slowdown but limits the flexibility of post-processing. I currently limited this method to a single window group so I didn't have to access several files for calculating annual metrics. Opening and closing multiple files is very easy to do with iterators, but it is not the most efficient way to access the data, especially when you're dealing with gigabytes of values.
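To put the first mechanism in numbers, here is a quick back-of-the-envelope calculation using the counts from the example above:

window_groups = 3
hours = 8760       # hourly values for a full year
contributions = 2  # total and direct
per_point = window_groups * hours * contributions
print(per_point)   # 52560 integer values per analysis point

points = 2000
print(per_point * points)  # 105120000 values, before adding any blind states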

What if we keep the values in the file but point each analysis point to the correct byte to get the results?

This is something that I originally tested before implementing the current design, and I realized that it can easily fail when you start moving between operating systems. Also, any change a user makes to the results files breaks the whole idea of being efficient, since we would have to redo all the mapping of bytes to results; now think about having a number of analysis grids and dynamic blinds to understand what kind of remapping can await you after a simple edit.
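For context, the byte-offset idea amounts to something like this minimal sketch (the file layout, the 4-byte record size and the offset arithmetic are assumptions for illustration, not code from the library):

import struct

RECORD_SIZE = 4  # assume each result is a 4-byte little-endian integer

def read_result(path, point_index, hour):
    """Seek directly to one precomputed result instead of parsing the file."""
    offset = (point_index * 8760 + hour) * RECORD_SIZE
    with open(path, 'rb') as f:
        f.seek(offset)
        return struct.unpack('<i', f.read(RECORD_SIZE))[0]

Any edit to the file shifts every offset after it, which is exactly the remapping problem described above.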

New proposal

The new proposal is to dump the results from the analysis into a sqlite database and rewrite all the post-processing functionalities as SQL queries. sqlite3 ships with the Python installation, which helps us avoid an extra dependency and installation issues.

I tested the idea during the weekend but didn't finish it. It will result in several edits in the AnalysisPoint and AnalysisGrid classes, and we need to make changes in the recipes to reverse the current results structure.

This workflow adds extra time for creating the database for total and direct-sun values, but it then saves us a lot of time when post-processing the results.
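As a rough sketch of the dumping step (the file name results.db and the table layout are illustrative assumptions, following the naming pattern described below, not the final design):

import sqlite3

# hypothetical per-file table: one row per hour, one column per point
conn = sqlite3.connect('results.db')
conn.execute(
    'CREATE TABLE IF NOT EXISTS "wg_1..0..total" '
    '(hour INTEGER PRIMARY KEY, pt_1 INTEGER, pt_2 INTEGER, pt_3 INTEGER)'
)

# in practice these rows would be parsed from the radiance results file
rows = [(h, 0, 0, 0) for h in range(8760)]
conn.executemany(
    'INSERT OR REPLACE INTO "wg_1..0..total" VALUES (?, ?, ?, ?)', rows
)
conn.commit()
conn.close()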

There will be a separate table for each result file, where rows are hours of the year and columns are results for each point. For studies with a single window group the analysis is pretty straightforward, while for studies with several window groups and blind states one can JOIN the tables in a single query. Table name pattern: windowgroup..blindstate..total

|       | pt_1 | pt_2 | pt_3 | ... |
|-------|------|------|------|-----|
| hour1 | 0    | 0    | 0    | ... |
| hour2 | 0    | 0    | 0    | ... |
| hour3 | 0    | 0    | 0    | ... |
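For example, combining the same blind state of two window groups could be a single query like the sketch below (the table names are hypothetical, following the windowgroup..blindstate..total pattern, and an explicit hour column is assumed):

SELECT a.hour,
       a.pt_1 + b.pt_1 AS pt_1_total
FROM "wg_1..0..total" AS a
JOIN "wg_2..0..total" AS b ON a.hour = b.hour;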

One main critique of this structure will be that the number of columns changes based on the number of test points, but for what we need that will be fine, especially if it is well documented. We will also include methods to regenerate the original radiance files from the database in case one needs them afterwards for any reason.

mostaphaRoudsari commented 6 years ago

Ended up with five core tables (Project, Sensor, Grid, Source, State) plus a Result table and three junction tables that map sensors to grids, states to sources, and sources/window-groups to grids.

-- Project table. In sqlite there will always be a single project.
-- This is necessary to keep the schema for docker and local databases the same.
CREATE TABLE IF NOT EXISTS Project (
          id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
          name TEXT NOT NULL,
          created_at TIMESTAMP NOT NULL,
          values_loaded INTEGER NOT NULL,
          dir_values_loaded INTEGER NOT NULL
          );

-- sensors and analysis grids
CREATE TABLE IF NOT EXISTS Sensor (
          id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
          loc_x REAL NOT NULL,
          loc_y REAL NOT NULL,
          loc_z REAL NOT NULL,
          dir_x REAL NOT NULL,
          dir_y REAL NOT NULL,
          dir_z REAL NOT NULL
          );

CREATE TABLE IF NOT EXISTS Grid (
          id INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE,
          name TEXT,
          project_id INTEGER,
          FOREIGN KEY (project_id) REFERENCES Project(id)
          );

CREATE TABLE IF NOT EXISTS SensorGrid (
          sensor_id INTEGER,
          grid_id INTEGER,
          FOREIGN KEY (sensor_id) REFERENCES Sensor(id),
          FOREIGN KEY (grid_id) REFERENCES Grid(id)
          );

-- light sources
CREATE TABLE IF NOT EXISTS Source (
          id INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE,
          name TEXT
          );

CREATE TABLE IF NOT EXISTS State (
          id INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE,
          name TEXT
          );

CREATE TABLE IF NOT EXISTS SourceState (
          source_id INTEGER,
          state_id INTEGER,
          FOREIGN KEY (source_id) REFERENCES Source(id),
          FOREIGN KEY (state_id) REFERENCES State(id)
          );

-- light sources and analysis grids relationship
CREATE TABLE IF NOT EXISTS SourceGrid (
          source_id INTEGER NOT NULL,
          grid_id INTEGER NOT NULL,
          FOREIGN KEY (source_id) REFERENCES Source(id),
          FOREIGN KEY (grid_id) REFERENCES Grid(id)
          );

-- daylight analysis results
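-- hoy is the hour of the year (REAL so sub-hourly time steps can be stored);
-- sky, direct and sun presumably hold the sky, direct-sky and direct-sun
-- contributions per sensor/source/state (an assumption from the column names)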
CREATE TABLE IF NOT EXISTS Result (
          sensor_id INTEGER NOT NULL,
          source_id INTEGER NOT NULL,
          state_id INTEGER NOT NULL,
          hoy REAL NOT NULL,
          sky INTEGER,
          direct INTEGER,
          sun INTEGER,
          FOREIGN KEY (sensor_id) REFERENCES Sensor(id),
          FOREIGN KEY (source_id) REFERENCES Source(id),
          FOREIGN KEY (state_id) REFERENCES State(id),
          CONSTRAINT result_id PRIMARY KEY (sensor_id, source_id, state_id, hoy)
          );
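With this schema the post-processing really can become plain SQL. As an illustration only, a Daylight Autonomy query might look like the sketch below; the combined value sky - direct + sun, the 300 lux threshold and the lack of an occupied-hours filter are all assumptions, not part of the schema above:

-- hypothetical Daylight Autonomy: percent of hours at or above 300 lux
SELECT sensor_id,
       100.0 * SUM(CASE WHEN sky - direct + sun >= 300 THEN 1 ELSE 0 END)
             / COUNT(*) AS da
FROM Result
GROUP BY sensor_id;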
chriswmackey commented 6 years ago

I just wanted to say that I fully support putting this into the core library, even with the inefficiencies of the sqlite3 library in IronPython. As a case in point, I offer my memory usage after an HB+ 3-phase study with 1,000 points: [screenshot of memory usage]