inasafe / inasafe-fba

Next generation of InaSAFE that includes support for forecast based action
MIT License

Document/dump the staging schema to the repo #42

Closed: lucernae closed this issue 4 years ago

lucernae commented 4 years ago

We need a good way to store the staging schema so everyone in the team has a documented process on how to apply changes and replicate the schema to a local instance for future issues. In order to do that, let's dump the schema first in a more organized way.

I propose several criteria here and a way to organize them:

Criteria when dumping the schema

For the dump to be easily manageable and replicable, I propose the following conditions when making a schema dump:

* Exclude instance-specific details such as user roles, passwords, and table grants. We want the schema to be easily restored on a different instance.

* Exclude extension-specific functions such as the `st_` functions from PostGIS. These belong to database setup and are not part of the staging schema.

* Each SQL file should be a manageable chunk to read, understand, and organize. I propose we separate the schema into multiple independent groups that can be executed sequentially, or even better, one SQL file per table.

* The multiple SQL files need to be ordered by their dependencies. For example, since `hazard_event` is referenced as a foreign key in `hazard_areas`, we dump `00_hazard_event.sql` before `01_hazard_areas.sql` so the order of creation is correct (a sketch follows this list).

* The SQL statements have to be idempotent when executed as a whole. If the schema files are executed multiple times, the end result should still be the same. For example, restoring from `00_table_a` to `12_table_b` will produce the same database schema if it is executed again from `00_table_a` to `12_table_b`.

* Index and primary key creation statements need to be grouped in the same file as the table creations.

* Alter statements need to be grouped after the corresponding creation statements in the previous files.

* Function creation statements need to be grouped logically according to the purpose of the group. For example, one SQL file for spreadsheet generation functions and another for forecast generation functions.

* Trigger creation statements need to be put at the end, after every schema and function is ready. Since they are probably always last, we can put the number 99 on the file (probably, CMIIW).

* For manageability and readability, we should be able to make logical groups with folders if needed. For example, we can make a folder `01_osm_table_alter` with files inside: `00_osm_buildings_alter.sql`, `01_osm_roads_alter.sql`, etc. Then create another group like `02_hazard_table_creation` with files inside: `00_hazard_event.sql`, `01_hazard_areas.sql`, etc.

* Note that these rules are only for the schema dump (without the data). Data fixtures should ideally be dumped in a separate pipeline after the database with the schema is prepared.
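To make the ordering and idempotency points concrete, here is a minimal sketch of what two consecutive dump files could look like (the columns are only placeholders, not our actual staging schema):

```sql
-- 00_hazard_event.sql: created first, because other tables reference it.
CREATE TABLE IF NOT EXISTS hazard_event (
    id            serial PRIMARY KEY,   -- primary key stays in the same file as the table
    forecast_date timestamp,            -- placeholder column
    source        text                  -- placeholder column
);

-- 01_hazard_areas.sql: depends on hazard_event, so it is dumped after it.
CREATE TABLE IF NOT EXISTS hazard_areas (
    id              serial PRIMARY KEY,
    hazard_event_id integer NOT NULL REFERENCES hazard_event (id),
    geometry        geometry            -- assumes PostGIS is already installed by the db setup
);

-- IF NOT EXISTS keeps re-running the whole sequence harmless (idempotent).
CREATE INDEX IF NOT EXISTS idx_hazard_areas_event ON hazard_areas (hazard_event_id);
```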

How we should proceed

Following my proposed criteria above, I propose the following step-by-step procedure to finish the issue. It doesn't have to be done all at once; we can do it iteratively and make sure it gets finished step by step.

1. Let's put our schema dump under the `fixtures/schema` folder.
2. Use a single initial schema as `00_initial.sql`. Since we are based on the `docker-osm` process, we dump the full table schema of the database after `imposm` finishes, without `post-pbf-script.sql`. It will include the osm tables according to the `mapping.yml` being used (for now, typically `osm_buildings` and `osm_roads`). The goal of this file is to let us replicate the docker-osm initial table schema without having to generate it from the docker-osm pbf file.
3. Put rules to alter the osm tables according to our fbf needs in a `01_post_pbf_script` folder. I'm not familiar with this part, so @NyakudyaA can clarify. Basically we need to add a schema for administrative boundaries (village, sub_district, districts) and the other tables needed to calculate building/roads intersections, plus alter rules for the osm_buildings and osm_roads columns to support that.
4. We group our main fbf tables in this step as the folder `02_fbf_table`. This will include our main `hazard_`-prefixed tables and our old `flood_`-prefixed tables if we have them.
5. We group our materialized views in this step as the folder `03_fbf_mv`. This will include our mv cache tables, typically tables that calculate building/roads summary statistics.
6. We group our views in this step as the folder `04_fbf_v`. This will include our view tables, typically logic that calculates the extents of boundaries or hazards.
7. We group our functions in this step as the folder `05_fbf_f`. This will include our plpython3u functions such as spreadsheet generation and forecast scripts. We may have to make subfolders to group them further (a rough sketch for steps 5 to 8 follows this list).
8. We group our database triggers as the last step. Since it will probably always be last, we can name it `99_fbf_triggers`. @NyakudyaA please tell me if this is unreasonable.
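As a rough illustration for steps 5 to 8 (every name and body below is made up, not our actual objects, and it assumes the post-pbf alter step has added a `village_id` column to `osm_buildings`), each file can be written so that re-running it is harmless:

```sql
-- 03_fbf_mv/00_building_summary.sql: a cached summary per boundary (placeholder query;
-- assumes the post-pbf alter added village_id to osm_buildings).
CREATE MATERIALIZED VIEW IF NOT EXISTS mv_building_summary AS
SELECT village_id, count(*) AS building_count
FROM osm_buildings
GROUP BY village_id;

-- 05_fbf_f/00_refresh_building_summary.sql: functions are simply replaced on re-run.
CREATE OR REPLACE FUNCTION refresh_building_summary()
RETURNS trigger AS $$
    # placeholder body; the real plpython3u functions do the spreadsheet/forecast work
    plpy.execute("REFRESH MATERIALIZED VIEW mv_building_summary")
    return None
$$ LANGUAGE plpython3u;

-- 99_fbf_triggers/00_building_summary.sql: drop-then-create keeps the trigger idempotent.
DROP TRIGGER IF EXISTS building_summary_trigger ON hazard_event;
CREATE TRIGGER building_summary_trigger
    AFTER INSERT OR UPDATE ON hazard_event
    FOR EACH STATEMENT
    EXECUTE PROCEDURE refresh_building_summary();
```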

All these files should be made as PRs that we can review and test iteratively. I can make a Travis job that will try to run this SQL so we always know that it is consistent to restore.

lucernae commented 4 years ago

@NyakudyaA for step number 2, I'm not really sure whether the `mapping.yml` in this repo is current. Please also update it if it's not; it's in docker-osm/indonesia-buildings/docker-osm-settings. I can review no. 7 (fbf_functions) to make it easier to maintain, but the other steps rely on your knowledge. You can proceed iteratively, and in parallel I can create a Travis job that tries to restore this schema to test the setup. Once this schema is in place we can create a docker image that can be put in Rancher to deploy migration changes after every repo update.

NyakudyaA commented 4 years ago

> We need a good way to store the staging schema so everyone in the team has a documented process on how to apply changes and replicate the schema to a local instance for future issues. In order to do that, let's dump the schema first in a more organized way.
>
> I propose several criteria here and a way to organize them:
>
> Criteria when dumping the schema
>
> For the dump to be easily manageable and replicable, I propose the following conditions when making a schema dump:

> * Exclude instance-specific details such as user roles, passwords, and table grants. We want the schema to be easily restored on a different instance.
>
> * Exclude extension-specific functions such as the `st_` functions from PostGIS. These belong to database setup and are not part of the staging schema.
>
> * Each SQL file should be a manageable chunk to read, understand, and organize. I propose we separate the schema into multiple independent groups that can be executed sequentially, or even better, one SQL file per table.

Per table will be a hassle to maintain. We should ideally have one SQL file that is logically arranged.

> * The multiple SQL files need to be ordered by their dependencies. For example, since `hazard_event` is referenced as a foreign key in `hazard_areas`, we dump `00_hazard_event.sql` before `01_hazard_areas.sql` so the order of creation is correct.

This can be done in a single file to prevent the headache of trying to figure out which SQL should be executed first; for example, a table can be linked to five other tables, etc.

> * The SQL statements have to be idempotent when executed as a whole. If the schema files are executed multiple times, the end result should still be the same. For example, restoring from `00_table_a` to `12_table_b` will produce the same database schema if it is executed again from `00_table_a` to `12_table_b`.
>
> * Index and primary key creation statements need to be grouped in the same file as the table creations.

Agreed

> * Alter statements need to be grouped after the corresponding creation statements in the previous files.

Agreed

> * Function creation statements need to be grouped logically according to the purpose of the group. For example, one SQL file for spreadsheet generation functions and another for forecast generation functions.

Agreed

> * Trigger creation statements need to be put at the end, after every schema and function is ready. Since they are probably always last, we can put the number 99 on the file (probably, CMIIW).

Agreed

> * For manageability and readability, we should be able to make logical groups with folders if needed. For example, we can make a folder `01_osm_table_alter` with files inside: `00_osm_buildings_alter.sql`, `01_osm_roads_alter.sql`, etc. Then create another group like `02_hazard_table_creation` with files inside: `00_hazard_event.sql`, `01_hazard_areas.sql`, etc.
>
> * Note that these rules are only for the schema dump (without the data). Data fixtures should ideally be dumped in a separate pipeline after the database with the schema is prepared.

> How we should proceed
>
> Following my proposed criteria above, I propose the following step-by-step procedure to finish the issue. It doesn't have to be done all at once; we can do it iteratively and make sure it gets finished step by step.
>
> 1. Let's put our schema dump under the `fixtures/schema` folder.
>
> 2. Use a single initial schema as `00_initial.sql`. Since we are based on the `docker-osm` process, we dump the full table schema of the database after `imposm` finishes, without `post-pbf-script.sql`. It will include the osm tables according to the `mapping.yml` being used (for now, typically `osm_buildings` and `osm_roads`). The goal of this file is to let us replicate the docker-osm initial table schema without having to generate it from the docker-osm pbf file.

Ok.
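For reference, `00_initial.sql` (step 2 above) would then contain table definitions along these lines; the real columns depend on the `mapping.yml` in use, so these are placeholders only:

```sql
-- Placeholder shapes of the imposm-generated tables; the actual columns come from mapping.yml.
CREATE TABLE IF NOT EXISTS osm_buildings (
    id       serial PRIMARY KEY,
    osm_id   bigint,
    name     text,
    type     text,
    geometry geometry
);

CREATE TABLE IF NOT EXISTS osm_roads (
    id       serial PRIMARY KEY,
    osm_id   bigint,
    name     text,
    type     text,
    geometry geometry
);
```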

> 3. Put rules to alter the osm tables according to our fbf needs in a `01_post_pbf_script` folder. I'm not familiar with this part, so @NyakudyaA can clarify. Basically we need to add a schema for administrative boundaries (village, sub_district, districts) and the other tables needed to calculate building/roads intersections, plus alter rules for the osm_buildings and osm_roads columns to support that.

We need to load the administrative areas (districts, sub_districts, and villages) into the DB first. We can have a folder called admin_area, which we can mount and run using the PostgreSQL image.
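A rough sketch of how the `admin_area` scripts and the matching post-pbf alter could look; every table and column name here is hypothetical:

```sql
-- admin_area/00_boundaries.sql: hypothetical administrative boundary tables.
CREATE TABLE IF NOT EXISTS districts (
    id       serial PRIMARY KEY,
    name     text,
    geometry geometry
);

CREATE TABLE IF NOT EXISTS sub_districts (
    id          serial PRIMARY KEY,
    district_id integer REFERENCES districts (id),
    name        text,
    geometry    geometry
);

CREATE TABLE IF NOT EXISTS villages (
    id              serial PRIMARY KEY,
    sub_district_id integer REFERENCES sub_districts (id),
    name            text,
    geometry        geometry
);

-- 01_post_pbf_script/00_osm_buildings_alter.sql: link buildings to a boundary so
-- building/roads intersection summaries can be grouped per village.
ALTER TABLE osm_buildings
    ADD COLUMN IF NOT EXISTS village_id integer REFERENCES villages (id);
```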

> 4. We group our main fbf tables in this step as the folder `02_fbf_table`. This will include our main `hazard_`-prefixed tables and our old `flood_`-prefixed tables if we have them.
>
> 5. We group our materialized views in this step as the folder `03_fbf_mv`. This will include our mv cache tables, typically tables that calculate building/roads summary statistics.
>
> 6. We group our views in this step as the folder `04_fbf_v`. This will include our view tables, typically logic that calculates the extents of boundaries or hazards.
>
> 7. We group our functions in this step as the folder `05_fbf_f`. This will include our plpython3u functions such as spreadsheet generation and forecast scripts. We may have to make subfolders to group them further.
>
> 8. We group our database triggers as the last step. Since it will probably always be last, we can name it `99_fbf_triggers`. @NyakudyaA please tell me if this is unreasonable.
>
> All these files should be made as PRs that we can review and test iteratively. I can make a Travis job that will try to run this SQL so we always know that it is consistent to restore.

lucernae commented 4 years ago

> Per table will be a hassle to maintain. We should ideally have one SQL file that is logically arranged.

A single SQL file is hard to split up, but separate SQL files can easily be joined into a single file. My point is that I want anyone who is assigned to update the schema in the future to be able to easily locate the logical part they want to change. I understand that it is initially going to be difficult for you to dump the schema separately, but it's also difficult for me to review and figure out which statements caused an error in a single SQL file containing thousands of lines. What I want to offer is to make it easy for anyone to change the schema whenever it is needed, so we can ease your burden. You can start dumping each table/logical section bit by bit; it doesn't have to be finished in one go or a single PR. I know you are busy with other projects. If you tell us the dependencies, we can collectively dump the different table schemas together, so you don't have to do it all on your own.

However, can you elaborate on what you mean by "per table will be a hassle to maintain"? Maybe it's something I haven't thought about yet.

> We need to load the administrative areas (districts, sub_districts, and villages) into the DB first. We can have a folder called admin_area, which we can mount and run using the PostgreSQL image.

Yes, I think you know the logical order best. You can rearrange the order of the folders/scripts as you see fit; my order above is just a suggestion. So admin_area comes before we run docker-osm then? We can make `admin_area` the initial state of the db, then `01_docker-osm` is the next checkpoint, then `02_post_pbf_script` for the osm table column alters. This is what I understood from your suggestion, CMIIW.

NyakudyaA commented 4 years ago

@lucernae Can we close this now?

lucernae commented 4 years ago

@NyakudyaA have you reviewed all the SQL fixtures in these folders? https://github.com/inasafe/inasafe-fba/tree/master/fixtures/schema and https://github.com/inasafe/inasafe-fba/tree/master/fixtures/data

Especially the roads-related ones. If you think everything is in the repo, then you can close it.

NyakudyaA commented 4 years ago

It all looks good @lucernae