lucernae closed this issue 4 years ago.
@NyakudyaA for step number 2, I'm not really sure if the `mapping.yml` in this repo is current. Please also update it if it's not. It's in docker-osm/indonesia-buildings/docker-osm-settings.
I can review no. 7 (fbf_functions) to make it easier to maintain, but the other steps rely on your knowledge. You can proceed iteratively, and in parallel I can create a Travis job that tries to restore this schema to test the setup. Once the schema is in place, we can create a Docker image that can be put in Rancher to deploy migration changes after every repo update.
We need a good way to store the staging schema so everyone on the team has a documented process for applying changes/replicating the schema to a local instance for future issues. In order to do that, let's dump the schema first in a more organized way.
I propose several criteria here and a way to organize them:
Criteria when dumping the schema
In order for it to be easily manageable and replicable, I propose the following conditions be applied when making a schema dump:
* Exclude instance-specific details, such as user roles, passwords, and table grants. We want the schema to be easily restored on a different instance.
* Exclude extension-specific functions, such as the `st_` functions from PostGIS. These belong in the database setup and are not part of the staging schema.
* Each SQL schema file should be a manageable chunk to read, understand, and organize. I propose we separate the schema into multiple independent groups that can be executed sequentially. Or even better, one SQL file per table.
Per table will be a hassle to maintain. We should ideally have one SQL file that is logically arranged.
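For the first two exclusion criteria, `pg_dump` already has flags that strip owners and grants. A minimal sketch of what the invocation could look like (the flags are standard `pg_dump` options; the database name `gis` and the `spatial_ref_sys` exclusion are assumptions for illustration, printed here as a dry run rather than executed):

```python
# Sketch: build a pg_dump command that satisfies the exclusion criteria.
# --schema-only, --no-owner and --no-privileges are real pg_dump flags;
# the database name "gis" is an assumption for this example.
import shlex

dump_cmd = [
    "pg_dump",
    "--schema-only",                     # structure only, no data rows
    "--no-owner",                        # skip ALTER ... OWNER TO statements
    "--no-privileges",                   # skip GRANT/REVOKE statements
    "--exclude-table=spatial_ref_sys",   # PostGIS bookkeeping table (assumption)
    "gis",                               # assumed database name
]

# Dry run: print the command instead of executing it against a live DB.
print(shlex.join(dump_cmd))
```

Note that if PostGIS was installed with `CREATE EXTENSION postgis`, `pg_dump` emits a single `CREATE EXTENSION` statement rather than every `st_` function, which already covers the second criterion.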
* The multiple SQL files need to be ordered according to their dependencies. For example, since `hazard_event` is referenced by a foreign key in `hazard_areas`, we dump `00_hazard_event.sql` before `01_hazard_areas.sql` so the order of creation is correct.
This can be done in a single file to prevent the headache of trying to figure out which SQL should be executed first; for example, a table can be linked to five other tables, etc.
* The SQL statements have to be idempotent when executed as a whole. If the schema files are executed multiple times, the end result should still be the same. For example, restoring from `00_table_a` through `12_table_b`, and then running the same sequence from `00_table_a` through `12_table_b` again, should produce the same database schema.
* Index and primary key creation statements need to be grouped in the same file as the table creations.
Agreed
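To illustrate the idempotency point: guarded DDL can be run any number of times with the same end state. A minimal sketch using sqlite3 so it runs anywhere (Postgres supports the same `IF NOT EXISTS` guards, plus `CREATE OR REPLACE` for functions and views); the column names are made up for the example:

```python
# Sketch: idempotent DDL -- executing the same schema script twice leaves
# the database unchanged instead of erroring. sqlite3 is used only so the
# example is self-contained; the IF NOT EXISTS guards work the same in
# Postgres. The hazard_event columns are illustrative assumptions.
import sqlite3

schema_sql = """
CREATE TABLE IF NOT EXISTS hazard_event (
    id INTEGER PRIMARY KEY,
    forecast_date TEXT
);
CREATE INDEX IF NOT EXISTS idx_hazard_event_date
    ON hazard_event (forecast_date);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema_sql)   # first run creates the objects
conn.executescript(schema_sql)   # second run is a no-op, not an error

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['hazard_event']
```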
- Alter statements need to be grouped after the corresponding creation statements in the previous files.
Agreed
- Function creation statements need to be grouped logically according to their purpose. For example, one SQL file for spreadsheet-generation functions, another SQL file for forecast-generation functions.
Agreed
- Trigger creation statements need to be put at the very end, after every schema object and function is ready. Since they will probably always be last, we can use the number 99 for the file (probably, CMIIW).
Agreed
For manageability and readability, we should be able to make logical groups with folders if needed. For example, we can make a folder `01_osm_table_alter` with files inside: `00_osm_buildings_alter.sql`, `01_osm_roads_alter.sql`, etc. Then create another group like `02_hazard_table_creation` with files inside: `00_hazard_event.sql`, `01_hazard_areas.sql`, etc.

Note that these rules are only for the schema dump (without the data). The data fixtures dump should ideally be done in a separate pipeline after the database with the schema is prepared.
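With numeric prefixes on both folders and files, the execution order is recoverable by a plain lexicographic sort, so a restore script only needs to walk the tree in sorted order. A sketch of that idea (the folder and file names mirror the examples above and are created in a temp directory purely for illustration):

```python
# Sketch: discover schema files in execution order. Because folders and
# files carry numeric prefixes, a plain sorted() walk yields the intended
# sequence. The paths mirror the examples in this thread; they are
# illustrative, not the real repo layout.
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp()) / "fixtures" / "schema"
for rel in [
    "01_osm_table_alter/00_osm_buildings_alter.sql",
    "01_osm_table_alter/01_osm_roads_alter.sql",
    "02_hazard_table_creation/00_hazard_event.sql",
    "02_hazard_table_creation/01_hazard_areas.sql",
]:
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()

# Each file in this list could then be fed to `psql -f` in turn.
ordered = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.sql"))
print(ordered)
```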
How we should proceed
Following my proposed criteria above, I propose the following step-by-step procedure to finish this issue. It doesn't have to be done all at once; we can do it iteratively, making sure it gets finished step by step.
1. Let's put our schema dump under a `fixtures/schema` folder.
2. Use a single initial schema, `00_initial.sql`. Since we are based on the `docker-osm` process, we dump the full table schema of the database after `imposm` finishes, without `post-pbf-script.sql`. It will include the OSM tables according to the `mapping.yml` being used (for now, typically `osm_buildings` and `osm_roads`). The goal of this file is to let us replicate the docker-osm initial table schema without having to generate it from a docker-osm pbf file.
Ok.
3. Put rules to alter the OSM tables according to our fbf needs in a `01_post_pbf_script` folder. I'm not familiar with this part, so @NyakudyaA can clarify. Basically we need to add schema for administrative boundaries (village, sub_district, districts) and the other tables needed to calculate building/road intersections, plus alter rules for the `osm_buildings` and `osm_roads` columns to support that.
We need to load the admin areas (districts, sub_districts and villages) into the DB first. We can have a folder called `admin_area`. We can mount it and run it using the PostgreSQL image.
4. We group our main fbf tables in this step, in a folder `02_fbf_table`. This will include our main `hazard_`-prefixed tables and our old `flood_`-prefixed tables if we have them.
5. We group our materialized views in this step, in a folder `03_fbf_mv`. This will include our mv cache tables, typically tables that calculate building/road summary statistics.
6. We group our views in this step, in a folder `04_fbf_v`. This will include our view tables, typically logic that calculates the extents of boundaries or hazards.
7. We group our functions in this step, in a folder `05_fbf_f`. This will include our plpython3u functions, such as the spreadsheet generation and forecast scripts. We may have to make subfolders to group them further.
8. We group our database triggers as the last step. Since it will probably always be last, we can name it `99_fbf_triggers`. @NyakudyaA please tell me if this is unreasonable.
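One nice property of the naming in the steps above: because `99` sorts after `00`-`05` lexicographically, a restore script can simply sort whatever it finds and the triggers land last. A trivial sanity check of that (the names are the ones proposed in this comment):

```python
# Sketch: the top-level names proposed in the steps above, in the intended
# execution order. A plain sort reproduces it, so no extra manifest file
# is needed to encode ordering.
proposed = [
    "00_initial.sql",
    "01_post_pbf_script",
    "02_fbf_table",
    "03_fbf_mv",
    "04_fbf_v",
    "05_fbf_f",
    "99_fbf_triggers",
]

# Sorting leaves the list unchanged: the numeric prefixes ARE the order.
assert sorted(proposed) == proposed
print(proposed[-1])  # 99_fbf_triggers
```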
All these files should be made as PRs that we can review and test iteratively. I can make a Travis job that will try to run this SQL so we always know it restores consistently.
> Per table will be a hassle to maintain. We should ideally have one SQL file that is logically arranged.
A single SQL file is hard to split apart, but separate SQL files can easily be joined into a single file. My point is that I want anyone who is assigned to update the schema in the future to be able to easily locate which logical part they want to change. I understand that dumping the schema separately is going to be difficult for you initially. But it's also difficult for me to review and figure out which statement caused an error in a single SQL file containing thousands of lines. What I want to offer is to make it easy for anyone to change the schema whenever needed, so we can ease your burden. You can start dumping each table/logical section bit by bit; it doesn't have to be finished in one go/a single PR. I know you are busy with other projects. If you tell us the dependencies, we can collectively dump the different table schemas together, so you don't have to do it all on your own.
However, can you elaborate on what you mean by "Per table will be a hassle to maintain"? Maybe it's something I haven't thought about yet.
> We need to load the (districts, sub_districts and villages) into the DB first. We can have a folder called `admin_area`. We can mount it and run it using the PostgreSQL image.
Yes, I think you know the logical order best. You can rearrange the order of the folders/scripts as you see fit; my order above is just a suggestion. So `admin_area` comes before we run docker-osm then? We can make `admin_area` the initial state of the db, then `01_docker-osm` is the next checkpoint, then `02_post_pbf_script` for the OSM table column alters. This is what I understood from your suggestion, CMIIW.
@lucernae Can we close this now?
@NyakudyaA have you reviewed all the SQL fixtures in these folders? https://github.com/inasafe/inasafe-fba/tree/master/fixtures/schema and https://github.com/inasafe/inasafe-fba/tree/master/fixtures/data
Especially the roads-related ones. If you think everything is in the repo, then you can close it.
It all looks good @lucernae