inasafe / inasafe-realtime

Realtime logic for InaSAFE
2 stars 8 forks source link

Migrate old MMI Contours from InaSAFE 3.5 #203

Closed lucernae closed 5 years ago

lucernae commented 6 years ago

Problem

We need to convert the whole mmi contours of InaSAFE 3.5 into a postgis table for easy migration.

Plan

mmi contours migration

mas @ivanbusthomi , you only need to worry about the left side.

The task is described as follows:

  1. We need to clone mmi layers data from django media folders and saved it into the postgis database.
  2. The data is on the server: ssh realtime@realtime.inasafe.org -p 9223. You can go into the directory using ssh, sftp, or rsync (to copy the file across machines).
  3. Specific data for mmi contours is on this directory: /home/web/media/earthquake/mmi_output
  4. It contains lists of zipped file mmi-contours. Inside it is a shapefile. Copy these zipped files to your own machines. Test automations on your own machines, not from this folder. Spoilers: It's around 600MB size. You can copy it using rsync, and use the same rsync command whenever you want to sync these data from live server. Example: rsync -avz -e "ssh -p 9223" realtime@realtime.inasafe.org:/home/web/media/earthquake/mmi_output/ <folder_in_your_machine>
  5. Create postgis table to save the contours data:

Field explanations:

  1. Automate to extract all zip files, and ingests shapefile to save each polygon with corresponding mmi level and shake id into the database.

  2. This process needs to be executed easily whenever we want to update the database with new mmi contours.

CC @timlinux

lucernae commented 6 years ago

CC @ivanbusthomi

Basically what we want to do is for you is to populate the sql database with this information in each rows:

If you already have this database (you have done converting the shapefile to postgis for every shakemap). Please put your script somewhere on github (it could be a gist, or if you are familiar, you can create your own git repo for this) and also your database dump (we will call this Raw DB).

From your database dump, either you (@ivanbusthomi) or @gubuntu should be able to create an sql script to take the values from Raw DB and move it over to Realtime DB (the one that already have a fixed structure/schema).

This entire process needs to be scripted, so we can rerun it again at migration day and have the result quickly updated to be moved over to production database :D.

lucernae commented 6 years ago

And this is the sql schema of the MMI contour table for the upcoming database:

This schema is meant as a reference so you know which field is which, so it will help you to create the script for moving the data from Raw DB into Realtime DB.

Also, @ivanbusthomi @gubuntu please write Readme to describe the procedure of what should I do to use the script in production (the chain of command/script I need to make from inside the postgis containers).

-- auto-generated definition
create table realtime_earthquake
(
  id                   serial                   not null
    constraint realtime_earthquake_pkey
    primary key,
  shake_id             varchar(50)              not null,
  magnitude            double precision         not null,
  time                 timestamp with time zone not null,
  depth                double precision         not null,
  location             geometry(Point, 4326)    not null,
  location_description varchar(255)             not null,
  shake_grid           varchar(100),
  felt                 boolean                  not null,
  mmi_output           varchar(100),
  generated_time       timestamp with time zone,
  hazard_path          varchar(255),
  inasafe_version      varchar(10),
  source_type          varchar(30)              not null,
  analysis_task_id     varchar(255)             not null,
  analysis_task_status varchar(30)              not null,
  report_task_id       varchar(255)             not null,
  report_task_status   varchar(30)              not null,
  shake_grid_xml       text,
  analysis_task_result text,
  mmi_output_path      varchar(255),
  report_task_result   text,
  constraint realtime_earthquake_shake_id_60b1eea6399d88a7_uniq
  unique (shake_id, source_type)
);

create index realtime_earthquake_location_id
  on realtime_earthquake (location);

create index realtime_earthquake_shake_id_7d3cc59442d39357_like
  on realtime_earthquake (shake_id);

create table if not exists realtime_earthquakemmicontour
(
  id            serial                      not null
    constraint realtime_earthquakemmicontour_pkey
    primary key,
  geometry      geometry(LineStringZ, 4326) not null,
  mmi           double precision            not null,
  properties    text                        not null,
  earthquake_id integer                     not null
    constraint realti_earthquake_id_1edcd8d2ccf76dd4_fk_realtime_earthquake_id
    references realtime_earthquake
      deferrable initially deferred
);

create index if not exists realtime_earthquakemmicontour_b8ad623c
  on realtime_earthquakemmicontour (earthquake_id);

create index if not exists realtime_earthquakemmicontour_geometry_id
  on realtime_earthquakemmicontour (geometry);
ivanbusthomi commented 6 years ago

HI @lucernae this is the gist link. I'm still updating it https://gist.github.com/ivanbusthomi/92b9cb911a5465418595145544624a93

And for,

Basically what we want to do is for you is to populate the sql database with this information in each rows:

by each rows do you mean each columns ?

lucernae commented 6 years ago

each rows too, for example one MMI file might contains multiple rows (different MMI lines) and multiple columns (different values for each rows).

gubuntu commented 6 years ago

@lucernae you need to check the Django models in this and other Django apps - the ones above are applying the incorrect index on geometry columns. It must be a gist index, not the default btree index (which is useless on geometry)

@ivanbusthomi I've commented on your gist page

lucernae commented 6 years ago

@gubuntu The scheme above are generated by Django models. I put it as reference only to know about the real column name that we used. I think the spatial index were stored somewhere else and not on that one.

lucernae commented 6 years ago

I mean, I'm pretty sure it will be spatially indexed. It was a default setting on django gis models and gis field.

gubuntu commented 6 years ago

@lucernae it seems your model is using db_index instead spatial_index https://docs.djangoproject.com/en/2.0/ref/contrib/gis/model-api/#spatial-index. It's generating the wrong sql above. So you probably need to tweak your model.

ivanbusthomi commented 6 years ago

EDIT: This is the result that I received while adding the contour shapefiles to the database These events are not synced to the database, because empty folder or the attribute table in the shapefile is empty. 20110519133826 20110926104401 20150821201238 20150924225842 20160302195429 20160302200008 20161116211705 20170515115444 20170607123529 20170929182307 20171005110116 20171007181102 20171008222700 20171018203427 20171023232427 20171024013557 20171026091821

@gubuntu - how do you prefer receiving the database table? CC @lucernae @Charlotte-Morgan @timlinux

lucernae commented 5 years ago

I'm going to close this. The old countours were migrated to postgis table. Thanks for everyone involved!