inasafe / inasafe-realtime

Realtime logic for InaSAFE

Architecture change for InaSAFE Realtime Hazard Service #164

Closed: lucernae closed this issue 4 years ago

lucernae commented 6 years ago

Summary

InaSAFE Realtime Hazard (IRH) service refers to a set of Docker services in this repo whose role is to fetch hazard data from InaSAFE partners. This hazard data will then be converted into relevant InaSAFE layers with attached keywords and parameters, and the resulting hazard layers will be sent to InaSAFE Realtime Django (IRD). IRD will store the raw data in its database for future use and history. It will then forward the hazard layer to InaSAFE Realtime Processor (IRP) for analysis.

Regular Hazard Data Flow

This is the normal flow of hazard data into IRH.

Earthquake

  1. BMKG will push a shake grid (either initial or data-informed) to IRH via SFTP/rsync.
  2. The Shakemap monitoring service will detect the new incoming grid file and trigger a new hazard job. (There are separate monitoring services for initial and data-informed grids.)
  3. IRH will convert the grid.xml file into an InaSAFE Hazard Layer using InaSAFE core, populating it with the necessary metadata and keywords.
  4. The resulting Hazard Layer will be saved in an agreed location, and IRH will inform IRD that a new hazard layer exists (using the REST API Controller).
  5. IRD will issue a new event analysis to IRP.
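The monitoring step (step 2) can be sketched as a simple polling loop. This is a minimal, hypothetical sketch, not the IRH implementation. It assumes that each event arrives as its own subdirectory containing grid.xml, and that "triggering a hazard job" means calling a Python callback. The real service may react to filesystem events instead of polling.

```python
"""Hypothetical polling sketch of a Shakemap folder monitor.

Assumptions (not from the issue): one watched folder per grid type,
one subdirectory per event containing grid.xml, and a plain callback
standing in for "trigger new hazard job".
"""
from pathlib import Path
from typing import Callable, Set


class ShakeGridMonitor:
    """Detect new grid.xml files in a watched folder and dispatch a job."""

    def __init__(self, watch_dir: Path, grid_type: str,
                 dispatch: Callable[[str, Path], None]) -> None:
        self.watch_dir = Path(watch_dir)
        self.grid_type = grid_type  # e.g. "initial" or "data-informed"
        self.dispatch = dispatch
        # In-memory only: lost on restart, so IRH itself keeps no durable state.
        self._seen: Set[Path] = set()

    def poll_once(self) -> int:
        """Scan the folder once and return how many new jobs were dispatched."""
        new_jobs = 0
        for grid in sorted(self.watch_dir.glob("*/grid.xml")):
            if grid not in self._seen:
                self._seen.add(grid)
                self.dispatch(self.grid_type, grid)
                new_jobs += 1
        return new_jobs
```

A long-running monitor would call `poll_once()` in a loop with a short sleep. Running one instance per folder mirrors the separate initial and data-informed monitoring services.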

Flood

  1. A periodic Celery schedule in IRH will trigger a new flood data fetch every hour or every 6 hours, depending on settings.
  2. A Celery worker in IRH will accept the task and fetch data from the PetaBencana.id API. The result will be saved as a GeoJSON file.
  3. If an important hazard class exists, the previous task should trigger another Celery task in IRH, which will convert the GeoJSON file into an InaSAFE Hazard Layer using InaSAFE core.
  4. The resulting Hazard Layer will be saved in an agreed location, and IRH will inform IRD that a new hazard layer exists (using the REST API Controller).
  5. IRD will issue a new event analysis to IRP.
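Steps 1 to 3 of the flood flow can be sketched as follows. Every specific here is an assumption: the numeric "state" property, the set of important states, and the interval names are hypothetical. They do not reflect PetaBencana.id's actual schema or IRH's real settings. In IRH, the fetch and convert steps would be separate Celery tasks. Here they are injected as plain functions so the "convert only if important" rule runs without a broker.

```python
"""Hypothetical sketch of the flood flow's conditional task chaining."""
import json
from datetime import timedelta
from typing import Callable, Optional

# Hypothetical settings for step 1. A timedelta is one of the value types
# Celery beat accepts as the "schedule" of a beat_schedule entry.
FETCH_INTERVALS = {"hourly": timedelta(hours=1), "6-hourly": timedelta(hours=6)}

# Hypothetical: feature "state" values treated as important hazard classes.
IMPORTANT_STATES = {2, 3, 4}


def has_important_class(feature_collection: dict) -> bool:
    """Return True if any feature falls into an important hazard class."""
    return any(
        feature.get("properties", {}).get("state") in IMPORTANT_STATES
        for feature in feature_collection.get("features", [])
    )


def fetch_then_maybe_convert(fetch: Callable[[], str],
                             convert: Callable[[dict], str]) -> Optional[str]:
    """Step 2 fetches GeoJSON text; step 3 runs only if something is important."""
    feature_collection = json.loads(fetch())
    if not has_important_class(feature_collection):
        return None  # no hazard layer is produced, so IRD is not notified
    return convert(feature_collection)
```

Keeping the importance check as a separate pure function lets the threshold be tested independently of both the network fetch and the InaSAFE conversion.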

Volcanic Ash

  1. A PVMBG officer will upload the raw volcanic ash fall hazard layer into IRD using the Volcanic Ash Upload Form.
  2. IRD will submit a task to IRH to process the new raw volcanic ash hazard data.
  3. A Celery worker in IRH will accept the task and convert the raw hazard data into an InaSAFE Hazard Layer using InaSAFE core.
  4. The resulting Hazard Layer will be saved in an agreed location, and IRH will inform IRD that a new hazard layer exists (using the REST API Controller).
  5. IRD will issue a new event analysis to IRP.
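Step 4 is the same in every flow: IRH tells IRD that a new hazard layer is available. The sketch below only builds that request with the standard library. The endpoint URL, the JSON field names, and the token header are all hypothetical, since the issue does not specify the IRD REST API Controller's interface.

```python
"""Hypothetical sketch of IRH notifying IRD about a new hazard layer."""
import json
import urllib.request

IRD_HAZARD_ENDPOINT = "http://ird.example/api/hazard/"  # hypothetical URL


def build_hazard_notification(hazard_type: str, event_id: str,
                              layer_path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request describing the new layer."""
    payload = {
        "hazard_type": hazard_type,  # e.g. "earthquake", "flood", "volcanic_ash"
        "event_id": event_id,
        "hazard_path": layer_path,   # path in the agreed shared location
    }
    return urllib.request.Request(
        IRD_HAZARD_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Token {token}"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(request)`. Because IRH only passes a path to the shared location, the layer itself never travels over this call, which fits IRH not managing the files it writes.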

Recalculate analysis from IRD

In some cases, we want to regenerate the analysis from IRD. Because all raw hazard data is saved in IRD, it is possible to send that data back to IRH and trigger an alternate flow to process the event.

Earthquake

The only difference is that IRD saves the file into the corresponding folder monitored by IRH.

  1. IRD will save the shake grid (either initial or data-informed) via SFTP/rsync/direct file access into the corresponding folder monitored by IRH.
  2. The Shakemap monitoring service will detect the new incoming grid file and trigger a new hazard job. (There are separate monitoring services for initial and data-informed grids.)
  3. IRH will convert the grid.xml file into an InaSAFE Hazard Layer using InaSAFE core, populating it with the necessary metadata and keywords.
  4. The resulting Hazard Layer will be saved in an agreed location, and IRH will inform IRD that a new hazard layer exists (using the REST API Controller).
  5. IRD will issue a new event analysis to IRP.
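When IRD writes into a monitored folder through direct file access, the monitor might pick up a half-copied grid. One common way to avoid this is to write under a temporary name and then rename the file into place. This is a suggested technique, not something the issue specifies. The `<monitored_dir>/<event_id>/grid.xml` layout and the `.part` suffix are assumptions.

```python
"""Hypothetical sketch: IRD placing a stored grid.xml into a monitored folder."""
import os
import shutil
import tempfile
from pathlib import Path


def place_grid_atomically(source: Path, monitored_dir: Path, event_id: str) -> Path:
    """Copy grid.xml to <monitored_dir>/<event_id>/grid.xml via a temp file."""
    event_dir = Path(monitored_dir) / event_id
    event_dir.mkdir(parents=True, exist_ok=True)
    target = event_dir / "grid.xml"
    # The temp file lives in the same directory, so the final rename stays on
    # one filesystem (os.replace is atomic there on POSIX).
    fd, tmp_name = tempfile.mkstemp(dir=event_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, open(source, "rb") as src:
            shutil.copyfileobj(src, tmp)
        # mkstemp creates the file as 0o600; widen it in case IRH runs as
        # a different user than IRD (an assumption about the deployment).
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return target
```

A monitor that only matches files named grid.xml never sees the `.part` file. It sees the grid only once the rename has completed.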

Flood

The only difference is that IRD will feed the data directly to IRH.

  1. IRD will trigger a task in IRH to fetch flood hazard data from IRD.
  2. A Celery worker in IRH will accept the task and fetch data from the IRD API. The result will be saved as a GeoJSON file.
  3. If an important hazard class exists, the previous task should trigger another Celery task in IRH, which will convert the GeoJSON file into an InaSAFE Hazard Layer using InaSAFE core.
  4. The resulting Hazard Layer will be saved in an agreed location, and IRH will inform IRD that a new hazard layer exists (using the REST API Controller).
  5. IRD will issue a new event analysis to IRP.
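Only the data source differs between the regular and recalculation flood flows. One possible design is therefore a single fetch entry point parameterised by source, so the downstream conversion task stays identical. This is a hypothetical sketch. Both URLs are placeholders, not the real PetaBencana.id or IRD endpoints. The HTTP call is injected so the logic runs offline.

```python
"""Hypothetical sketch: one flood-fetch step serving both flood flows."""
import json
from pathlib import Path
from typing import Callable, Optional

FLOOD_SOURCES = {
    "petabencana": "https://petabencana.example/flood-reports",  # placeholder
    "ird": "http://ird.example/api/flood/{event_id}/raw/",        # placeholder
}


def fetch_flood_geojson(source: str, out_dir: Path,
                        http_get: Callable[[str], str],
                        event_id: Optional[str] = None) -> Path:
    """Fetch GeoJSON text from the chosen source and save it to out_dir."""
    if source not in FLOOD_SOURCES:
        raise ValueError(f"unknown flood source: {source!r}")
    if source == "ird" and event_id is None:
        raise ValueError("recalculation from IRD needs an event_id")
    url = FLOOD_SOURCES[source].format(event_id=event_id)
    text = http_get(url)
    json.loads(text)  # fail early if the response is not valid JSON
    out_path = Path(out_dir) / f"flood_{source}_{event_id or 'latest'}.geojson"
    out_path.write_text(text, encoding="utf-8")
    return out_path
```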

Volcanic Ash

The only difference is that there is no upload form step, because the data is already in IRD.

  1. IRD will submit a task to IRH to process the stored raw volcanic ash hazard data.
  2. A Celery worker in IRH will accept the task and convert the raw hazard data into an InaSAFE Hazard Layer using InaSAFE core.
  3. The resulting Hazard Layer will be saved in an agreed location, and IRH will inform IRD that a new hazard layer exists (using the REST API Controller).
  4. IRD will issue a new event analysis to IRP.

Architectural Changes and Design Consideration

The following are some of the changes and the reasons why they are worth implementing.

  1. Separating realtime and headless package

The headless package will perform generic InaSAFE analysis with additional parameters such as the report template and minimum needs profile. Meanwhile, the realtime package will only fetch data, convert hazard data into InaSAFE layers, and pass the data along.

  2. IRH will be a stateless service

IRH will not store any state. It can store files, but it does not manage them. This keeps the service minimal in size. IRD will be responsible for saving the raw hazard files to the database and for cleaning out temporary results in IRH to save space. For example, IRH can receive a grid.xml file but will not remember that it exists, so IRD can save the raw data and then delete it from disk to save space. The same applies to converted InaSAFE layers.

  3. All data conversion logic should be handled by InaSAFE Core

This makes all the logic reproducible in InaSAFE desktop. IRH will only handle fetching the data and passing along the converted data.
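The stateless-IRH design moves cleanup to IRD: IRD persists whatever IRH wrote, then deletes it from IRH's disk. A minimal, hypothetical sketch of that IRD-side step follows. It assumes IRH writes into a shared output directory, and a `store()` callback stands in for "saving to the database". Neither detail is specified in the issue.

```python
"""Hypothetical sketch: IRD archiving and cleaning files left by stateless IRH."""
from pathlib import Path
from typing import Callable, List


def archive_and_clean(irh_output_dir: Path,
                      store: Callable[[str, bytes], None],
                      pattern: str = "*") -> List[str]:
    """Persist each matching file via store(), then delete it from IRH's disk."""
    archived = []
    for path in sorted(Path(irh_output_dir).glob(pattern)):
        if not path.is_file():
            continue
        store(path.name, path.read_bytes())  # persist first...
        path.unlink()                        # ...then free space on IRH
        archived.append(path.name)
    return archived
```

The order matters: a file is deleted only after `store()` returns. If storing fails, the exception stops the loop, and the file stays on IRH's disk for a later retry.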

CC @timlinux @gubuntu @ismailsunni in case something is missing.

lucernae commented 4 years ago

Fully implemented