
InaSAFE - QGIS plugin for estimating impact from natural disasters
www.inasafe.org
GNU General Public License v3.0

Impact functions should provide raw, parsable data #587

Closed: timlinux closed this issue 8 years ago

timlinux commented 11 years ago

Problem

The information returned by impact functions is, for the most part, interpolated into strings, making it difficult for upstream code to use the outputs for further calculations.

See for example #579

Proposed solution

For the post 1.2 release we would like to put in place a more formalised data structure to represent impact function return results. This could probably take the form of a JSON dictionary of terms and values, but we need to get some proposals in place as to how this could look. In particular we need:

A design document for discussion showing how impact functions can return data, not markup, allowing higher level code (e.g. the dock, a web UI) to mark up the data for presentation to users.
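
To make the separation concrete, here is a minimal sketch (all names are illustrative, not a proposed API): the impact function returns plain values, and presentation code decides how to render them.

import json


def flood_population_impact():
    """Return raw, parsable results instead of preformatted strings."""
    return {
        'exposure_subcategory': 'population',
        'hazard_subcategory': 'flood',
        'total_population': 23432,
        'affected_population': 20000,
    }


def render_html(result):
    """Higher level code (e.g. the dock or a web UI) marks the data up."""
    rows = ''.join(
        '<tr><td>%s</td><td>%s</td></tr>' % item
        for item in sorted(result.items()))
    return '<table>%s</table>' % rows


result = flood_population_impact()
print(json.dumps(result))   # machine readable, for upstream calculations
print(render_html(result))  # human readable, rendered at the UI layer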

CC

@vanpuk @ingenieroariel @cchristelis

ingenieroariel commented 11 years ago

+10

Thanks for this proposal Tim.

cchristelis commented 11 years ago

I had a look at the various keywords currently used in the impact functions. This is a design for sharing the keywords data between safe and InaSAFE: keywords_diagram. The creator and interpreter classes should make use of getters and setters, which are implied in the diagram.
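
To make the intent concrete, a minimal sketch of the creator/interpreter pair (class and method names are illustrative only, not the final design):

import json


class KeywordsCreator(object):
    """Builds the shared keywords document via setters."""

    def __init__(self):
        self._keywords = {}

    def set_keyword(self, name, value):
        self._keywords[name] = value

    def to_json(self):
        return json.dumps(self._keywords)


class KeywordsInterpreter(object):
    """Reads a keywords document via getters."""

    def __init__(self, json_text):
        self._keywords = json.loads(json_text)

    def get_keyword(self, name, default=None):
        return self._keywords.get(name, default)


creator = KeywordsCreator()
creator.set_keyword('hazard_subcategory', 'flood')
interpreter = KeywordsInterpreter(creator.to_json())
print(interpreter.get_keyword('hazard_subcategory'))  # 'flood'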

The tables are broken down into even smaller tables, each containing one section of the impact function's results. This will allow for easy addition of data. To manage the table snippets, table classes can be used in both safe and InaSAFE. Such a system is shown here: table_class_diagram

Having small table sections keeps the table data compact, and formatting the table sections and finding information in a snippet is very manageable. Tables obtained from old runs can be converted to this new internal structure if the effort is justified. Otherwise a dummy class can present the same interface as the table classes and return the already formatted data when needed.
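
A minimal sketch of the snippet interface idea (names are illustrative only): each section exposes the same small interface, so formatting code in safe or InaSAFE does not care whether the rows come from a new run or from a dummy class wrapping old, already formatted output.

class TableSection(object):
    """A small table snippet holding raw rows for one result section."""

    def __init__(self, title, rows):
        self.title = title
        self._rows = rows  # list of (label, value) tuples

    def rows(self):
        return self._rows


class LegacyTableSection(object):
    """Dummy class wrapping already formatted output from old runs,
    presenting the same interface as TableSection."""

    def __init__(self, title, formatted_rows):
        self.title = title
        self._rows = formatted_rows

    def rows(self):
        return self._rows


def to_text(section):
    """Formatting code only depends on the shared interface."""
    lines = [section.title]
    lines += ['%s: %s' % (label, value) for label, value in section.rows()]
    return '\n'.join(lines)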

Please let me know if this solution can be improved or if I missed anything. (Please excuse the text formatting; there seems to be a problem converting the text in the dia file to an image.)

timlinux commented 11 years ago

Hi @cchristelis

Thanks for taking a first pass at an architecture design for this. My comments follow below.

{
    VERSION: JSON_KEYWORD_VERSION,
    document_type: 'impact_assessment',
    function_details: {
        impact_function_id: ,   // note: we should use the id, not the name, as the unique identifier
        author: ,
        synopsis: ,
        rating: ,
        parameters: ,
        description: ,
        citation: ,
        limitation: },
    impact_details: {
        exposure_subcategory: 'population',
        hazard_subcategory: 'flood',
        hazard_units: 'wet/dry',
        total_population: 23432,
        affected_population: 20000,
        evacuated_population: 10000},
    minimum_needs: {
        food: {
            type: 'Rice',
            quantity: 0.4,
            units: 'gram',
            plural: 'grams',
            unit_abbreviation: 'g',
            per_time_period: 'day',         // day / week / month / year
            per_population_unit: 'person'}, // person / household
        drinking_water: {
            type: 'Drinking water',
            quantity: 3,
            units: 'litre',
            plural: 'litres',
            unit_abbreviation: 'l',
            per_time_period: 'day',         // day / week / month / year
            per_population_unit: 'person'}, // person / household
        clean_water: {
            type: 'Clean water',
            quantity: 30,
            units: 'litre',
            plural: 'litres',
            unit_abbreviation: 'l',
            per_time_period: 'day',         // day / week / month / year
            per_population_unit: 'person'}, // person / household
        hygiene_packs: {
            type: 'Hygiene packs',
            quantity: 1,
            units: 'pack',
            plural: 'packs',
            unit_abbreviation: 'pack',
            per_time_period: 'week',        // day / week / month / year
            per_population_unit: 'family'}
    },
    aggregation: {
        1: {                       // key is the feature id of the aggregation area
            area_name: 'foo',
            minimum_needs: { }},   // as above, but for this admin area
        2: {
            area_name: 'foo',
            minimum_needs: { }},   // as above, but for this admin area
        3: {
            area_name: 'foo',
            minimum_needs: { }}    // as above, but for this admin area
    },
    post_processing: {
    },
    provenance: {
        impact_layer: {
            path: 'foo.shp',
            attribution: 'Tims collection of GIS data'},
        exposure_layer: {
            path: 'bar.shp',
            attribution: 'Tims collection of GIS data'},
        aggregation_layer: {
            path: 'abc.shp',
            attribution: 'Tims collection of GIS data'}
    },
    metrics: {
        analysis_date: '21-07-2013 08:43.23',
        analysis_duration: 2133    // seconds
    }
}
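
With a structure like this, upstream code can do arithmetic directly on the values rather than scraping strings. For illustration only, using a tiny excerpt of the document above:

doc = {
    'impact_details': {'affected_population': 20000},
    'minimum_needs': {
        'drinking_water': {
            'quantity': 3, 'plural': 'litres', 'per_time_period': 'day'},
    },
}

affected = doc['impact_details']['affected_population']
for need, details in doc['minimum_needs'].items():
    # Assumes per_population_unit is 'person'; 'family' needs would
    # first require a household size figure.
    total = details['quantity'] * affected  # 3 litres x 20000 people
    print('%s: %s %s per %s' % (
        need, total, details['plural'], details['per_time_period']))
# prints: drinking_water: 60000 litres per day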

Before we invest too much time in the details, it would be really good to get feedback from @ingenieroariel to see if the broad strokes of what I am suggesting above work for him too.

PS. It would be great if you could do your diagramming in a shared Lucidchart doc (https://www.lucidchart.com) so that we can collaboratively edit and tweak the diagrams.

ingenieroariel commented 11 years ago

Some points from our technical discussion:

a) We should look into having a json format that works both for input (keywords) and output (what's currently described). timlinux and cc will look into this.

b) We should see if there is a simple json metadata format that can wrap this and increase our chances of other software reading it. Candidates are ISO 19115 in JSON format (if it exists) and Open Data JSON metadata. [1]

[1] http://project-open-data.github.io/metadata-resources/
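
For illustration, a rough sketch of what wrapping the impact document in such an envelope might look like. The field names loosely follow the Project Open Data metadata schema linked above; this is an assumption for discussion, not a settled format, and the 'data' key is just a placeholder of ours:

# Rough sketch only: the impact document nested inside an Open Data
# style metadata envelope. Field names are assumptions, not a spec.
impact_document = {'document_type': 'impact_assessment'}  # as sketched above

wrapped = {
    'title': 'InaSAFE flood impact on population',
    'description': 'Impact assessment document produced by InaSAFE',
    'modified': '2013-07-21',
    'keyword': ['flood', 'population', 'impact'],
    'distribution': [{
        'format': 'application/json',
        'data': impact_document,  # placeholder key, not part of the schema
    }],
}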

ismailsunni commented 10 years ago

From our last IRC discussion, here are my notes:

We discussed what we need to do:

  1. [From Chris] Update all IFs so that they use the json_keywords, starting with the earthquake IF (but first I need to make sure that it's error free; I found an ImportError this morning). (Tim said we'd better do it perfectly on one IF first, then we can adapt it to all IFs by visiting them once.)
  2. [From Tim] Extend set_function_details to include constraints (what types of inputs are required)
  3. [From Tim] Have a method, separate from run(), on each IF that returns us the function details json (see the sketch after this list)
  4. [From Tim] Filter for the wizard
  5. [From Tim] Each IF needs to be visited and updated to properly register json keywords on run, metadata query etc. (I think we can do this when QGIS loads InaSAFE)
  6. [From Chris] Preprocess to convert .keywords to .json
  7. (anything I missed?)
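
For point 3, something like this minimal sketch (class and key names are illustrative, not the final design):

import json


class FloodPopulationImpactFunction(object):

    @staticmethod
    def function_details():
        """Return the function details as json without running the
        analysis, so e.g. the wizard can filter IFs on constraints."""
        return json.dumps({
            'impact_function_id': 'FloodPopulationImpactFunction',
            'constraints': {
                'hazard_subcategory': 'flood',
                'exposure_subcategory': 'population'},
        })

    def run(self, layers):
        """The analysis itself, kept separate from the metadata query."""
        raise NotImplementedError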

I think I understand what we need to do with this json (use it as a replacement for the keywords and put all information into it). But I still don't really understand the table_formatter (we didn't discuss this part much, since time was limited). In my current understanding, it will be used to generate a nice table for InaSAFE's analysis results, won't it? And Chris intends to split it into several classes according to the exposure, right?

Maybe Chris can tell me more about this?

cc @timlinux @cchristelis

ismailsunni commented 10 years ago

Some comments:

  1. I think we need to put the notes and action check list into this json. The reason: the notes and action check list are also part of the impact function result, and they are specific to an impact function (although some
  2. I think we need to change how we implement the impact details in the json. The issue to consider is that an impact function may have more than one breakdown (e.g. buildings by type and buildings by hazard level).

timlinux commented 10 years ago

Hi @ismailsunni

We have on the work plan to pervasively support JSON throughout InaSAFE, which would encompass:

  1. Impact function metadata (covered by the work we have done with Borys in wizards)
  2. Impact function parameters
  3. Aggregation options
  4. Postprocessor inputs and outputs
  5. Minimum needs parameters and reporting
  6. Actions
  7. Impact function results
  8. Analysis execution logs and metadata

Basically we need to go through a similar exercise to the one we did with Borys: design what the metadata document should look like for each of these, and then get the json generator to produce it. Please bear that in mind while reviewing the json code, so that ultimately we have good cohesion between the metadata (stored in python dicts) and the json doc (perhaps rendered from those dicts).
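
As a minimal sketch of that cohesion (names are illustrative): the canonical metadata lives in a plain python dict and the json document is rendered from it, so the two cannot drift apart.

import json

# Canonical metadata lives in a plain python dict ...
FUNCTION_METADATA = {
    'impact_function_id': 'FloodPopulationImpactFunction',
    'synopsis': 'Estimate the impact of flooding on population.',
    'parameters': {'threshold': 1.0},
}


def metadata_as_json():
    # ... and the json document is rendered from it on demand.
    return json.dumps(FUNCTION_METADATA, indent=2)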

timlinux commented 8 years ago

Closing this as it has been dealt with.