marcderbauer / humanitarianKG


Ensure all Entities in Relations are Named & Created #28

Open marcderbauer opened 7 months ago

marcderbauer commented 7 months ago

Problem

When running the current Cypher queries, only some relations are created. It seems that a relation is only created when both its start and end nodes already exist. This means we need to create all nodes before creating the relations.

Example

Example Sentence

Conflict and insecurity in South Sudan are once again at levels which would typically be associated with civil war, and would indicate that a substantial collapse in state authority is underway. Yet the government in Juba endures, and the political bargain holding together the various military, security, and rebel factions since 2018 has mostly held during this surge in violence. What explains this soaring violence, and what does this tell us about the strategies the government and ruling elites use to survive in such conditions?

Example output

{
    "nodes": [
        [
            "South Sudan",
            "Country",
            {}
        ],
        [
            "Juba",
            "City",
            {}
        ],
        [
            "Government",
            "Organization",
            {}
        ]
    ],
    "relationships": [
        [
            "South Sudan",
            "has_conflict",
            "Civil War",
            {
                "levels": "high"
            }
        ],
        [
            "Government",
            "located_in",
            "Juba"
        ],
        [
            "Government",
            "endures_despite",
            "Civil War"
        ],
        [
            "Government",
            "holds_together",
            "Military",
            {
                "since": 2018
            }
        ],
        [
            "Government",
            "holds_together",
            "Security",
            {
                "since": 2018
            }
        ],
        [
            "Government",
            "holds_together",
            "Rebel Factions",
            {
                "since": 2018
            }
        ],
        [
            "Violence",
            "surges_in",
            "South Sudan"
        ],
        [
            "Government",
            "uses_strategies_to_survive",
            "Violence"
        ]
    ]
}

Some entities that appear in the relationships are not listed under nodes (e.g. Civil War).
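A quick way to see which names are affected is to compare relationship endpoints against the node list. The following is only a minimal sketch: it assumes the output shape shown above (nodes as `[name, category, properties]`, relationships as `[start, type, end]` with an optional fourth properties item), and `missing_entities` is a hypothetical helper, not something that exists in the repo.

```python
def missing_entities(graph):
    """Return names used in relationships that have no entry under nodes."""
    known = {node[0] for node in graph["nodes"]}
    missing = []
    for rel in graph["relationships"]:
        start, _rel_type, end = rel[:3]  # the 4th item (properties) is optional
        for name in (start, end):
            if name not in known and name not in missing:
                missing.append(name)  # keep first-seen order, no duplicates
    return missing


# Trimmed version of the example output above.
example = {
    "nodes": [
        ["South Sudan", "Country", {}],
        ["Juba", "City", {}],
        ["Government", "Organization", {}],
    ],
    "relationships": [
        ["South Sudan", "has_conflict", "Civil War", {"levels": "high"}],
        ["Government", "located_in", "Juba"],
        ["Government", "holds_together", "Military", {"since": 2018}],
        ["Violence", "surges_in", "South Sudan"],
    ],
}

print(missing_entities(example))  # ['Civil War', 'Military', 'Violence']
```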

Solution

With our current taxonomy, nodes have two parts:

  1. The name/label
  2. The category

The principled approach would be to take all the entities recognised under relations and to assign them a proper label.
This seems a bit premature for now, as it would require either:

The pragmatic approach would be to only require nodes to have a unique identifier (the "name") and to not require a category. Since we lack a consistent taxonomy anyway, this seems reasonable.
We can then prioritise either of the principled approaches.
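As a rough illustration of the pragmatic approach, the sketch below builds Cypher statements that first `MERGE` every name seen in either nodes or relationships, and only then creates the relations. Several things here are assumptions rather than what the repo does: the generic `Entity` label, the `to_cypher` helper, and upper-casing the relation type. Relationship types cannot be passed as Cypher parameters, so the type is checked against a simple identifier pattern before being placed in the query string.

```python
import re


def to_cypher(graph):
    """Build (query, params) pairs: name-only nodes first, then relationships."""
    names = {node[0] for node in graph["nodes"]}
    for rel in graph["relationships"]:
        names.update((rel[0], rel[2]))  # include endpoints missing from nodes

    statements = []
    for name in sorted(names):
        # 'Entity' is a hypothetical catch-all label; no category is required.
        statements.append(("MERGE (n:Entity {name: $name})", {"name": name}))

    for rel in graph["relationships"]:
        start, rel_type, end = rel[:3]
        # Properties may be absent or null in the extracted output.
        props = rel[3] if len(rel) > 3 and rel[3] else {}
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", rel_type):
            raise ValueError(f"unsafe relationship type: {rel_type!r}")
        query = (
            "MATCH (a:Entity {name: $start}), (b:Entity {name: $end}) "
            f"MERGE (a)-[r:{rel_type.upper()}]->(b) SET r += $props"
        )
        statements.append((query, {"start": start, "end": end, "props": props}))
    return statements


for query, params in to_cypher({
    "nodes": [["South Sudan", "Country", {}]],
    "relationships": [["South Sudan", "has_conflict", "Civil War", {"levels": "high"}]],
}):
    print(query, params)
```

Because all node statements come before any relationship statement, running them in order means every `MATCH` finds both endpoints. Known categories are dropped in this sketch; they could be kept as a property if needed.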

Further thoughts:

marcderbauer commented 7 months ago

I tried turning this into a two-step process:

  1. extracting nodes
  2. extracting relations given a set of entities

Example Sentences:

Conflict and insecurity in South Sudan are once again at levels which would typically be associated with civil war, and would indicate that a substantial collapse in state authority is underway. Yet the government in Juba endures, and the political bargain holding together the various military, security, and rebel factions since 2018 has mostly held during this surge in violence. What explains this soaring violence, and what does this tell us about the strategies the government and ruling elites use to survive in such conditions?

Nodes:

"nodes": [
            [
                "South Sudan",
                "Country",
                {
                    "name": "South Sudan"
                }
            ],
            [
                "Juba",
                "City",
                {
                    "name": "Juba"
                }
            ],
            [
                "Government of South Sudan",
                "Organization",
                {
                    "name": "Government of South Sudan"
                }
            ]
        ]

Relations:

"relationships": [
            [
                "South Sudan",
                "has_conflict",
                "Civil War",
                {
                    "level": "high"
                }
            ],
            [
                "Government of South Sudan",
                "located_in",
                "Juba",
                null
            ],
            [
                "Government of South Sudan",
                "maintains",
                "Political Bargain",
                {
                    "since": 2018
                }
            ]
        ]

The problem of relations referencing entities that don't exist as nodes still persists. Will need to try one or more of the following:
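To make the mismatch visible in the two-step setup, the step-two output can be split by whether both endpoints were among the step-one entities. This is just a sketch under the same list-based format as above; `partition_relations` is a hypothetical name and this check is not part of the current pipeline.

```python
def partition_relations(node_names, relationships):
    """Split relations into those with both endpoints known and the rest."""
    known = set(node_names)
    valid, dangling = [], []
    for rel in relationships:
        start, end = rel[0], rel[2]
        if start in known and end in known:
            valid.append(rel)
        else:
            dangling.append(rel)
    return valid, dangling


# Names and relations taken from the two-step output above.
step_one = ["South Sudan", "Juba", "Government of South Sudan"]
step_two = [
    ["South Sudan", "has_conflict", "Civil War", {"level": "high"}],
    ["Government of South Sudan", "located_in", "Juba", None],
    ["Government of South Sudan", "maintains", "Political Bargain", {"since": 2018}],
]

valid, dangling = partition_relations(step_one, step_two)
print([rel[1] for rel in valid])     # ['located_in']
print([rel[2] for rel in dangling])  # ['Civil War', 'Political Bargain']
```

Here only `located_in` has both endpoints in the extracted node set; the other two relations point at entities that step one never produced.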

marcderbauer commented 6 months ago

Will save this experiment in a feature branch. Fix for now in #30