juupje / pyMogwai

PyMogwai is a Python-based implementation of the Gremlin graph traversal language, designed to create and handle knowledge graphs entirely in Python without the need for an external Gremlin server.
Apache License 2.0

allow indexing of nodes and edges #13

Closed: WolfgangFahl closed this issue 1 week ago

WolfgangFahl commented 2 weeks ago

Apply the approach of A. Harth and S. Decker, "Optimized index structures for querying RDF from the Web," Third Latin American Web Congress (LA-WEB'2005), Buenos Aires, Argentina, 2005, 10 pp., doi: 10.1109/LAWEB.2005.25.

For example, in the style of Amazon Neptune's storage indexing: https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-storage-indexing.html
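The core idea behind both references is to store each (subject, predicate, object, graph) quad several times, once per key ordering, so that any lookup whose bound positions form a prefix of some ordering becomes a binary search plus a contiguous scan. A minimal sketch under stated assumptions: `QuadStore`, its methods, and the all-strings terms are hypothetical and not pyMogwai code; the default orderings SPOG, POGS and GPSO mirror the defaults described on the Neptune page.

```python
from bisect import bisect_left

# Index of each letter within a (subject, predicate, object, graph) quad.
POS = {"S": 0, "P": 1, "O": 2, "G": 3}


class QuadStore:
    """Hypothetical store: one sorted list of permuted quads per ordering.

    Assumes all terms are strings so that permuted tuples compare cleanly.
    """

    def __init__(self, orderings=("SPOG", "POGS", "GPSO")):
        self.indexes = {order: [] for order in orderings}

    def add(self, s, p, o, g):
        quad = (s, p, o, g)
        for order, rows in self.indexes.items():
            row = tuple(quad[POS[c]] for c in order)
            i = bisect_left(rows, row)
            if i == len(rows) or rows[i] != row:  # ignore duplicate quads
                rows.insert(i, row)

    def match(self, **bound):
        """Return quads in (S, P, O, G) order whose bound letters match."""
        n = len(bound)
        for order, rows in self.indexes.items():
            if set(order[:n]) == set(bound):
                # Bound letters form a prefix of this ordering, so all matches
                # are contiguous: binary-search to the first one, then scan.
                prefix = tuple(bound[c] for c in order[:n])
                i = bisect_left(rows, prefix)
                hits = []
                while i < len(rows) and rows[i][:n] == prefix:
                    hits.append(self._unpermute(order, rows[i]))
                    i += 1
                return hits
        # No ordering starts with the bound letters: fall back to a full scan.
        order, rows = next(iter(self.indexes.items()))
        quads = (self._unpermute(order, row) for row in rows)
        return [q for q in quads if all(q[POS[c]] == v for c, v in bound.items())]

    @staticmethod
    def _unpermute(order, row):
        """Reorder a permuted row back into (S, P, O, G)."""
        quad = [None] * 4
        for c, v in zip(order, row):
            quad[POS[c]] = v
        return tuple(quad)
```

With these three orderings, `store.match(P="created")` is served by POGS, while `store.match(O="lop")` has no object-first ordering and falls back to a scan; that is the kind of gap an extra ordering such as OSGP would close.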

WolfgangFahl commented 1 week ago
if self == IndexConfigs.MINIMAL:
    return IndexConfig({
        # Core indices for basic node relationships
        "PS",  # Predicate -> Subject: links predicates to subjects (e.g., labels or properties to nodes)
        "PO",  # Predicate -> Object: maps predicates to values (e.g., property values)
        "SO",  # Subject -> Object: links source nodes to target nodes in relationships
        "OS",  # Object -> Subject: reverse lookup for values back to nodes

        # Graph-based indices for context-specific associations
        "PG",  # Predicate -> Graph: associates predicates with graph contexts
        "SG",  # Subject -> Graph: associates subjects with graph contexts
        "GO",  # Graph -> Object: maps graph contexts to objects for grouped retrieval
        "GP"   # Graph -> Predicate: links graph contexts to predicates
    })
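The index dumps that follow are consistent with each two-letter index mapping the term at its first quad position to the set of terms at its second position (e.g. "PS" maps each predicate to the subjects that use it). A minimal sketch of that construction, assuming the graph is flattened into (subject, predicate, object, graph) quads as the dumps suggest; `build_index`, `MINIMAL` and the sample quads (copied from dump values) are hypothetical, not pyMogwai's actual API.

```python
from collections import defaultdict

# Index of each letter within a (subject, predicate, object, graph) quad.
POS = {"S": 0, "P": 1, "O": 2, "G": 3}

# The eight specs listed in IndexConfigs.MINIMAL above.
MINIMAL = ("PS", "PO", "SO", "OS", "PG", "SG", "GO", "GP")


def build_index(quads, spec):
    """Map the term at position spec[0] to the set of terms at spec[1]."""
    key_pos, value_pos = POS[spec[0]], POS[spec[1]]
    index = defaultdict(set)
    for quad in quads:
        index[quad[key_pos]].add(quad[value_pos])
    return dict(index)


# A few quads matching entries in the dumps (node 0 = marko, node 1 = vadas).
quads = [
    (0, "label", "Person", "node-label"),
    (0, "name", "marko", "node-name"),
    (0, "age", 29, "node-property"),
    (1, "label", "Person", "node-label"),
    (1, "name", "vadas", "node-name"),
    (0, "knows", 1, "edge-link"),
    (0, "weight", 0.5, "edge-property"),
]
indices = {spec: build_index(quads, spec) for spec in MINIMAL}
```

For instance, `indices["PS"]["name"]` is `{0, 1}` and `indices["GP"]["node-property"]` is `{"age"}`, matching the shape of the PS and GP dumps.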
Graph initialized
Index: SG
{
  "0": "{'node-name', 'edge-property', 'edge-name', 'edge-link', 'node-property', 'edge-label', 'node-label'}",
  "1": "{'node-name', 'node-property', 'node-label'}",
  "2": "{'node-name', 'node-property', 'node-label'}",
  "3": "{'node-name', 'edge-property', 'edge-name', 'edge-link', 'node-property', 'edge-label', 'node-label'}",
  "4": "{'node-name', 'node-property', 'node-label'}",
  "5": "{'node-name', 'edge-property', 'edge-name', 'edge-link', 'node-property', 'edge-label', 'node-label'}"
}
Index: OS
{
  "Person": "{0, 1, 3, 5}",
  "marko": "{0}",
  "29": "{0}",
  "vadas": "{1}",
  "27": "{1}",
  "Software": "{2, 4}",
  "lop": "{2}",
  "java": "{2, 4}",
  "josh": "{3}",
  "32": "{3}",
  "ripple": "{4}",
  "peter": "{5}",
  "35": "{5}",
  "1": "{0, 3}",
  "knows": "{0}",
  "0.5": "{0}",
  "3": "{0}",
  "2": "{0, 3, 5}",
  "created": "{0, 3, 5}",
  "0.4": "{0, 3}",
  "4": "{3}",
  "0.2": "{5}"
}
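In the OS dump, key `1` maps to `{0, 3}`, yet the SO dump shows subject 3 reaching only `1.0` (its edge weight), never node 1. This appears to come from Python treating `1 == 1.0` with equal hashes, so node id 1 and weight 1.0 share one dictionary key. A small sketch reproducing the effect and one possible fix, tagging each key with its type; the variable names are illustrative only, and keying by graph context as well would be an alternative fix.

```python
from collections import defaultdict

# Two quads from the dumps: marko (0) knows node 1; josh's (3) edge weight is 1.0.
quads = [(0, "knows", 1, "edge-link"), (3, "weight", 1.0, "edge-property")]

# Plain value-keyed OS index: 1 and 1.0 collapse into the same key.
naive = defaultdict(set)
for s, p, o, g in quads:
    naive[o].add(s)

# Type-tagged keys keep the integer node id and the float weight apart.
tagged = defaultdict(set)
for s, p, o, g in quads:
    tagged[(type(o).__name__, o)].add(s)
```

Here `naive[1]` is `{0, 3}`, reproducing the dump, while `tagged[("int", 1)]` is `{0}` and `tagged[("float", 1.0)]` is `{3}`.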
Index: SO
{
  "0": "{0.5, 'Person', 1, 3, 2, 0.4, 'knows', 'marko', 29, 'created'}",
  "1": "{'Person', 27, 'vadas'}",
  "2": "{'Software', 'lop', 'java'}",
  "3": "{32, 'Person', 'josh', 1.0, 4, 2, 0.4, 'created'}",
  "4": "{'Software', 'java', 'ripple'}",
  "5": "{0.2, 'Person', 2, 35, 'peter', 'created'}"
}
Index: PS
{
  "label": "{0, 1, 2, 3, 4, 5}",
  "name": "{0, 1, 2, 3, 4, 5}",
  "age": "{0, 1, 3, 5}",
  "lang": "{2, 4}",
  "knows": "{0}",
  "weight": "{0, 3, 5}",
  "created": "{0, 3, 5}"
}
Index: PG
{
  "label": "{'edge-label', 'node-label'}",
  "name": "{'node-name', 'edge-name'}",
  "age": "{'node-property'}",
  "lang": "{'node-property'}",
  "knows": "{'edge-link'}",
  "weight": "{'edge-property'}",
  "created": "{'edge-link'}"
}
Index: GO
{
  "node-label": "{'Software', 'Person'}",
  "node-name": "{'ripple', 'josh', 'vadas', 'lop', 'peter', 'marko'}",
  "node-property": "{32, 35, 'java', 27, 29}",
  "edge-link": "{1, 2, 3, 4}",
  "edge-label": "{'knows', 'created'}",
  "edge-name": "{'knows', 'created'}",
  "edge-property": "{0.5, 1.0, 0.2, 0.4}"
}
Index: GP
{
  "node-label": "{'label'}",
  "node-name": "{'name'}",
  "node-property": "{'age', 'lang'}",
  "edge-link": "{'knows', 'created'}",
  "edge-label": "{'label'}",
  "edge-name": "{'name'}",
  "edge-property": "{'weight'}"
}
Index: PO
{
  "label": "{'Software', 'Person', 'knows', 'created'}",
  "name": "{'ripple', 'josh', 'vadas', 'lop', 'knows', 'peter', 'marko', 'created'}",
  "age": "{32, 35, 27, 29}",
  "lang": "{'java'}",
  "knows": "{1, 3}",
  "weight": "{0.5, 1.0, 0.2, 0.4}",
  "created": "{2, 4}"
}
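To see what these dumps can answer, a lookup such as "subjects with `lang` = `java`" can be approximated by intersecting PS (subjects using the predicate) with OS (subjects reaching the object). A sketch using values copied from the dumps above; `candidates` is a hypothetical helper. Because PS and OS each drop the pairing between predicate and object, the intersection is only a candidate set: for `("name", "knows")` it returns `{0}` even though node 0's name is `marko`, since `knows` is stored under subject 0 as the name of its outgoing edge (see PO and GO). Telling those apart needs the graph position, e.g. via SG or an ordering that keeps P, O and G together.

```python
# Values copied from the PS and OS dumps above (only the keys needed here).
PS = {"lang": {2, 4}, "age": {0, 1, 3, 5}, "name": {0, 1, 2, 3, 4, 5}}
OS = {"java": {2, 4}, 29: {0}, "knows": {0}}


def candidates(predicate, obj):
    """Subjects that use `predicate` somewhere and reach `obj` somewhere."""
    return PS.get(predicate, set()) & OS.get(obj, set())
```

`candidates("lang", "java")` gives `{2, 4}` (lop and ripple) and `candidates("age", 29)` gives `{0}` (marko).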
WolfgangFahl commented 1 week ago

Assessment of Index Usefulness (by ChatGPT-o)

SG (Subject → Graph)

OS (Object → Subject)

SO (Subject → Object)

PS (Predicate → Subject)

PG (Predicate → Graph)

GO (Graph → Object)

GP (Graph → Predicate)

PO (Predicate → Object)

Summary of Usefulness

The high-usefulness indices (SG, OS, SO, PO) are the most effective for direct and reverse lookups across labels, properties, and relationships. The moderate-usefulness indices could be optimized further or even removed if specific filtering by context type isn’t required.
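Dropping the moderate-usefulness indices does not have to remove functionality: a store can maintain only the enabled indices and answer any other two-position lookup with a full scan. A minimal sketch of that trade-off; `IndexedQuads`, its methods and its default specs (the four rated high above) are hypothetical and not part of pyMogwai.

```python
from collections import defaultdict

# Index of each letter within a (subject, predicate, object, graph) quad.
POS = {"S": 0, "P": 1, "O": 2, "G": 3}


class IndexedQuads:
    """Hypothetical store that maintains only the enabled two-position indices."""

    def __init__(self, specs=("SG", "OS", "SO", "PO")):
        self.quads = []
        self.indices = {spec: defaultdict(set) for spec in specs}

    def add(self, quad):
        self.quads.append(quad)
        # Every enabled index is updated on insert; disabled ones cost nothing.
        for spec, index in self.indices.items():
            index[quad[POS[spec[0]]]].add(quad[POS[spec[1]]])

    def lookup(self, spec, key):
        """Terms at position spec[1] for quads whose spec[0] term equals key."""
        if spec in self.indices:
            return set(self.indices[spec].get(key, set()))
        # Index disabled: answer the same question with a linear scan.
        k, v = POS[spec[0]], POS[spec[1]]
        return {q[v] for q in self.quads if q[k] == key}
```

The design choice is memory and insert time versus query time: an enabled index answers in one dictionary lookup, while a disabled one costs a pass over all quads, which is acceptable only for rarely used access paths such as context-type filtering.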