iai-group / pkg-api

Pipeline for NL to KG #78

Closed IKostric closed 9 months ago

IKostric commented 10 months ago

Based on my understanding, the new pipeline should look like this (this example is for the INSERT action/intent): 1) NL is received by the backend. 2) NL is stored in the KG as a statement. 3) Extract the triple as strings (subject, predicate, object, preference). 4) Link any entities found. 5) Extract additional annotations. 6) Update the KG entry with this additional information.

I suggest a slight simplification. We start by creating three dataclasses:

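from dataclasses import dataclass
from typing import Optional

# Placeholder (assumption): swap in the project's own URI type.
Uri = str

@dataclass
class TripleAnnotation:
  # Triple elements as plain strings extracted from the NL statement.
  subject: str
  predicate: Optional[str] = None
  object: Optional[str] = None

@dataclass
class PreferenceAnnotation:
  topic: Uri
  weight: float

@dataclass
class KnowledgeData:
  # The original NL statement; annotations are filled in as they are extracted.
  statement: str
  triple: Optional[TripleAnnotation] = None
  preference: Optional[PreferenceAnnotation] = None
  # ... metadata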

This streamlines insertion: we write to the KG only once, at the end, when the dataclasses are populated. It also simplifies moving information between modules, since we only need to pass KnowledgeData instead of all the arguments separately.
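To make the "populate first, insert once" flow concrete, here is a minimal sketch. The `annotate` and `insert` functions, the whitespace-based extraction, the list used as a store, and the `http://example.org/` namespace are all hypothetical stand-ins, not part of pkg-api; `Uri` is aliased to `str` only so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import List, Optional

Uri = str  # Assumption: stand-in for the project's URI type.


@dataclass
class TripleAnnotation:
    subject: str
    predicate: Optional[str] = None
    object: Optional[str] = None


@dataclass
class PreferenceAnnotation:
    topic: Uri
    weight: float


@dataclass
class KnowledgeData:
    statement: str
    triple: Optional[TripleAnnotation] = None
    preference: Optional[PreferenceAnnotation] = None


def annotate(statement: str) -> KnowledgeData:
    """Hypothetical annotator; a real one would call the NL extraction module."""
    data = KnowledgeData(statement=statement)
    # Toy extraction: treat the statement as "<subject> <predicate> <object>".
    parts = statement.split(maxsplit=2)
    if len(parts) == 3:
        data.triple = TripleAnnotation(*parts)
        # Toy preference rule: "like"/"love" yields a positive preference.
        if parts[1] in ("like", "love"):
            topic = f"http://example.org/{parts[2]}"  # Placeholder namespace.
            data.preference = PreferenceAnnotation(topic=topic, weight=1.0)
    return data


def insert(data: KnowledgeData, store: List[KnowledgeData]) -> None:
    """Hypothetical single write, done once the dataclass is populated."""
    store.append(data)


store: List[KnowledgeData] = []
insert(annotate("I like pizza"), store)
```

Each module only receives and returns a `KnowledgeData` object, so adding a new annotation step should not change any function signatures.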

One thing I am unsure about is how to extend this to add linked entities. Maybe another dataclass UriTripleAnnotation or similar could be used.
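One possible shape for that extension, as a rough sketch only: a parallel `UriTripleAnnotation` whose fields hold URIs where linking succeeded and `None` where it did not. The `link_triple` helper and the dictionary lookup are hypothetical placeholders for a real entity linker, and `Uri` is again aliased to `str` just to keep the snippet self-contained.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Uri = str  # Assumption: stand-in for the project's URI type.


@dataclass
class TripleAnnotation:
    subject: str
    predicate: Optional[str] = None
    object: Optional[str] = None


@dataclass
class UriTripleAnnotation:
    # Same slots as TripleAnnotation, but holding linked URIs.
    subject: Optional[Uri] = None
    predicate: Optional[Uri] = None
    object: Optional[Uri] = None


def link_triple(
    triple: TripleAnnotation, lookup: Dict[str, Uri]
) -> UriTripleAnnotation:
    """Hypothetical linker: resolves each element via a plain dict lookup."""

    def resolve(text: Optional[str]) -> Optional[Uri]:
        # Elements that are missing or not found stay unlinked (None).
        return lookup.get(text) if text is not None else None

    return UriTripleAnnotation(
        subject=resolve(triple.subject),
        predicate=resolve(triple.predicate),
        object=resolve(triple.object),
    )


lookup = {"pizza": "http://example.org/pizza"}  # Placeholder namespace.
linked = link_triple(TripleAnnotation("I", "like", "pizza"), lookup)
```

KnowledgeData could then carry an extra optional field for the linked triple next to the string one, so the original strings are kept even when linking fails.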