zentity-io / zentity

Entity resolution for Elasticsearch.
https://zentity.io
Apache License 2.0

Allow attributes to be represented as nested fields #77

Closed davemoore- closed 3 years ago

davemoore- commented 3 years ago

Currently the "_attributes" section of the resolution response is a flat object, where each key is the name of an attribute. Allowing the attributes to be nested will allow users to save results in an index that follows the guidelines and best practices for the Elastic Common Schema (ECS), which encourages nesting by way of prefixes.

If this feature is released at the same time as #73, then it would create one breaking change instead of two.

Proposal

Allow periods (.) to be used in the attribute names of entity models, and use them to nest fields in the "_attributes" section of the resolution response.

Example

Entity model - Attribute names are flat and may contain periods. This example shows attributes which are grouped by prefixes.

{
  "attributes": {
    "name.first": {},
    "name.middle": {},
    "name.last": {},
    "location.address.street": {},
    "location.address.city": {},
    "location.address.state": {},
    "location.address.zip": {}
  }
}

Resolution request - Attribute names are flat and retain their periods. Nesting would not be allowed at this point. Rationale: Attributes may be arrays of values or objects with values and params (source), and allowing nested attributes here would make it difficult to determine whether the nested object was an attribute value or a nested attribute name.

{
  "attributes": {
    "name.first": [ "Alice" ],
    "name.middle": [ "Q" ],
    "name.last": [ "Jones" ]
  }
}
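The ambiguity described in the rationale can be sketched in a few lines of Python. This is only an illustration, not zentity code: the helper names `is_attribute_value` and `interpretations` are hypothetical, and the object form of an attribute is assumed to use the keys "values" and "params", as the rationale suggests.

```python
def is_attribute_value(obj):
    """True if obj could be read as an attribute value in a request."""
    # Array shorthand: a plain list of values.
    if isinstance(obj, list):
        return True
    # Object form (assumed keys): "values" and/or "params".
    return isinstance(obj, dict) and bool(obj) and set(obj) <= {"values", "params"}


def interpretations(name, obj):
    """Return every attribute name that a nested request entry could denote."""
    found = []
    if is_attribute_value(obj):
        found.append(name)
    if isinstance(obj, dict):
        for key, child in obj.items():
            found.extend(interpretations(f"{name}.{key}", child))
    return found


# If nesting were allowed in requests, this entry would have two readings:
# the attribute "name" in object form, or the attribute "name.values" in
# array form.
readings = interpretations("name", {"values": ["Alice"]})
print(readings)
```

Keeping request attribute names flat avoids the problem, because each key is always an attribute name and each value is always an attribute value.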

Resolution response - Attribute names are split and nested by their periods.

{
  "_attributes": {
    "name": {
      "first": [ "Alice" ],
      "middle": [ "Quincy" ],
      "last": [ "Jones" ]
    },
    "location": {
      "address": {
        "street": [ "123 Main St" ],
        "city": [ "Washington" ],
        "state": [ "DC" ],
        "zip": [ "20001" ]
      }
    }
  }
}
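
The split-and-nest step proposed above could look roughly like the following Python sketch. It is not the zentity implementation (zentity itself is a Java plugin); `nest_attributes` is a hypothetical helper, and raising an error when one attribute name is a prefix of another (for example "name" and "name.first") is an assumption, since the proposal does not say how such conflicts are handled.

```python
def nest_attributes(flat):
    """Split each dotted attribute name on periods and build a nested dict."""
    nested = {}
    for name, values in flat.items():
        parts = name.split(".")
        node = nested
        # Walk or create the intermediate objects for every prefix.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                # Assumption: a shorter attribute name already holds values here.
                raise ValueError(f"'{name}' conflicts with a shorter attribute name")
        if parts[-1] in node:
            # Assumption: a longer attribute name already created an object here.
            raise ValueError(f"'{name}' conflicts with an existing attribute name")
        node[parts[-1]] = values
    return nested


flat = {
    "name.first": ["Alice"],
    "name.last": ["Jones"],
    "location.address.city": ["Washington"],
}
nested = nest_attributes(flat)
print(nested)
```

With the input above, "name.first" and "name.last" end up under a shared "name" object, and "location.address.city" becomes three levels deep, mirroring the shape of the example response.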

shaunkaufmann-telicent commented 2 years ago

Thanks for the great work! Quick point.

What happens when you have an array of nested objects? For example, given this indexed entity:

{
  "addresses": [
    {
      "street": "street 1",
      "city": "city 1",
      "state": "state 1",
      "zip": "zip 1"
    },
    {
      "street": "street 2",
      "city": "city 2",
      "state": "state 2",
      "zip": "zip 2"
    }
  ]
}

Say, for simplicity, that a resolver uses ["addresses.street", "addresses.city"], and the request mixes the street of location 2 with the city of location 1:

{
  "attributes": {
    "addresses.street": "street 2",
    "addresses.city": "city 1"
  }
}

Ideally, this should NOT return a match, but in my (likely naive) tests it does.

Are there ways of setting up a model to get around this?
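
One likely explanation, assuming "addresses" is indexed with Elasticsearch's default object mapping rather than the nested field type: Elasticsearch flattens an array of objects into one multi-valued field per key, so the link between each street and its city is lost. The Python sketch below only simulates that behavior to show why the mixed request matches. The helper names are hypothetical, and nothing in this thread establishes whether zentity can issue the nested queries that a per-object comparison would need.

```python
def flatten_object_array(field, objects):
    """Mimic default (non-nested) object mapping: one multi-valued field per key."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(flat_doc, query):
    """Each queried value only needs to appear somewhere in its flattened field."""
    return all(value in flat_doc.get(name, []) for name, value in query.items())


def matches_per_object(objects, field, query):
    """All queried values must come from the same object in the array."""
    prefix = field + "."
    wanted = {name[len(prefix):]: value for name, value in query.items()}
    return any(all(obj.get(k) == v for k, v in wanted.items()) for obj in objects)


addresses = [
    {"street": "street 1", "city": "city 1", "state": "state 1", "zip": "zip 1"},
    {"street": "street 2", "city": "city 2", "state": "state 2", "zip": "zip 2"},
]
query = {"addresses.street": "street 2", "addresses.city": "city 1"}

flat_doc = flatten_object_array("addresses", addresses)
mixed_flat = matches_flattened(flat_doc, query)                  # True
mixed_nested = matches_per_object(addresses, "addresses", query)  # False
print(mixed_flat, mixed_nested)
```

In the flattened view, "addresses.street" holds both streets and "addresses.city" holds both cities, so any street and city combination matches. Only the per-object comparison rejects the mixed request.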