reconciliation-api / specs

Specifications of the reconciliation API
https://reconciliation-api.github.io/specs/draft/

Add new `output` field to `features` #130

thadguidry closed this issue 2 months ago.

thadguidry commented 1 year ago

We added #31 to give clients some control over candidate scoring, but it is not full control, nor quite enough for what was asked for in a few issues and in #61. Clients might simply want to provide some constraints or limitations in a query, which is being discussed in #23, #88 and elsewhere. But #61 and #128 also open up a possible need to let clients ask a service to output, or return, much more information about each candidate entity that might match those constraints or limitations.

In the Freebase days, there was a way to request additional information, or output, from the service.

Example: match "San Francisco" and use the output parameter to return all data about it in the location domain.

filter=(all name{full}:"San Francisco" type:/location/citytown)
&output=(all:/location)
&limit=1
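The three Freebase parameters above are ordinary query-string values. The Python sketch below URL-encodes them with the standard library to show what a client would actually send. The endpoint is deliberately left out, since the Freebase Search API has been shut down, and none of this is part of the reconciliation API.

```python
from urllib.parse import urlencode, parse_qs

# Parameter values copied from the Freebase example above. No endpoint URL is
# given because the Freebase Search API is no longer available.
params = {
    "filter": '(all name{full}:"San Francisco" type:/location/citytown)',
    "output": "(all:/location)",
    "limit": 1,
}

# urlencode percent-encodes parentheses, quotes, braces, colons and slashes
# (and turns spaces into "+"), so each S-expression survives as one value.
query_string = urlencode(params)
print(query_string)

# Round-trip check: decoding returns the original values, as strings.
decoded = {key: values[0] for key, values in parse_qs(query_string).items()}
```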

For reconciliation candidates, this might output additional information as JSON, perhaps as semantic tuples (subject, predicate, object, or SPO for short), standardized for clients to consume as id, name and value fields of String type. We can likely skip the subject, since it is always the reconciliation candidate entity, so the output would carry just the predicate as name and the object as value, plus an id.

s: San Francisco, p: partOf, o: San Francisco Bay Area
s: San Francisco, p: inception, o: 29 June 1776

{
  "id": "12345",
  "name": "San Francisco",
  "score": 55,
  "features": [
    {
      "id": "name_generic",
      "name": "baseline score for the label",
      "value": 133,
      "context": "non-LSI, 1 matched broader type: generic"
    }
  ],
  "output": [
    {
      "id": "P361",
      "name": "part of",
      "value": [
        "[San Francisco Bay Area](https://www.wikidata.org/wiki/Q213205)",
        "[San Francisco–San Mateo–Redwood City metropolitan division](https://www.wikidata.org/wiki/Q63567254)"
      ]
    },
    {
      "id": "P138",
      "name": "named after",
      "value": "[Francis of Assisi](https://www.wikidata.org/wiki/Q676555)"
    }
  ]
}

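A minimal sketch of how a service might assemble the proposed `output` array from (predicate, object) pairs, assuming the id/name/value layout in the example above. The triples are hypothetical and use plain labels instead of the Markdown links. The rule that one object yields a string while several yield a list is an assumption drawn from how the example treats P138 versus P361.

```python
import json

# Hypothetical (predicate_id, predicate_name, object) triples for the candidate
# "San Francisco". The subject is omitted because it is always the candidate.
triples = [
    ("P361", "part of", "San Francisco Bay Area"),
    ("P361", "part of", "San Francisco–San Mateo–Redwood City metropolitan division"),
    ("P138", "named after", "Francis of Assisi"),
]


def build_output(triples):
    """Group objects by predicate into id/name/value entries.

    A predicate with a single object gets a string value; a predicate with
    several objects gets a list (assumed convention, see above).
    """
    grouped = {}  # dicts keep insertion order, so predicates stay in input order
    for pid, pname, obj in triples:
        entry = grouped.setdefault(pid, {"id": pid, "name": pname, "values": []})
        entry["values"].append(obj)

    output = []
    for entry in grouped.values():
        values = entry.pop("values")
        entry["value"] = values[0] if len(values) == 1 else values
        output.append(entry)
    return output


candidate = {"id": "12345", "name": "San Francisco", "score": 55,
             "output": build_output(triples)}
print(json.dumps(candidate, indent=2, ensure_ascii=False))
```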
fsteeg commented 1 year ago

If I understand this correctly, it seems to be very much like our data extension functionality.

I think that clients who want to get more information about candidates (for custom scoring etc.) could do so by using a data extension request based on the candidate IDs like this (following your example):

{
  "ids": [
    "12345"
  ],
  "properties": [
    {"id": "P361"},
    {"id": "P138"}
  ]
}

And get back something like:

{
  "meta": [
    {
      "id": "P361",
      "name": "part of"
    },{
      "id": "P138",
      "name": "named after"
    }
  ],
  "rows": [
    {
      "id": "12345",
      "properties": [
        {
          "id": "P361",
          "values": [
            {"str": "[San Francisco Bay Area](https://www.wikidata.org/wiki/Q213205)"},
            {"str": "[San Francisco–San Mateo–Redwood City metropolitan division](https://www.wikidata.org/wiki/Q63567254)"}
          ]
        },{
          "id": "P138",
          "values": [
            {"str": "[Francis of Assisi](https://www.wikidata.org/wiki/Q676555)"}
          ]
        }
      ]
    }
  ]
}
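On the client side, this round trip could look roughly like the Python sketch below. It assumes the request and response shapes shown in this comment (a `rows` list whose entries carry `id` and `properties`), abbreviates the values to plain labels, and collects only `str` values. How the query is actually transmitted to the service (for example as a form parameter or a JSON body) depends on the spec version and is not shown. `build_extend_query` and `index_extend_response` are hypothetical helper names.

```python
import json


def build_extend_query(candidate_ids, property_ids):
    # Same shape as the query above: candidate IDs plus a list of {"id": ...} objects.
    return {"ids": list(candidate_ids),
            "properties": [{"id": pid} for pid in property_ids]}


def index_extend_response(response):
    """Map candidate id -> {property name: [string values]}.

    Property names are looked up in the "meta" array; only "str" values are
    collected, other value kinds are skipped by this sketch.
    """
    names = {m["id"]: m["name"] for m in response["meta"]}
    result = {}
    for row in response["rows"]:
        props = {}
        for prop in row["properties"]:
            label = names.get(prop["id"], prop["id"])
            props[label] = [v["str"] for v in prop["values"] if "str" in v]
        result[row["id"]] = props
    return result


query = build_extend_query(["12345"], ["P361", "P138"])

# Abbreviated copy of the example response above (plain labels, no links).
response = {
    "meta": [{"id": "P361", "name": "part of"}, {"id": "P138", "name": "named after"}],
    "rows": [{
        "id": "12345",
        "properties": [
            {"id": "P361", "values": [
                {"str": "San Francisco Bay Area"},
                {"str": "San Francisco–San Mateo–Redwood City metropolitan division"}]},
            {"id": "P138", "values": [{"str": "Francis of Assisi"}]},
        ],
    }],
}

by_candidate = index_extend_response(response)
print(json.dumps(by_candidate, indent=2, ensure_ascii=False))
```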

So it seems to me that we don't need additional functionality in the protocol to do this.

thadguidry commented 1 year ago

Hmm, but the domain is also something I was highlighting as a need. If I use data extension, how would I request a set of properties that define a domain, or that belong to a namespace, to return to a client? In my example, I only want location properties.

fsteeg commented 12 months ago

Right, I missed that part in your example.

Perhaps this could be done using property proposals, so something like this:

GET /extend/propose?type=location

This would return:

{
  "type": "location",
  "properties": [
    {
      "id": "P361",
      "name": "part of"
    }, {
      "id": "P138",
      "name": "named after"
    }
  ]
}

Clients could then use these proposed properties to build the data extension query from above.
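A sketch of that two-step flow in Python, assuming the `/extend/propose` path and the response shape from this comment. In practice a client would typically read the proposal endpoint from the service manifest rather than hard-code it. `extend_query_from_proposal` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

# Step 1: the proposal request for the "location" type (path assumed from above).
proposal_url = "/extend/propose?" + urlencode({"type": "location"})

# Response to that request, copied from the example above.
proposal = {
    "type": "location",
    "properties": [
        {"id": "P361", "name": "part of"},
        {"id": "P138", "name": "named after"},
    ],
}


def extend_query_from_proposal(proposal, candidate_ids):
    """Step 2: turn the proposed properties for a type into a data extension query."""
    return {
        "ids": list(candidate_ids),
        "properties": [{"id": p["id"]} for p in proposal["properties"]],
    }


query = extend_query_from_proposal(proposal, ["12345"])
print(proposal_url)
print(json.dumps(query, indent=2))
```

The resulting query is the same one shown earlier in the thread, so a client gets "all location properties" without the protocol needing a separate `output` field.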

thadguidry commented 2 months ago

@fsteeg Nice, yeah, I can see that /extend/propose would likely meet this need. OK, closing this then. I think your comments give us a good solution/workaround.