reconciliation-api / specs

Specifications of the reconciliation API
https://reconciliation-api.github.io/specs/draft/

Simplify type syntax in reconciliation queries #109

Closed (wetneb closed this issue 1 year ago)

wetneb commented 1 year ago

Currently, reconciliation queries can contain "either a single type identifier or a list of type identifiers" according to the specs. That is, both of the following queries are valid:

{"query":"foo","type":"Q5"}
{"query":"foo","type":["Q5"]}

This is encoded in the JSON schema with a union construct. As pointed out by @fsteeg, this is confusing, especially because the specs mention "an array of types" in the definition and then only have single types without arrays in the examples.

For services, it is annoying to parse this cleanly.

I would therefore propose to drop the first syntax above and require types to be enclosed in an array, even if there is only a single element.
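
To illustrate the parsing burden mentioned above, here is a minimal sketch of how a service might normalize the two currently valid syntaxes into one list. The function name `normalize_types` is hypothetical and not part of the specs; under the proposal, only the array branch would remain.

```python
from typing import Any, Dict, List


def normalize_types(query: Dict[str, Any]) -> List[str]:
    """Return the requested type identifiers as a list, whichever syntax was used.

    Hypothetical helper for illustration; not defined by the specs.
    """
    raw = query.get("type")
    if raw is None:
        return []  # no type constraint supplied
    if isinstance(raw, str):
        return [raw]  # single-identifier syntax: "type": "Q5"
    if isinstance(raw, list) and all(isinstance(t, str) for t in raw):
        return list(raw)  # array syntax: "type": ["Q5"]
    raise ValueError(f"unsupported 'type' value: {raw!r}")
```

Both example queries above yield the same list, `["Q5"]`, once passed through this helper.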

fsteeg commented 1 year ago

Since this is indeed a breaking change, as described by @tfmorris in https://github.com/OpenRefine/OpenRefine/issues/5615#issuecomment-1424589666 and https://github.com/reconciliation-api/specs/pull/110#issuecomment-1424612780, maybe the right simplification here is to actually remove the array option? We might get rid of type_strict with that too (which would not be a breaking change since it was optional before).

(As to how this came up: I was mostly confused by the inconsistency of spec text and examples, the way we solved it was by technically using a single type, as I mentioned in https://github.com/OpenRefine/OpenRefine/issues/5615#issuecomment-1424496013. The array approach, even with type_strict, does not work for this use case.)

wetneb commented 1 year ago

That would also make sense, for sure.

thadguidry commented 1 year ago

The downside of removing client-provided parameters is that clients can no longer ask richer questions of a service in a single query. Some might say that multiple types make a query more ambiguous for services, but I disagree in some cases. Clients might then need to perform a separate query for each type of interest, which is less ambiguous only because each query is limited to a single type rather than an array of them.

My worry is that we are making the query scope very narrow, so clients and users must narrow their queries and perform more of them. For example, would this no longer be allowed per the spec? type: ["protein","chemical compound"]

Do others see any further downsides to removing this client query capability?
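
As a rough sketch of the extra work this would mean for clients, the following builds one single-type query per type of interest. The helper name `one_query_per_type` and the `q0`, `q1` keys are assumptions made for illustration; merging and de-duplicating the per-type results would still be left to the client.

```python
import json
from typing import Dict, List


def one_query_per_type(text: str, types: List[str]) -> Dict[str, Dict[str, str]]:
    """Fan a multi-type request out into one single-type query per type.

    Hypothetical client-side helper; the "q0", "q1", ... keys are arbitrary.
    """
    return {f"q{i}": {"query": text, "type": t} for i, t in enumerate(types)}


batch = one_query_per_type("foo", ["protein", "chemical compound"])
print(json.dumps(batch, indent=2))
```

With the array syntax, the same request would be a single query: {"query":"foo","type":["protein","chemical compound"]}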

wetneb commented 10 months ago

In OpenRefine we keep getting requests to reconcile against multiple types:

So I am indeed worried that with this change we are making it harder to meet users' needs. I think the semantics required in the large majority of those cases is just the union of types: returning candidates which are in any of the supplied types.
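
The union semantics described here can be sketched as a simple filter. This is a minimal illustration, assuming each candidate carries a `type` list of objects with an `id` field; the function name `filter_by_any_type` and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def filter_by_any_type(
    candidates: Iterable[Dict[str, Any]], wanted: List[str]
) -> List[Dict[str, Any]]:
    """Keep candidates that belong to at least one of the wanted types.

    Hypothetical helper; assumes candidate["type"] is a list of {"id": ...} objects.
    """
    wanted_set = set(wanted)
    if not wanted_set:
        return list(candidates)  # no type constraint: keep everything
    return [
        c
        for c in candidates
        if wanted_set & {t["id"] for t in c.get("type", [])}
    ]


# Made-up sample data, for illustration only.
sample = [
    {"id": "a", "type": [{"id": "protein"}]},
    {"id": "b", "type": [{"id": "chemical compound"}, {"id": "drug"}]},
    {"id": "c", "type": [{"id": "gene"}]},
]
matches = filter_by_any_type(sample, ["protein", "chemical compound"])
```

Here `matches` contains candidates `a` and `b`, since each has at least one of the supplied types, while `c` is dropped.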

Given that we are thinking about changing the query field of reconciliation queries into a property (https://github.com/reconciliation-api/specs/issues/134), it's tempting to consider whether we could do the same for types. It would then let us use the additional settings on properties (https://github.com/reconciliation-api/specs/pull/131), essentially offering users the choice between:

I'll open another issue about this.

thadguidry commented 10 months ago

@wetneb Yes, exactly my point last year! Allowing services to return the type hierarchy would let smart clients show users the type distribution, so they can make more informed decisions, or even automated decisions with machine learning on the client side. It would also give services that want to a way to return much richer and deeper metadata about candidates when a user queries very narrowly, once they know about the higher types and commonly shared properties. I think there is much room for improvement in the initial type-suggest feedback to clients: almost a back-and-forth refinement over a sample of some number of entities, after which users can perform the real, narrow query with their chosen types, properties and exclusions.