reconciliation-api / specs

Specifications of the reconciliation API
https://reconciliation-api.github.io/specs/draft/

Clarity on difference between versions #112

Closed rogerhyam closed 1 year ago

rogerhyam commented 1 year ago

I'm trying to implement a Reconciliation Service and am running into a series of problems mainly related to the differences between 0.1 and 0.2.

There is a major difference in the structure of the query batch between the two versions

In 0.1 "A reconciliation query batch is a set of reconciliation queries indexed by string identifiers." (an object)

https://reconciliation-api.github.io/specs/0.1/#dfn-reconciliation-query-batch

In 0.2: "A reconciliation query batch is an array of reconciliation queries"

https://reconciliation-api.github.io/specs/latest/#dfn-reconciliation-query-batch
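To make the two quoted definitions concrete, here is a minimal sketch of what each batch shape looks like once serialized. The query text ("banana", "apple") and the keys "q0" and "q1" are illustrative values only, and the variable names are hypothetical; nothing here is taken from a real service.

```python
import json

# Shape described by the 0.1 definition quoted above:
# a set of queries indexed by string identifiers (a JSON object).
batch_as_object = {
    "q0": {"query": "banana"},
    "q1": {"query": "apple"},
}

# Shape described by the definition quoted from the "latest" document:
# an array of queries (a JSON array), with no identifiers.
batch_as_array = [
    {"query": "banana"},
    {"query": "apple"},
]

# Serialize both to see what would travel over the wire.
object_json = json.dumps(batch_as_object)
array_json = json.dumps(batch_as_array)

print(object_json)
print(array_json)
```

A client that sends one shape to a service expecting the other will fail at the first step of parsing, which is why the difference matters for implementers.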

This is a major breaking change but it isn't listed as such

https://reconciliation-api.github.io/specs/latest/#x0-2

"Initial improvements to the specifications made by our Community Group. Most of them are backwards-compatible, except for the requirement to support CORS for cross-origin access."

Edit: The results are also in entirely different formats!

There is no "Candidates" structure in 0.1 but there is in 0.2

https://wikidata.reconci.link/en/api?queries=%7B%22q0%22%3A%7B%22query%22%3A%22banana%22%7D%7D
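For readers who want to see what that URL actually carries, the sketch below decodes its `queries` parameter with the Python standard library and then rebuilds the same query string. It only inspects the URL text; it does not contact the service, so it says nothing about the response format.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

url = (
    "https://wikidata.reconci.link/en/api"
    "?queries=%7B%22q0%22%3A%7B%22query%22%3A%22banana%22%7D%7D"
)

# Percent-decode the query string and parse the JSON it contains.
raw_queries = parse_qs(urlsplit(url).query)["queries"][0]
batch = json.loads(raw_queries)
print(batch)  # {'q0': {'query': 'banana'}}

# Going the other way: compact JSON, then percent-encode it.
rebuilt_query = urlencode({"queries": json.dumps(batch, separators=(",", ":"))})
print(rebuilt_query)
```

The decoded batch is an object keyed by "q0", that is, the object-style batch described in the 0.1 definition.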

This is a documentation issue if nothing else.

From a protocol perspective there is an issue of how a service detects what protocol version it is being asked for. Should it simply test if it is being passed an object or an array? If so that should be documented or different services will do it in different ways. Would it be better to declare in the query what version of the protocol is being used?
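As an illustration of the "test if it is an object or an array" idea, here is a minimal sketch. It is not something the specifications prescribe, which is exactly the concern raised above; the function name `detect_batch_shape` and the returned labels are hypothetical.

```python
import json

def detect_batch_shape(raw_queries: str) -> str:
    """Classify a raw `queries` payload by its top-level JSON type.

    Returns "object" for a batch indexed by string identifiers and
    "array" for a batch given as a list. This is a heuristic, not a
    documented version-negotiation mechanism.
    """
    batch = json.loads(raw_queries)
    if isinstance(batch, dict):
        return "object"
    if isinstance(batch, list):
        return "array"
    raise ValueError("a query batch must be a JSON object or a JSON array")

print(detect_batch_shape('{"q0": {"query": "banana"}}'))  # object
print(detect_batch_shape('[{"query": "banana"}]'))        # array
```

The weakness of this approach is that it infers the version from a side effect of one change; any later revision that keeps the same top-level type would be indistinguishable, which is the argument for an explicit version declaration.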

Hardly any of the services in the test bench seem to work; they fail with similar issues. It looks like the test bench only works with 0.1. Is that correct?

wetneb commented 1 year ago

Very sorry that you are having this problem!

We currently have three versions of the specs online:

Perhaps we should not use the word latest in the URL, as it could be interpreted as the "latest released version". Maybe something else, like draft, could indicate that better? Perhaps it would also be worth adding a very visible disclaimer on the current draft that it is being worked on and should not be used by implementers yet?

rogerhyam commented 1 year ago

Thanks.

Yes a disclaimer and a recommendation as to which document implementers should be working with would be very helpful!

My route was to click on the link in the OpenRefine docs, which took me to 0.1. I then just clicked on the latest version. There is no "replaced by" on the 0.1 spec page or "replaces" on the 0.2 one. Both of these would be helpful too.

At the moment I just want to develop a service to make data available in OpenRefine by the end of the month. I presume that working with the 0.2 spec is what I should be doing?

Is this correct?

When 0.3 is released I might need to upgrade my service in which case it might be worth raising a bug against the development version of the spec. The 0.3 spec should stipulate how the server detects the protocol version. Perhaps another field in the Reconciliation Query Object "protocol_version"?

thadguidry commented 1 year ago

👍 for draft

wetneb commented 1 year ago

Yes for OpenRefine you can use either 0.1 or 0.2, and for new services I would definitely recommend 0.2.

See also #78 for version detection issues.

tfmorris commented 1 year ago

I would recommend using 0.1 since that's what was written to document the OpenRefine API. When 1.0 is approved and published, you can plan your migration to that. I would only recommend implementing 0.2, 0.3, etc if you are interested in testing and providing feedback on the evolving spec as it's developed.

Having said all that, there are a large number of reconciliation services written in a variety of languages that you could use as a starting point, so you shouldn't really need to start from scratch. I did a light modernization pass on one of the Python ones last year here: https://github.com/cmharlow/lc-reconcile/pull/107

wetneb commented 1 year ago

I would definitely not recommend implementing 0.1 because it relies on JSONP, which is a security vulnerability that we should patch on OpenRefine's side. We should not execute remote Javascript without first warning the user about this. Version 0.2 does not have this issue.
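To show why the two transport mechanisms differ in risk, here is a minimal sketch of the response a service would produce under each. The function names and the sample payload are hypothetical, and the wildcard origin is just one possible CORS policy, not a requirement quoted from the specs.

```python
import json
from typing import Dict, Tuple

def jsonp_response(payload: dict, callback: str) -> Tuple[Dict[str, str], str]:
    """JSONP: the body is JavaScript that calls a function named by the client.

    The client loads it through a <script> tag, so whatever the server
    returns is executed in the page.
    """
    headers = {"Content-Type": "application/javascript"}
    return headers, f"{callback}({json.dumps(payload)});"

def cors_response(payload: dict) -> Tuple[Dict[str, str], str]:
    """CORS: the body is plain JSON and a header grants cross-origin read access.

    The client parses the body as data; nothing in it is executed.
    """
    headers = {
        "Content-Type": "application/json",
        "Access-Control-Allow-Origin": "*",
    }
    return headers, json.dumps(payload)

payload = {"name": "Example service"}  # illustrative payload only
print(jsonp_response(payload, "handleManifest")[1])
print(cors_response(payload)[1])
```

With JSONP the client has to trust the server to send only the expected function call; with CORS the response is treated as data from start to finish.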

I do not think scripts like lc-reconcile are really advisable as a basis for implementing a service: people generally need to implement something within an existing web application, and so within a particular web framework. lc-reconcile is also not very actively maintained, so it does not feel like a very advisable example.

tfmorris commented 1 year ago

The stated timeframe is:

"I just want to develop a service to make data available in OpenRefine by the end of the month."

so it needs to work with the version(s) of OpenRefine which are currently deployed in the field.

OpenRefine used to have a list of example reconciliation services which could be used as starting points, but I can no longer find it. @wetneb What would you recommend as a good set of starting points?

wetneb commented 1 year ago

"it needs to work with the version(s) of OpenRefine which are currently deployed in the field."

Version 0.2 has been supported since OpenRefine 3.3, released three years ago.

"What would you recommend as a good set of starting points?"

I would recommend the services and libraries listed in the census.

rogerhyam commented 1 year ago

Thanks for your thoughts. If you are interested, my test implementation is here:

https://list-dev.rbge.info/reconcile

It will disappear from this domain when we make the whole API live in March (planned).

This is quite a simple API if you have an existing code base that does the actual matching. Here is the class that does the main body of it.

https://github.com/rogerhyam/wfo-plant-list/blob/main/include/ReconciliationService.php

Still needs cleaning up but it works as a first pass. As I type this I notice I've linked to the wrong spec in the comments still ...

wetneb commented 1 year ago

This should now be resolved, since the stable versions are published on w3.org and are marked as "Final Community Group Report":

In comparison, the current draft has a URL that ends with draft/ and is marked as a "Draft Community Group Report": https://reconciliation-api.github.io/specs/draft/