Open epaulson opened 3 years ago
Thanks for the detailed use case, it is very interesting!
There has been interest before in letting services return property values in reconciliation candidates: https://github.com/reconciliation-api/specs/pull/48#issuecomment-661711478.
For your last point, there is #30.
Thanks. I think the best thing for us to do is to put together a quick prototype and see what folks think? I'm assuming that OpenRefine won't freak out if there's an unexpected new object in the response but that will be easy to test. Would love to hear from other folks creating reconciliation api implementations to see how they might use this.
In looking at the data extension service, we may still explore that - the UI/workflow in OpenRefine is really nice for that service (see the gifcast here: https://github.com/OpenRefine/OpenRefine/wiki/Data-Extension-API ), so even though it adds a bunch more state on the server for us to support, I think it might be worth considering.
We've been exploring using the Reconciliation API in a smart building/smart grid/IoT setting: we have sensor names (and only sensor names) from legacy industrial equipment and we want to predict what class they match in our ontology. For example, a sensor/controller with a name like
room101-tstat01-htsp
has type
https://brickschema.org/schema/1.1/Brick#Heating_Temperature_Setpoint
in our ontology.
We've had good luck using OpenRefine + the reconciliation API to process data like this - we can send this name off to the API and get back a list of possible matches - that example is probably the heating temp setpoint, but maybe it's just a
https://brickschema.org/schema/1.1/Brick#Heat_Sensor
, and OpenRefine gives us a nice UI to give those options back to the user and let them choose.
Predicting class is the most important thing we do, and the API is fine for that: we return a response like:
{'id': 'room101-tstat01-htsp', 'name': 'https://brickschema.org/schema/1.1/Brick#Heating_Temperature_Setpoint'}
We can do this with string matching with reasonably good results, and we've also been exploring using language models trained on labeled examples. (Note that we just send back the query as the ID; we don't have a database that we're looking things up in, and using 'name' seems to work best in OpenRefine for getting this classification into a column.)
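To make the shape of that response concrete, here is a minimal sketch of the string-matching approach, not our actual implementation. The `ABBREVIATIONS` table (including the `hs` entry) and the `reconcile` function are hypothetical names made up for illustration; the `result` list with `id`, `name`, `score` and `match` follows the usual reconciliation candidate layout.

```python
from difflib import SequenceMatcher

BRICK = "https://brickschema.org/schema/1.1/Brick#"

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "htsp": BRICK + "Heating_Temperature_Setpoint",
    "hs": BRICK + "Heat_Sensor",
}


def reconcile(query: str) -> dict:
    """Score every known class against the last token of the sensor name."""
    token = query.lower().split("-")[-1]
    candidates = []
    for abbrev, uri in ABBREVIATIONS.items():
        ratio = SequenceMatcher(None, token, abbrev).ratio()
        candidates.append({
            "id": query,            # the query itself is echoed back as the ID
            "name": uri,            # the predicted class goes in 'name'
            "score": round(ratio * 100, 1),
            "match": ratio == 1.0,  # only an exact token match is auto-matched
        })
    # Best-scoring candidate first.
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return {"result": candidates}


response = reconcile("room101-tstat01-htsp")
best = response["result"][0]
```

Here `best["name"]` is the Heating_Temperature_Setpoint URI, with Heat_Sensor as a lower-scored alternative for the user to pick from.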
But another thing we're interested in doing is extracting other metadata that's in that sensor name. If everything is sensibly named, OpenRefine is awesome for this - in our example it's easy to split columns and organize that sensor name into the room it's located in and the thermostat that it's attached to.
But our NLP parser already does this, and it'd be really awesome if, as part of our reconciliation API response, we could include a JSON struct with the other data that our service pulled out. The OpenRefine UI should ignore it, but with a GREL expression I could pull it out - something like
Obviously everything in 'extra' is service-dependent; what our service puts in will be very different from what, say, OpenCorporates would put in.
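As a rough illustration of the idea, a candidate carrying an 'extra' object might be built and read back like the sketch below. Everything here is an assumption: 'extra' is not part of the reconciliation spec, `parse_sensor_name` and `candidate_with_extra` are hypothetical helpers, and the `room`/`equipment`/`point` keys are made up. A real parser would be far less naive than splitting on hyphens.

```python
import json


def parse_sensor_name(name: str) -> dict:
    """Hypothetical parser: split 'room101-tstat01-htsp' into labeled parts."""
    keys = ["room", "equipment", "point"]  # assumed field names
    return dict(zip(keys, name.split("-")))


def candidate_with_extra(query: str, predicted_class: str, score: float) -> dict:
    """Build a reconciliation candidate with a service-specific 'extra' object."""
    return {
        "id": query,
        "name": predicted_class,
        "score": score,
        "match": False,
        # Proposed addition, not in the spec: whatever else the service extracted.
        "extra": parse_sensor_name(query),
    }


candidate = candidate_with_extra(
    "room101-tstat01-htsp",
    "https://brickschema.org/schema/1.1/Brick#Heating_Temperature_Setpoint",
    92.0,
)

# A client that knows about 'extra' can pull values out of the serialized
# candidate; one that does not can simply ignore the unknown key.
payload = json.dumps(candidate)
room = json.loads(payload)["extra"]["room"]
```

The point of keeping it as an opaque object is that clients never need to understand its contents; only a user expression that knows the service would reach into it.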
We can't go fetch properties because we're not matching against a different database - I don't have a database that lists 'room101-tstat01-htsp', I'm just parsing it and predicting from it.
I don't think we can use the data-extension service, because we need to know which of the original query responses the user reconciled the "id" to - we don't have a database of IDs and we'd prefer not to have to create them to store predictions.*
Maybe that makes it too far afield from what the Reconciliation API spec is intended for, and it's cool if this isn't something you're interested in complicating the spec with, but we'd love to talk more about how we could enable this and to see if it's useful beyond what we're doing.
Our very basic API implementation is here: https://github.com/BrickSchema/reconciliation-api
Thanks!
*as an aside, it would be cool if there was a way for OpenRefine to report back which reconciliation candidate a user chose for each query, because it'd be great training feedback if you're doing some sort of ML-based prediction, and that'd change the tradeoffs for creating and storing IDs for queries.