Open saumier opened 1 year ago
So on the protocol level, would this mean allowing arrays of objects for the candidate name and description?
"candidates": [
  {
    "id": "K11-15",
    "name": [
      {
        "str": "National Arts Centre - Azrieli Studio",
        "lang": "en"
      },
      {
        "str": "Centre National des Arts - Studio Azrieli",
        "lang": "fr"
      }
    ],
    ...
  }
]
If we go down that route, I wonder if we should also add support for that for multiple names for properties (when returned in a property suggest response, or in a data extension response) or for types (when returned in a type suggest response, or in a reconciliation response as part of the reconciliation candidates). I guess it would make things look more uniform but I am not really sure about the use case. What do you think @saumier?
Yes. Since the group is not recommending JSON-LD, then I think this is the next best approach.
I am implementing a bilingual website (en, fr) that implements a client for the reconciliation API here kg.artsdata.ca. The UI of this site can switch between English and French. When querying using the reconciliation API, a query string can be in any language. For example I could query a Place using "Studio Azrieli" and "Azrieli Studio". The response would return candidates including K11-15. With this new approach, the website could display the name and description in the UI language.
It would also be good to add support for property and type suggestions.
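As a rough illustration of the client side of this use case, here is a minimal Python sketch of how a bilingual UI might choose which name to display. It assumes the array-of-objects shape proposed above (objects with a str and an optional lang); the function name pick_name and the fallback order are hypothetical choices, not something the spec defines.

```python
from typing import Optional


def pick_name(names: list[dict], ui_lang: str) -> Optional[str]:
    """Return the best display string for ui_lang from [{"str", "lang"}] objects."""
    wanted = ui_lang.lower()
    # 1. Exact tag match, e.g. UI "fr" matches a name tagged "fr".
    for entry in names:
        if entry.get("lang", "").lower() == wanted:
            return entry["str"]
    # 2. Primary-subtag match, e.g. UI "fr-CA" matches a name tagged "fr".
    primary = wanted.split("-")[0]
    for entry in names:
        if entry.get("lang", "").lower().split("-")[0] == primary:
            return entry["str"]
    # 3. Otherwise fall back to the first name the service returned, if any.
    return names[0]["str"] if names else None


names = [
    {"str": "National Arts Centre - Azrieli Studio", "lang": "en"},
    {"str": "Centre National des Arts - Studio Azrieli", "lang": "fr"},
]
print(pick_name(names, "fr-CA"))  # Centre National des Arts - Studio Azrieli
```

Falling back to the first entry keeps the candidate visible even when no name matches the UI language; a client could just as well fall back to a configured default language instead.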
Summary of our discussion on the monthly call of last month: we could either
Maybe there are other options?
We thought that it is worth bringing more attention to this issue from the broader community, to gather more feedback.
Unless the variable structure is backward compatible when the simple variant is used, I think it's better to be consistent and always use the array form, even for a single entry. I suspect that things have diverged enough that there's not a compatibility benefit.
I second @tfmorris's opinion. I like the consistency: whenever a value in our API standards could be "one or many", we resort to the array form. (The argument for a simpler JSON structure mostly presumes that JSON arrays of objects are complicated or noisy, when they really are not for developers and our 2024+ tooling.)
Generally, this seems to be related to #52, as a solution to this issue will also resolve #52, won't it?
Maybe there are other options?
I am late to the party (sorry) but am adding this for reference. Generally, I like the "language map" approach from JSON-LD (examples) for providing labels in multiple languages as it is simple, terse and easy to read. The example from https://github.com/reconciliation-api/specs/issues/138#issuecomment-1803585218 would look like this with language maps:
{
  "candidates": [
    {
      "id": "K11-15",
      "name": {
        "en": "National Arts Centre - Azrieli Studio",
        "fr": "Centre National des Arts - Studio Azrieli"
      }
    }
  ]
}
@acka47 If we went that route, we'd have to adopt a convention and document it. That is, should the key be an ISO 639-3 three-letter code? Hmm, what else?
@acka47 I like the conciseness but how would a service represent a name or description for which it does not know the language? (Use case: a tool like CSV-reconcile, which spins a reconciliation service on arbitrary datasets, generally will not have access to this sort of information and shouldn't make up a language for the sake of fitting in)
If we went that route, we'd have to adopt a convention and document it. That is, should the key be an ISO 639-3 three-letter code?
Yes, we could define it similar to JSON-LD like this: "keys must be strings representing [BCP47] language codes and the values must be a string."
how would a service represent a name or description for which it does not know the language?
Good question. I guess for the other approach from https://github.com/reconciliation-api/specs/issues/138#issuecomment-1803585218 you would just omit the optional lang key. With the language map approach you would have to use und as the key (for "undetermined"), I guess.
Would the array approach allow for multiple alias names in the same language whereas the map approach would not? That could be an argument for choosing the array approach. On the other hand, I am not sure we actually want to allow this?
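To make the trade-off concrete, here is a small Python sketch that collapses the array form into a language map. It is only an illustration: to_language_map is a hypothetical helper, the acronym alias in the example data is invented, and using und for entries without a lang follows the suggestion above rather than any agreed convention.

```python
def to_language_map(names: list[dict]) -> tuple[dict[str, str], list[dict]]:
    """Collapse [{"str", "lang"}] into {lang: str}.

    Returns the map plus the entries that could not be kept because
    their language key was already taken.
    """
    language_map: dict[str, str] = {}
    dropped: list[dict] = []
    for entry in names:
        # Assumption: "und" (undetermined) stands in for a missing lang.
        key = entry.get("lang", "und")
        if key in language_map:
            dropped.append(entry)  # a map holds only one value per language
        else:
            language_map[key] = entry["str"]
    return language_map, dropped


names = [
    {"str": "National Arts Centre - Azrieli Studio", "lang": "en"},
    {"str": "NAC - Azrieli Studio", "lang": "en"},  # invented alias, for illustration
    {"str": "Studio Azrieli"},                      # language unknown
]
language_map, dropped = to_language_map(names)
print(language_map)  # {'en': 'National Arts Centre - Azrieli Studio', 'und': 'Studio Azrieli'}
print(dropped)       # [{'str': 'NAC - Azrieli Studio', 'lang': 'en'}]
```

The second English name has nowhere to go in the map, which is the limitation the question above points at.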
Another aspect to consider for the lang field vs. language maps is that the field provides a general approach for all objects. To quote from the current draft:
All objects used in this protocol (entities, types, properties, queries, candidates, features, etc.) MAY declare an explicit text-processing language in a lang field.
[...] I think it's better to be consistent and always use the array form [...]
To be clear, this is not only about array vs. non-array, but also object vs. string.
The common, simple case currently:
"name": "National Arts Centre - Azrieli Studio"
The common case in the unified syntax:
"name": [
  {
    "str": "National Arts Centre - Azrieli Studio"
  }
]
If this was the first and only place where we introduce optional structure (string or array of objects), I'd agree we might want to avoid that. But since we do the same thing in other places (e.g. property values), I feel like the much simpler common case is worth having the option.
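If the optional structure were allowed, a client could hide the difference with a small normalization step. The following Python sketch assumes name may arrive either as a plain string or as an array of objects with a str key; normalize_name is a hypothetical helper, not part of any existing client library.

```python
from typing import Union


def normalize_name(value: Union[str, list]) -> list[dict]:
    """Accept the plain-string form or the array form; always return the array form."""
    if isinstance(value, str):
        # Simple case: wrap the bare string in a single object without a lang.
        return [{"str": value}]
    # Structured case: copy each object so the caller's data is not mutated later.
    return [dict(entry) for entry in value]


print(normalize_name("National Arts Centre - Azrieli Studio"))
# [{'str': 'National Arts Centre - Azrieli Studio'}]
```

The cost of the option is that every consumer needs a branch like this one, which is the consistency concern raised earlier in the thread.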
how would a service represent a name or description for which it does not know the language?
From JSON-LD https://www.w3.org/TR/json-ld/#example-102-indexing-languaged-tagged-strings-using-none-for-no-language
... the special index @none is used for indexing strings which do not have a language; this is useful to maintain a normalized representation for string values not having a datatype.
Example if there was no language for a name.
{
  "candidates": [
    {
      "id": "K11-15",
      "name": {
        "@none": "National Arts Centre - Azrieli Studio"
      }
    }
  ]
}
I'm not really enthusiastic about any of the solutions, but the one that I find the least bad is @fsteeg's suggestion to use the existing language (+ text direction) mechanisms we have, and simply switch to this default syntax:
"name": [
  {
    "str": "National Arts Centre - Azrieli Studio"
  }
]
with the option to add lang and dir attributes at the same level as str if needed, and to add more objects to the array.
This also has the benefit of allowing multiple names to be returned in the same language (for alternate names, such as acronyms).
And I agree with @tfmorris on the preference to stick to the array form.
I also agree with @wetneb and @tfmorris to use an array of objects with the str attribute and optional lang and dir.
For the sake of comparison with other patterns, this somewhat resembles the keys @value, @language and @direction used in JSON-LD.
I have no preference here but just felt that the language map approach should at least be discussed in this context. Thus, I am fine with an array of objects containing at least the str with optional lang and dir.
@wetneb My team has implemented an endpoint for the current draft spec and updated our branch of the test bench to support both v0.2 and v0.3 (draft).
Here are 2 screen grabs from our branch of test bench. One showing our production reconciliation endpoint v0.2 and a second screen grab showing our test reconciliation endpoint v0.3 with multi-lingual support meeting the needs of this use case. This is a work in progress.
As a service provider, I would like clients to be able to query in any language and to return candidate names in one or more languages specified by the client request.
Use Case
A client is reconciling a place in Canada using the Artsdata.ca Reconciliation service with the name "Studio Azrieli".
Current solution (not ideal)
The service returns multiple entities including K11-15 "National Arts Centre - Azrieli Studio" and K11-15 "Centre National des Arts - Studio Azrieli" which appear as separate entities but have the same URI. This may appear incorrect to the user because there are 2 candidates. If the user doesn't notice that they have the same URI then they may be mistaken as duplicates.
Ideal solution
The service returns multiple entities but only a single K11-15 displaying both names "National Arts Centre - Azrieli Studio" and "Centre National des Arts - Studio Azrieli" together. Parameters can specify the languages the client would like to display.
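One way to picture the move from the current behaviour to the ideal one is a merge step that groups per-language candidates by id. The Python sketch below is only an assumption-laden illustration: it supposes each old-style candidate carries a string name and an optional lang field, merge_candidates and its languages parameter are hypothetical, and other candidate fields (score, type, and so on) are left out for brevity.

```python
from typing import Optional


def merge_candidates(candidates: list[dict],
                     languages: Optional[list[str]] = None) -> list[dict]:
    """Group candidates sharing an id into one candidate with an array of names."""
    merged: dict[str, dict] = {}
    for cand in candidates:
        # One output entry per id; dicts keep first-seen order.
        entry = merged.setdefault(cand["id"], {"id": cand["id"], "name": []})
        name = {"str": cand["name"]}
        if "lang" in cand:
            name["lang"] = cand["lang"]
        if name not in entry["name"]:  # skip exact repeats
            entry["name"].append(name)
    result = list(merged.values())
    if languages is not None:
        # Keep only names in the requested languages; names without a lang are dropped.
        for entry in result:
            entry["name"] = [n for n in entry["name"] if n.get("lang") in languages]
    return result


current = [
    {"id": "K11-15", "name": "National Arts Centre - Azrieli Studio", "lang": "en"},
    {"id": "K11-15", "name": "Centre National des Arts - Studio Azrieli", "lang": "fr"},
]
print(merge_candidates(current, languages=["fr"]))
# [{'id': 'K11-15', 'name': [{'str': 'Centre National des Arts - Studio Azrieli', 'lang': 'fr'}]}]
```

Called without languages, the same input yields a single K11-15 candidate carrying both names, so the user no longer sees what look like duplicates.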