w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

Internationalization Review for VCDM 2.0 #1155

Closed awoie closed 1 year ago

awoie commented 1 year ago

The following is an Internationalization Review for "W3C Verifiable Credentials Data Model v2.0" (VCDM 2.0). The latest published version of the specification can be found here.

The specification is a JSON-LD data model specification for Verifiable Credentials and Verifiable Presentations.

All features can be internationalized using JSON-LD features. Specifically, the validity period for a Verifiable Credential is expressed using XML Schema 1.1 where the dates can be localized and made accessible given the nature of XML Schema 1.1 date time values.

The specification contains an Internationalization Considerations section that provides more details on how internationalization is achieved.

The following is a review based on the short i18n review checklist from here:

The specification is a JSON-LD data model specification and can use all internationalization features that JSON-LD offers. The Internationalization Consideration section specifically points out how to achieve supporting different languages, text direction and so on.







Dates and times are used when expressing the validity periods for Verifiable Credentials. For these fields, we use XML Schema 1.1 date-time format, see validFrom and validUntil.


The VCDM uses UTF-8 for all text encoding. The VCDM vocabulary is hosted by W3C and uses the UTF-8 encoding for its contents (default encoding on the web).



Dates and times are used when expressing the validity periods for Verifiable Credentials. For these fields, we use XML Schema 1.1 date-time format, see validFrom and validUntil.

Advanced features of the VCDM that define extension points, such as TermsOfUse, can have internationalization considerations, but this is out of scope of this specification. It is expected that developers who define concrete extension points or extend the VCDM using the JSON-LD extension mechanism would write and implement their own internationalization considerations.

However, since the specification is based on JSON-LD, all features of the VCDM as well as concrete extensions (even if defined outside of the specification) can be internationalized with the JSON-LD mechanisms described in the Internationalization Consideration section.


Since the specification is based on JSON-LD, all features of the VCDM as well as concrete extensions (even if defined outside of the specification) can be internationalized with the JSON-LD mechanisms described in the Internationalization Consideration section.


awoie commented 1 year ago

Requested review from Internationalization Working Group here https://github.com/w3c/i18n-request/issues/212

OR13 commented 1 year ago

Seems blocked pending a completion of review, and all issues related to review have been filed.

@aphillips is your review complete? Have we addressed all your concerns?

aphillips commented 1 year ago

@OR13

This issue is your self-review (thank you for providing it). The I18N review is tracked in our review radar and open issues can be found in the horizontal review board:

As I write this, VCDM has 5 open issues (including this issue). If you feel you have completed your self-review, you can close this issue. You may close other issues if you feel they have been addressed. I18N evaluates our "mirrored" issues periodically and our closing of those issues is what allows you to transition without questions being raised.

Looking at the open issue list, I have just marked w3c/i18n-activity#650 for "close", as you have addressed it. I think we are still awaiting some reply to my most recent comments on the language example thread and the remainder of the issues (other than this self-review) are related to that. Our WG has included the topic of @context in our agenda for our weekly teleconference tomorrow (2023-08-31): we may or may not be satisfied as a result of that conversation. At the moment, I am not satisfied, since I think that name and description without language and direction metadata wasn't our goal, but don't hold me to that pending our internal discussion!!

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-08-30

View the transcript

#### 4.2. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)

_See github issue [vc-data-model#1155](https://github.com/w3c/vc-data-model/issues/1155)._

**Brent Zundel:** any actions to take here?

**Manu Sporny:** it's not clear. Addison continues to review. So I think we are getting active i18n reviews.
… we should probably ask outright if we've addressed his concerns in his review.

**Brent Zundel:** we do have on i18n pre-cr issue.

> *Orie Steele:* +1 to an example.

**Dmitri Zagidulin:** I wanted to ask if there would be any objections or push back to adding a PR with an example of i18n.

**Manu Sporny:** we have stuff in the internationalization section.

**Dmitri Zagidulin:** it doesn't have value-level language selection.

**Manu Sporny:** yes, we had that and people complained, so we removed it.
… so maybe we could add one that has the bare @language or @value in there.

> *Orie Steele:* maybe file an issue capturing the example you want to see, and cross link for discussion.

**Dmitri Zagidulin:** I just mean, for example, if the value is a string, then an object with the properties for lang.

**Brent Zundel:** hold on. remember the queue.
… an issue to track this concern with an example would be an appropriate way to advance this.

shigeya commented 1 year ago

@msporny,

You said:

Manu Sporny: yes, we had that and people complained, so we removed it. … so maybe we could add one that has the bare @language or @value in there.

Would you please kindly let me know where I can find these complaints?

msporny commented 1 year ago

@shigeya wrote:

Would you please kindly let me know where I can find these complaints?

It was spread across various WG calls and it was years/months ago, I can't remember which calls exactly. The gist of it was that people didn't like doing this:

"property": {
  "@value": "The value",
  "@language": "en-US"
}

and while we suggested that people do this instead, and set their contexts up to do so:

"property": {
  "value": "The value",
  "lang": "en-US"
}

Which is what we do in the current spec, people are skeptical that developers are going to do this too... mostly because VCs to date demonstrate that people are not doing /any/ i18n encoding.
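For readers comparing the two encodings above, here is a minimal sketch of how a JSON-only consumer might read either one without running JSON-LD processing. It assumes the aliased keys are exactly `value` and `lang` as in the snippet above; the `LangString` type and the `read_lang_string` helper are hypothetical names invented for this illustration and are not part of the specification.

```python
from typing import Any, NamedTuple, Optional


class LangString(NamedTuple):
    """A natural-language string plus whatever language tag came with it."""
    value: str
    language: Optional[str]  # BCP 47 tag, or None when the producer gave none


def read_lang_string(node: Any) -> LangString:
    """Accept a bare string, the JSON-LD keyword form, or the aliased form.

    Hypothetical helper: the key pairs checked here are assumptions taken
    from the two snippets in the comment above.
    """
    if isinstance(node, str):
        return LangString(node, None)
    if isinstance(node, dict):
        for value_key, lang_key in (("@value", "@language"), ("value", "lang")):
            if value_key in node:
                return LangString(node[value_key], node.get(lang_key))
    raise ValueError(f"not a language-tagged string: {node!r}")


keyword_form = read_lang_string({"@value": "The value", "@language": "en-US"})
aliased_form = read_lang_string({"value": "The value", "lang": "en-US"})
bare_form = read_lang_string("The value")
```

Both object forms yield the same pair; the bare string keeps its text but carries no language at all, which is the case the rest of this thread is concerned with.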

We could do @language in the context, but that won't work for the people that don't want to process the @context beyond just checking a few values. We /could/ say that people doing processing MUST pay attention to @language, but then most programming environments don't have support for i18n (think JSON-only processing environments). The other problem with @language in the @context is that it could accidentally set the language for strings that are not meant to have a language attached to them (like all the JWT properties).
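The over-tagging concern can be illustrated with a deliberately crude model. The sketch below is not the JSON-LD expansion algorithm, which is more selective (typed values and values that already carry a tag are treated differently); it only assumes that a context-wide default reaches every plain string. The `apply_default_language` helper, the `encodedValue` property, and the sample values are all made up for this example.

```python
from typing import Any


def apply_default_language(node: Any, language: str) -> Any:
    """Tag every plain string with `language`.

    Crude stand-in for a context-wide default language; real JSON-LD
    processing is more selective. This only shows the failure mode.
    """
    if isinstance(node, str):
        return {"@value": node, "@language": language}
    if isinstance(node, list):
        return [apply_default_language(item, language) for item in node]
    if isinstance(node, dict):
        if "@value" in node:
            return node  # already a value object; leave it untouched
        return {key: apply_default_language(val, language) for key, val in node.items()}
    return node  # numbers, booleans, null


credential = {
    "name": "Diploma de ejemplo",  # natural language: tagging is meaningful
    "encodedValue": "z58DAdFfa9SkqZMVPxAQp",  # made-up encoded bytes, not natural language
}
tagged = apply_default_language(credential, "es")
```

In this model the encoded string ends up labelled as Spanish alongside the name, which is the kind of meaningless decoration the comment above warns about.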

Another option is to introduce a language property to the VC that expresses that "all human-readable strings SHOULD default to this language".

I was hoping that you, @shigeya, would propose your translation strings file concept for inclusion into VC v2.0 using .po files or similar. However, since that hasn't been done, and since we're transitioning into CR very soon now, it's probably too late to add it at this point w/o the WG buying into the concept/idea.

IOW, we can support i18n as long as implementers are careful w/ the creation of JSON-LD Context files... but that requires effort that some implementers are not willing to make.

There are probably other options... we are 8-ish months away from Proposed Recommendation... we need to either go with the solution we have now and see how the market uses it, or try a different option. Having multiple options is going to harm interop... looking for i18n for guidance on this topic.

shigeya commented 1 year ago

Thank you very much for providing the concise summary.

Another option is to introduce a language property to the VC that expresses that "all human-readable strings SHOULD default to this language".

I think that's a reasonable compromise.

If we introduce that option, the language property needs a default value (en-US ?). Also, we need the "dir" property with a default. Then, developers not concerned about language may forget about the property, and some developers who want to specify the language may use the property.

iherman commented 1 year ago

we need to either go with the solution we have now and see how the market uses it

(With my W3C hat down) I would think this is what we should do. This means that, in many cases, the language will be undefined; in my understanding, this is also the case with HTML. I feel uneasy reinventing a new mechanism with defaults.

Which is what we do in the current spec, people are skeptical that developers are going to do this too... mostly because VCs to date demonstrate that people are not doing /any/ i18n encoding.

What information do we really have on this? Are we relying on implementations dominated by US (or US and UK) implementers? I would be surprised if European implementers were just as insensitive to language issues.

shigeya commented 1 year ago

I feel uneasy reinventing a new mechanism with defaults.

The problem is that what we are discussing here is the integrity of data.

IMO, data integrity will be difficult to achieve without carefully designed defaults.

aphillips commented 1 year ago

If we introduce that option, the language property needs a default value (en-US ?).

Do not default to US English. If the language is not known, make it unknown. There is a tag for that (und) and a corresponding locale in CLDR (the "ROOT" locale).
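A minimal sketch of that fallback rule, assuming a consumer that may or may not have a value-level tag and a document-level default available. The `document_language` parameter stands for the document-wide property proposed earlier in this thread and is hypothetical; `und` is the tag for an undetermined language.

```python
from typing import Optional

UNDETERMINED = "und"  # language tag meaning "language not known"


def effective_language(value_language: Optional[str],
                       document_language: Optional[str] = None) -> str:
    """Most specific tag wins; fall back to `und`, never to a guessed locale.

    `document_language` models the proposed (hypothetical) document-wide default.
    """
    return value_language or document_language or UNDETERMINED


no_metadata = effective_language(None)
value_wins = effective_language("ja", "en")
document_default = effective_language(None, "ar")
```

The point of the design is that an absent tag stays visibly absent downstream instead of being silently replaced by `en-US`.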

Another option is to introduce a language property to the VC that expresses that "all human-readable strings SHOULD default to this language".

I think that's a reasonable compromise.

I tend to agree.

I would commend this group back to our document STRING-META. Pay particular attention to the discussion of syntactic content and the best practices in resource wide defaults.

Thought: isn't this what @language in @context already does? "Normal" JSON processors still see these values. They just don't apply JSON-LD's processing rules.

We could do @language in the context, but that won't work for the people that don't want to process the @context beyond just checking a few values.

Perhaps this could be solved by adopting a different definition of what "working" means? I think I18N would be satisfied if it were possible for a consumer of a credential to recover the language and direction of natural language strings if they desired to do so, but not to require that every implementation do the recovery every time. It is easier to lobby for adoption of proper internationalization when the data is present.

I wouldn't oppose creating a new mechanism, if that will be more effective. I agree that the "pure JSON" case is real and understand why you desire not to require full JSON-LD. But I am leery of having lots of different standards using incompatible mechanisms to solve the same problem.

"Seeing how the market uses it" is not really a persuasive argument to me. Many implementers will be lazy about adopting features, particularly if they don't directly affect the producer. This is like arguing 35 years ago that "I don't need Unicode for my English because I'll never use all those extra bytes" 😸.

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-09-14

View the transcript

#### 2.7. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)

_See github issue [vc-data-model#1155](https://github.com/w3c/vc-data-model/issues/1155)._

**Manu Sporny:** Let's clarify that normative statements for use cases document is requirements on VC Data Model.
… there was a conversation about this issue because other issues track those concerns more directly, then more conversation happened here.
… so what needs to happen to say that we have addressed this issue.

**Sebastian Crane:** this is a traffic issue so it's convenient to keep it open.

_See github issue [vc-data-model#1264](https://github.com/w3c/vc-data-model/issues/1264)._

**Manu Sporny:** i agree with Sebastian. The other data point is that conversation started here, but moved to 1264.
… specifically that issue. We do have an outstanding concern.
… Namely what are we telling people to do about language strings.
… I'll put a [link to the options](https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665).
… we can close or open. This issue: just one more item we need to resolve before CR.

> *Shigeya Suzuki:* +1 on keeping this issue open.

**Manu Sporny:** the way we sorted the issues we have a PR, but it doesn't address all the language options.
… so we need to talk about this still.

**Brent Zundel:** if we need to talk about this in this phase, we should have it now.

**Manu Sporny:** thanks. The internationalization group asked us to specify how a default language for a document is specified.
… we responded by saying we have name & description in the base context, so we can use that as an example.
… we explain that in the spec today.

> *Manu Sporny:* [https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665](https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665).

**Manu Sporny:** They came back and said that he felt uncomfortable because we were not specifying a default language for the VC.
… That led to different proposals, each with different tradeoffs.
… options A, B, C, D, and E.
… I don't think we have time to go over all of these today.
… I think what we are doing in the spec is the best that we can do.
… but that depends on what the i18n group feels.

> *Dmitri Zagidulin:* what is the difference between options A and C?

**Brent Zundel:** I don't know that we can avoid talking about the options.

**Sebastian Crane:** thank you manu for creating the issue with the clear options.
… option C is what I proposed a resolution for.
… I think we are close to consensus on this issue.

**Manu Sporny:** I can go through the options ...

**Brent Zundel:** since Sebastian thinks C might be a winner, let's start there.

**Manu Sporny:** Option C uses JSON-LD language features.
… `@value` for value of a string.
… `@language` and `@direction`.
… benefit is already in JSON-LD.
… drawback is that people who don't like JSON-LD might not like this option.
… So we need to hear back from people who want to use something else.
… also Option C doesn't set a base language for the document. That's not clear if the international WG will go for that.

**Ivan Herman:** I have a comment on the tech. but a practical comment first: Addison is around, so let's try to talk to him.

**Brent Zundel:** That's right. We can take advantage of TPAC.

**Ivan Herman:** on the technical side, the title is misleading because the JSON-LD language features are more than what is in option C.
… we can use JSON-LD features the way it's used in 1.0 and we can set the direction for the whole file if we want (using JSON-LD).
… But I think we should not spend too much on this. JSON-LD has gone to great efforts to work out language with the i18n group.

> *Pierre-Antoine Champin:* [https://www.w3.org/TR/string-meta/](https://www.w3.org/TR/string-meta/).

**Ivan Herman:** So we should not cherry pick.
… There are not thousands of ways to do that in JSON either. We may have a long conversation (with some beer) whether to include an `@` sign or alias that out.
… That's the only question for me that's really relevant (the aliasing).
… We know (from HTML) in many actors in countries where people will ignore these language features.
… That's life.

**Sebastian Crane:** I'd like to agree. Two things: we are providing the option to do language support correctly. We can't force it, but we can enable it.
… second, the idea of a JSON-LD only idea. There's nothing in JSON-LD that requires "full JSON processing" for these language features.
… So for those here with no interest in RDF, this shouldn't add any complexity.

**Manu Sporny:** this comes about because some implementers look at the @sign and freak out.

> *Dmitri Zagidulin:* well so wait, why dont we alias out the `@` sign?

**Manu Sporny:** so if no one is complaining about that, we can just adopt it.
… If we alias out the `@` sign and we apply that against all VCs everywhere, then nobody can have a property named "language", which is problematic.
… The other thing.. Ivan said we could just depend on the JSON-LD properties.
… there are examples where that would clearly be wrong.
… if we allow `@language` in the context, e.g., `@language="es"` at the top. That would apply that language to every text string in the document, including base64 encoded values, etc.
… so we need to provide guidance that doesn't lead to meaningless decoration.
… If people are good with `@value`, `@language`, and `@direction` we're good. Aliasing is ok, but not great.

**Brent Zundel:** If we were to proceed as mentioned, if we go with those `@values`, is there anyone who would be opposed to that?
… I'm not seeing any opposition, so I think this is ready for PR.

**Ivan Herman:** JSON-LD scares the hell out of people sometimes because they are nervous about RDF. I have to emphasize what is in JSON-LD for the language has nothing to do with RDF.
… The features themselves, these are generic features that can be used for ANY JSON vocabulary.
… No magic or hidden RDF.
… We can haggle around the `@` sign.
… Personally I prefer keeping it, but that's me.
… we have done that for id & type.

**Dmitri Zagidulin:** re: alias. I'd argue we have a better way than flipping a coin. We know there is significant pushback on @s.
… can we have a poll.

**Manu Sporny:** two things. If we decide to use JSON-LD keywords, we'll have to change the way name & description work.
… two: I'm concerned about aliasing "value". I'd feel better if we had that for a while, I'd feel better.

> *Andres Uribe:* Is it possible to alias `"lang_value"` to `"@value"` ?

**Manu Sporny:** I do agree with dmitriz that there is an allergic reaction to seeing `@` signs in JSON.
… Don't think it's an easy answer. We should be ready to trigger another CR later.

**Ivan Herman:** I forgot to react to Manu about setting global language. From the JSON-LD point of view, that's not really a problem. Because we can specify that language doesn't apply for the datatype in the range of that specific property.
… for JSON only users they would ignore it.

> *Brent Zundel:* POLL: we will use keywords `@language`, `@direction`, and `@value` for language and alias them to 'language', 'lang_direction' and 'lang_value'.

> *Andres Uribe:* +1.

> *Gabe Cohen:* +1.

> *Dmitri Zagidulin:* +1 (though would much prefer 'direction' and 'value').

> *Sebastian Crane:* +1 for one option, +1 for the other. I think that counts as abstaining, but I will definitely not oppose either option :).

> *Ted Thibodeau Jr.:* +0.5.

> *David Chadwick:* +1.

> *Ivan Herman:* +1 (like dmitriz).

> *Joe Andrieu:* 0.

> *NickLansleyGS1:* +1.

> *Manu Sporny:* +0.5 (with severe trepidation wrt. stomping on existing data models out there) -- also, language/direction/value (not what was mentioned).

> *Shigeya Suzuki:* +1.

> *Phil Archer:* -1.

> *Paul Dietrich:* +1.

> *Juan Caballero:* +1.

> *Jay Kishigami:* +1.

> *osamu-n:* osamu-n has joined #vcwg.

> *Dmitri Zagidulin:* can Phil and Manu explain why dangerous? and what holes?

**Phil Archer:** I agree with Manu's comments, it's dangerous. but my participation here is minimal, so I understand.
… this could impact things in unintended ways.

> *Dmitri Zagidulin:* 'id' and 'type' are also very common words.

**Phil Archer:** using a common word like value to mean something that other people don't use it for.

> *David Chadwick:* I would be -1 if the alias was 'value' and not 'lang_value'.

**Brent Zundel:** clarification the proposal is for lang_value and lang_direction, not "value".

> *Andres Uribe:* ditto to what DavidC said above.

**Phil Archer:** Ah... that's much better.

> *Paul Dietrich:* agree DavidC.

> *Manu Sporny:* also, let's not use underscores since none of our other properties have underscores :).

> *Manu Sporny:* langString <-- would be better.

**Sebastian Crane:** The aliasing of `@` to something else simply makes the `@` sign implicit.

**Manu Sporny:** ivan said some stuff I didn't understand. I'd like to.

> *Dmitri Zagidulin:* +1 camel case vs snake case.

**Manu Sporny:** If we alias to lang_string, lang_direction, then that's probably fine.
… it would be nicer if people didn't get all bent out of shape about `@` signs.

> *Ivan Herman:* +1 to camel, to be aligned with the style used for other properties in VCDM.

**Manu Sporny:** Also, what Phil said: there's a lot of data out there that already uses value and it would trigger a lot of confusion.
… The problem is `@value` means something more than just language.
… That means ... it's not a straightforward decision.

**Brent Zundel:** we can keep going it.

**Sebastian Crane:** I'm happy to help with PR.

> *Pierre-Antoine Champin:* manu, people can still use `@value` when `lang_value` does not make sense. But you can't always prevent them to shoot themselves in the foot if they want to.

**Dmitri Zagidulin:** to clarify, if we alias value to something else. It's the other direction. Aliasing from lang_value to @language, but you can still use @language elsewhere.

**Manu Sporny:** if you compact using the VC context and you have @value throughout your JSON-LD, all those @value will be changed.

**Ivan Herman:** if you make the alias an embedded context for a property.
… for every property that is potential language and you scope the context.
… That's what we are suggesting is to not make it scoped.
… Option C is global, unscoped.

> *Pierre-Antoine Champin:* oh, yes, compaction will mess it up :-(.

> *Dmitri Zagidulin:* so why not use option A?

**Ivan Herman:** so for the other properties, you can make it null.

**Manu Sporny:** we need a deeper conversation and I don't know if we can get through this in 20 minutes.

> *Manu Sporny:* option A /is/ what we're doing in the spec today :).

**Brent Zundel:** this is a before CR issue. Is that because there will be normative changes to the spec?

> *Dmitri Zagidulin:* ok. and what is the problem with it today? just the `@` signs?

**Manu Sporny:** yes. this has normative impact.

**Brent Zundel:** so we will add time for this during TPAC.
… Break for lunch! Back in 80 minutes with or without Brent.

> *Manu Sporny:* no, it doesn't provide a language for the entire VC and it requires context authors to do scoped language stuff.

> *Dmitri Zagidulin:* `@manu` - what do you mean by scoped language stuff? what will it require them to do?

**this:** [https://github.com/w3c/vc-data-model/blob/main/contexts/credentials/v2#L122-L127](https://github.com/w3c/vc-data-model/blob/main/contexts/credentials/v2#L122-L127).

> *Manu Sporny:* (which, btw, I think is fine ^).

> *Dmitri Zagidulin:* interesting.

Sakurann commented 1 year ago

@aphillips would you and other Internationalization WG members be able to join VC WG to discuss this issue (and also its relationship with Issue #1264) during a VC WG's special topic call next week Tue? call details are here: https://www.w3.org/events/meetings/f6342df0-f7b5-4fc9-babd-61e55dc5fc2f/20231003T110000/

aphillips commented 1 year ago

I would be glad to attend. Please add me to the invite. I will ask others in the WG in our teleconference on 2023-09-28.

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-09-26

View the transcript

#### 2.6. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)

_See github issue [vc-data-model#1155](https://github.com/w3c/vc-data-model/issues/1155)._

**Kristina Yasuda:** This is an internationalization one.
… We keep adding / removing "ready for PR".
… So we discussed at PR how we would address this. We had a poll.
… We didn't assign anyone and we still don't have ready for PR.

> *Manu Sporny:* See [https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665](https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665).

_See github issue [vc-data-model#1264](https://github.com/w3c/vc-data-model/issues/1264)._

**Manu Sporny:** Yeah, this is also tied in with issue 1264. They're kinda/sorta duplicates of one another.
… I'm worried about this one ... I think we need the i18n people in one of our meetings and we need to talk with them, back and forth, need to avoid doing something they would object to.
… Assigning a language for the whole VC is a problem and we don't want to do that.
… Addison has responded with something where he's basically saying, we have a number of options we've proposed that satisfy their requirements but it's not clear what the best one they'd like. We should bring them in to talk with them about it before moving forward.

> *Manu Sporny:* [https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1719022289](https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1719022289).

**Manu Sporny:** Let me link to Addison's response in IRC.
… He's basically saying, this is what the i18n WG is looking for and there's some MUSTs/SHOULDs/MAYs ... and he analyzes each option that is above, noting that there are a couple there ... just about every option except the last one satisfies what they want but it's not clear which they'd want.
… It's not clear how much of a hard line they are taking here on any approach. I'd like to get them on a call so we can just say once and for all what we're doing and then move on without worrying about any objections during transition.

**Sebastian Crane:** A few weeks ago, we had a call and I proposed a resolution, we didn't get to voting on that. The initial reception was unanimous reception within this WG, so I think the only thing to do is get the i18n people involved.
… There isn't much left with that issue then.
… It would just be implementation from then on.

**Kristina Yasuda:** Thanks. Quick question -- how is not using `@language` in `@context` aligned with using `@language` keyword for i18n?

**Manu Sporny:** They are not aligned.
… The i18n are saying: They want a document level default and I don't know how hard of a line we have on that and then our only option is going to be using `@language` in `@context` and that's got problems.
… JSON-only processing is more difficult and it will tag values that are not supposed to have languages like base64 values with a language tag.
… So, during the F2F we were saying be surgical, use the `@language` and `@value` and `@direction` stuff.
… We also said, maybe we'll alias that, but people came up with reasons we shouldn't alias.
… So I think what seabass said was to just use the `@` language features in a targeted way and we just need to find out if i18n people would be ok with that approach.

**Sebastian Crane:** I would like to expand on that, I'm not a member of i18n WG at the moment. There's a technical reason not to do global language but there's also a reason that it's philosophical reason that it's not good, "you can enter" is the same meaning no matter what language you say it in.
… They are not just simple language documents. When you're using JSON-only processing you may not get to use those advanced RDF features.
… Having the language translation features within the properties themselves is more elegant, you're not translating the credential itself.

> *Manu Sporny:* agree with seabass.

**Dave Longley:** +1 to seabass.

> *Phillip Long:* +1 to seabass2.

**Kristina Yasuda:** Ok, I will reach out to set up a meeting with i18n.

Sakurann commented 1 year ago

I would be glad to attend. Please add me to the invite. I will ask others in the WG in our teleconference on 2023-09-28.

done! is tomorrow still good for others from your WG to attend?

aphillips commented 1 year ago

@Sakurann Yes, we're good.

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-10-03

View the transcript ### 1. Internationalization WG review. **Kristina Yasuda:** Special meeting due to feedback on Internationalization. … Existing options had not been decided on. **Kristina Yasuda:** Please introduce yourself Addison. _See github issue [vc-data-model#1155](https://github.com/w3c/vc-data-model/issues/1155)._ _See github issue [vc-data-model#1264](https://github.com/w3c/vc-data-model/issues/1264)._ _See github pull request [vc-data-model#1271](https://github.com/w3c/vc-data-model/pull/1271)._ **Addison Phillips:** I'm the chair of the I18N group at the W3C. > *Kristina Yasuda:* five options? [https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665](https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665). **Manu Sporny:** The background of this: we've had guidance about supporting internationalization, with a design pattern for people to follow. In the 1.0 and 1.1 work, we haven't seen much adoption of the I18N features. … For 2.0, we are adding two fields expected to be multilingual. > *Manu Sporny:* Here are the potential options that we're considering: [https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665](https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665). **Manu Sporny:** We had a number of options to consider for how to do it in 2.0. … We're looking to get to consensus on the option that we should choose, and one that will satisfy the I18N group. **Addison Phillips:** I had read through the summary of the discussion and will summarise here to ensure that we have a common understanding. … In general, the I18N Group would like to see that any natural language string field has metadata about at least A: language and B: text direction. … We aren't prescriptive about how that is performed. We also like to see a default language for documents. > *Orie Steele:* It would be good to get a stronger opinionated recommendation regarding approaches. 
> *Kristina Yasuda:* we kind of have 1.90 foot in LD world. > *Dave Longley:* I think we have both feet in LD world, we just want to use the simplest on-ramps. > *Dave Longley:* and adding `@value`, `@language`, and `@direction` in fields locally is the easiest way to do that. **Addison Phillips:** I think it sounds like being somewhat in the Linked Data world as well as a more general specification produces some complications. … We would like to understand more the concerns around global @language directives, because we are wondering whether these concerns apply to the wider LD community. … From what we've learnt so far, the I18N group is trying to produce best practice recommendations to other groups. … One of the ones that we've already been working on is quite different from your approach. I would like to share it with you. **Manu Sporny:** On the topic of being both in the LD world and not, it seems like a subset of our community are less likely to adopt the specification when LD features are added. … We've tried to reduce the LD features to a minimum up until now. > *Orie Steele:* -1 to asserting that "avoiding using LD is a possibility at this point". > *Dave Longley:* -1 that we can / are avoiding it, +1 that we're choosing the easiest on-ramps. > *Orie Steele:* -1 to being vague about understanding conforming documents (which are JSON-LD in compact form). **Manu Sporny:** As for Option E (using a translation file), I think you mentioned that you were against it, and there seems to be agreement within this WG. **Addison Phillips:** I would not object to translation files per se, but I would point out technical complications about multiple requests and resources. I think that doesn't sound like the right pattern for credentials. **Manu Sporny:** Option E can eliminated then! … For option D, I don't think this option has any advantages over using the LD method of a global @language, which is effectively the same in effect. 
**Addison Phillips:** It is common for us to recommend that specifications do this. It would be better if there were generic mechanisms, but specification-specific fields are OK.

> *Dave Longley:* -1 to option D.

> *Dave Longley:* -1 to option E.

> *Shigeya Suzuki:* For the record: option E is externalization, and it will not be possible unless we define an internal way to express it IMO.

> *Dave Longley:* +1 to eliminate E.

**Kristina Yasuda:** Are there any objections to eliminating option E?

> *Andres Uribe:* +1 to eliminate E.

> *Manu Sporny:* +1 to eliminate option E.

> *Shigeya Suzuki:* I'm fine with eliminating option E for now.

> *Phillip Long:* -1 to option E & D.

> *Joe Andrieu:* +1 to eliminate E.

> *Manu Sporny:* +1 to eliminate option D (but keep it around as a backup plan).

**Addison Phillips:** I would suggest that you could keep it for a 'backup'.

**Andres Uribe:** I think that's the default if we can't get consensus on anything else.

**Sebastian Crane:** I wanted to mention option E with the translation files, sometimes they look really good on paper, but in practice, lots of complications.

… networked translations, even when installed on computer, there are still lots of issues, GNU style translation -- .pot files -- translates based on literal value of string, but as linguists say, there are cases where you can have same words which mean two semantically different things and language files as used in GNU world don't have opportunity to disambiguate those. A number of complications here with Option E.

> *Shigeya Suzuki:* I don't want to spend time on this, but the way gettext/po is used is well studied, and in some non-English areas, esp. in the CJK area, it's useful.

> *Sebastian Crane:* I agree with addison: they can be used correctly but I think that is unlikely to be the case in the VC world.

**Ivan Herman:** Option D means having two properties: language and text direction. We need both on the default level.

… Is this the general view as well?

> *Manu Sporny:* We do need to express language AND direction.

> *Manu Sporny:* yes, that's the general agreement, I believe.

**Addison Phillips:** Indeed, I agree we would need to have this.

> *Manu Sporny:* Here is option C:

```
"credentialSubject": {
  "myHumanReadableProperty": [{
    "@value": "This is some human-readable text.",
    "@language": "en"
  }, {
    "@value": "هذا بعض النص الذي يمكن قراءته بواسطة الإنسان.",
    "@language": "ar",
    "@direction": "rtl"
  }]
}
```

> *Dave Longley:* +1 to this option (C, I believe), I think it's the simplest and will work generally for any natural language field.

**Manu Sporny:** We would express the value of the string, the language, and the text direction. I believe this meets the requirement that addison illustrated.

> *Dmitri Zagidulin:* +1 to option C, works well for multi-language credentials in Edu land.

> *Phillip Long:* +1 to Option C, as it does indeed work well in edu-land.

**Manu Sporny:** We've mainly been discussing whether to alias the "@X" terms. Sebastian proposed option C on two occasions, and there were no objections raised. I believe it addresses all concerns except for the ability to specify a global default.

> *Sebastian Crane:* +1 to Option C obviously :).

**Manu Sporny:** It's not easy to test this, as multiple languages are optional.

> *Manu Sporny:* +1 to speaking to Option C in the specification.

**Addison Phillips:** I'm concerned that whilst the 'SHOULD' and 'MAY' are good supports for internationalisation, there will still be completely unlabeled strings.

… I would like it to be possible to know a default, for when people don't want to put all the extra syntax in.

**Manu Sporny:** Can we ask if the group is OK exploring option C further?

**Kristina Yasuda:** If anyone is strongly in objection to option C, please speak now.

**Sebastian Crane:** To repeat for Addison's benefit -- VCs don't inherently have a language... language on field such as name/description is for human holder of VC in a wallet application.
When you have the RDF world, link things together based on ontological truth, actual meaning...

… you don't necessarily want to apply a language to the specific credential, you want to apply language to description of credential... that's why I like option C -- translate those human-readable values, credential itself doesn't have a language.

> *Orie Steele:* Conforming documents are represented in JSON-LD.... the philosophical concept of credentials is not helpful... JSON-LD will have text that is in a human readable language (both the term definitions, the text behind them, and their literal values).

**Addison Phillips:** Yes, important observation, locale-neutral data... when people talk about these things, name/description is how humans interact... can't look at other things and talk meaningfully about them... credential has BS of science -- those names/descriptions are of natural language pieces... want natural language to be associated with those parts, not other data.

… challenge is that machine generates these things, people writing code may or may not be willing to generate multiple language versions, or they may not wish to obtain and serialize information on per-field basis... if you're willing to say MUST, then we're good.

… I think that's an important distinction: one wants to have language-neutral data if at all possible. A complication is that humans can't talk meaningfully about the pure data, only about the natural language descriptions.

**David Chadwick:** I think MUST is fine, but not sufficient. Let's say you have a degree from a Japanese university and it has language metadata, that degree credential is still not readable by a typical English person.

… I believe that C is necessary, but doesn't completely solve the internationalisation concerns.

**Andres Uribe:** I'm definitely supportive of Option C. In addition, I would like to see aliasing.
I don't really understand why aliasing will cause problems with JSON-LD, so I would appreciate an explanation here.

> *Dave Longley:* aliasing `@value` will alias it for everything, not just language values.

**Manu Sporny:** The short answer is that `@value` is also used for non-natural-language fields. We can't just alias it globally without making other fields have unwanted language features.

> *Dave Longley:* so making it say `langValue` (or whatever) for non-language values will be weird / confusing.

> *Orie Steele:* The comment about "re-compacting" / "compacting" is critical for the WG to understand.

**Manu Sporny:** For that reason, I would be strongly opposed to aliasing if option C can be sufficient. It is only a single character difference.

> *Orie Steele:* I'm not sure that there is understanding here... and we should clarify.

**Andres Uribe:** Thank you, that answered my question.

**Manu Sporny:** We would end up getting our alias appearing in unexpected places.

**Kristina Yasuda:** I would like to ensure that other options are considered as well and we are running out of time.

> *Manu Sporny:* I'm afraid that we're not going to be able to get to "MUST always use `@value`/`@language`/`@direction`" when expressing human-readable strings.

**Kristina Yasuda:** Let's discuss option B and A.

> *Dmitri Zagidulin:* +1 manu.

**Manu Sporny:** The suggestion I'm hearing is to remove option A. It just allows us to use 'prettier' values, but doesn't have any advantage.

> *Orie Steele:* If folks don't understand "compact vs non compact LD"... they don't understand what a conforming document is.... so we should be cautious requiring "non compact" processing of languages, because the spec does not require people to understand that.

> *Dave Longley:* if option B is putting `@language` as a default language in the context then -1 to that, it corrupts the data.

**Manu Sporny:** Option B provides a document-level default.
The issue that we would need to flag is non-natural-language fields being classed as a specific language, such as Base64 data being marked as natural language.

… We could make Option B a fallback to Option C, but that has downsides for the JSON-LD context architecture.

… I believe the options here are Option B+C - OR - Option C+D.

**Addison Phillips:** You can't prevent people from serialising `@language` globally. You could deprecate that behaviour of course.

> *Orie Steele:* IMO, if you can't stop people from doing something, it's considered best practice to give them guidance... and not be silent.

**Sebastian Crane:** I'd like to talk about Option C only.

… This is an implementation consideration, authenticate users, use existing libraries -- if those tools made it as easy to set a default in the code and have the serialization of fields automatic, as writing the serialized language feature at the top, then people would use that feature. In contrast to HTML, people were hand-writing it... but due to cryptography involved, people aren't hand-writing VCs. Lack of global language feature could.

> *Manu Sporny:* be side-stepped in implementations.

> *Kristina Yasuda:* I do not like the idea of let's rely on the library to implement this correctly.

> *Shigeya Suzuki:* +1 kristina.

**Ivan Herman:** We could argue the same thing about HTML, as few people write HTML by hand. What's the proportion of tools that produce linguistically undefined documents? Perhaps addison knows.

> *Shigeya Suzuki:* It depends on the complexity of the output. For simple VCs, it's not necessary to depend on a huge library. Not all people have freedom on memory and energy usage.

**Ivan Herman:** Maybe putting the language metadata in all fields is a bit naive.

… I am not particularly partisan to the technique, but I think it's important to have something for global language.

> *Ted Thibodeau Jr.:* can we globally say "language: undefined" or "language: various" or similar?

**Addison Phillips:** To Sebastian's point, if you say that it's a MUST and all the libraries implement it, maybe it would be moot. I'm not sure that you could expect that response without a MUST.

**Kristina Yasuda:** Can everyone put your favorite option on IRC?

> *Kristina Yasuda:* C+D.

> *Ivan Herman:* C+B.

> *Dmitri Zagidulin:* C+D for me.

> *Sebastian Crane:* C.

> *Dave Longley:* -1 to C+B because it corrupts all string fields that are not natural language fields (which is a common thing in VCs).

> *David Chadwick:* c+d.

> *Manu Sporny:* C+B.

> *Phillip Long:* C+B or if we're ranking 1, C+B, 2, B+D, 3, C.

> *Dave Longley:* +1 to just C, I don't think there's a significant difference in MUST/SHOULD with C and having a default language with D, the people that don't want to do it won't do it either way -- and only tools will stop them.

> *Dmitri Zagidulin:* wait D is in the VC core context or in the VC itself?

> *Joe Andrieu:* +1 to C.

> *Manu Sporny:* D is in the VC itself.

> *Kristina Yasuda:* D is in VC itself I think.

> *Shigeya Suzuki:* I think C+B is slightly better but C+D is also acceptable.

> *Dave Longley:* that's an insufficient description of B and D ...

**Dmitri Zagidulin:** Can we remind people of the difference between D and B?

**Manu Sporny:** D creates a new feature, B uses an existing JSON-LD feature.

> *Dave Longley:* B will use a JSON-LD feature that will apply a language to EVERY string field, even non-language fields.

> *Dmitri Zagidulin:* in that case, C+B.

> *Orie Steele:* I think I agree with what dmitri is saying though...

> *Dave Longley:* D will invent something new for VCs but only apply it to natural language text fields.

> *Orie Steele:* ^ yeah... that.

> *Phillip Long:* Orie is up.

**Ivan Herman:** dlongley said that every field will get a language tag with B. However, if we had LD tags for datatypes, that won't be an issue.

> *Dmitri Zagidulin:* agree with ivan.

**Ivan Herman:** It's not as bad as it looks considering the existence of those JSON-LD datatypes.

> *Dmitri Zagidulin:* an app somehow interpreting language & direction on a base64 string or whatever is /not/ a realistic problem.

**Dave Longley:** I agree with you that we should use datatypes, but the JOSE and COSE parts do not have data types defined.

> *Orie Steele:* also not necessary... to do... because the data model is COMPACT JSON-LD !!!!

**Sebastian Crane:** It is a bit involved, I'll write to the mailing list. Can we delay a vote for a day or two to engage w/ email?

**Kristina Yasuda:** We appreciate Addison's time. Thank you!

---
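To make the Option C shape discussed in these minutes concrete, here is a minimal sketch (not taken from the minutes or from any library) of how a consumer such as a wallet might pick a display string from an Option C-style value. The helper name `pick_language_value`, the `en` fallback, and the "first entry" last resort are assumptions made for illustration only; neither the VCDM nor JSON-LD defines this selection logic.

```python
# Hypothetical helper for illustration; not part of any VC or JSON-LD library.

def pick_language_value(values, preferred, fallback="en"):
    """Pick one entry from an Option C-style value.

    `values` may be a plain string, a single value object, or a list of
    value objects carrying "@value", "@language" and optionally "@direction".
    Returns the chosen value object, or None if there is nothing to pick.
    """
    # Accept a bare string or single object as well as a list.
    if isinstance(values, (str, dict)):
        values = [values]

    # Wrap unlabeled strings so every entry has the same shape.
    normalized = [
        {"@value": entry} if isinstance(entry, str) else entry
        for entry in values
    ]

    # Try the preferred language first, then the assumed fallback.
    for wanted in (preferred, fallback):
        for entry in normalized:
            if entry.get("@language") == wanted:
                return entry

    # Last resort (an assumption): the first entry, even if unlabeled.
    return normalized[0] if normalized else None


# The example value from the Option C snippet in the minutes.
example_values = [
    {"@value": "This is some human-readable text.", "@language": "en"},
    {
        "@value": "هذا بعض النص الذي يمكن قراءته بواسطة الإنسان.",
        "@language": "ar",
        "@direction": "rtl",
    },
]

chosen = pick_language_value(example_values, "ar")
print(chosen["@language"], chosen.get("@direction"))
```

Note that an unlabeled string comes back with no `@language` or `@direction` at all, which is the "completely unlabeled strings" gap that motivates the discussion of a default (Options B and D).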
David-Chadwick commented 1 year ago

Before this issue is closed can you point to the PR or Issue that is dealing with the default I18N for a VC, as I could not see it

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-10-18

View the transcript

#### 2.3. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)

_See github issue [vc-data-model#1155](https://github.com/w3c/vc-data-model/issues/1155)._

**Brent Zundel:** This issue tracks that i18n review happened. We had a meeting w/ i18n folks.

**Ivan Herman:** I think there is still a PR coming wrt. i18n issues in the document. The setting of default language/direction.

… That should be put into a PR and eventually merged, when that's done, we can close this.

**Manu Sporny:** We are tracking that issue separately.

> *David Chadwick:* +1 ivan.

**Manu Sporny:** I don't recall when we can close horizontal review issues.

… review happened, other issues were created, I think we can close this.

… the items they raised are tracked in other issues.

**Brent Zundel:** Thanks, that's my understanding as well. I'm going to close this issue after minutes are generated.

> *Phillip Long:* Just a note that there are 4 "applicable" items in the internationalization review.

**Brent Zundel:** Ok, end of call, thanks everyone.

… Reminder that the chairs are going to become considerably more aggressive determining when we don't have consensus on PRs. Our timeline is overdue.

---
aphillips commented 1 year ago

Noting this comment in your meeting minutes:

> Manu Sporny: I don't recall when we can close horizontal review issues.

Your working group may close horizontal issues when you feel you have addressed them--they are "just issues" and don't require special treatment. Please do not remove the *-needs-resolution or *-tracker labels (these are consumed by tools). Your closing an issue is a signal to the horizontal group that you think you're done with it.

Note that we have tracking issues in our own repository (in I18N's case this is here) and you can view the status of your horizontal review in this tracker (filter it by your spec(s)). I18N needs to close any of our matching issues before your transreq will be approved.

msporny commented 1 year ago

> Before this issue is closed can you point to the PR or Issue that is dealing with the default I18N for a VC, as I could not see it

It's being tracked in Issue #1264, specifically, this comment contains all the ways we're considering addressing the issues raised by i18n WG's review:

https://github.com/w3c/vc-data-model/issues/1264#issuecomment-1712807665

(edited to change PR #1264 to Issue #1264)

iherman commented 1 year ago

Reopening for proper minute processing

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-11-01

View the transcript

#### 2.3. Internationalization Review for VCDM 2.0 (issue vc-data-model#1155)

_See github issue [vc-data-model#1155](https://github.com/w3c/vc-data-model/issues/1155)._

**Brent Zundel:** This is the generic internationalization review issue. Now that it points to 1264, we can close this one. Do I remember correctly?

… 1155 is now closed.