w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

reconsider `@id` for `mediaType` term #1408

Closed gobengo closed 8 months ago

gobengo commented 10 months ago

Context:

Questions

Straw Man

gobengo commented 10 months ago

Looks like this was added in this PR https://github.com/w3c/vc-data-model/pull/1296 by @iherman. wdyt @iherman ?

iherman commented 10 months ago

Just to make it clear for other reviewers in the WG:

we are talking about making a choice, in the VCDM vocabulary, between two externally defined terms for our internal use, namely:

  1. The schema.org term encodingFormat; or
  2. The activity streams' term mediaType

Our usage of the term is defined in the VCDM spec that uses the term mediaType, mapped (via the @context mechanism) to the schema.org term. In other words, so far our choice was (1) above. The usage of the term within the VCDM is related to the integrity control of related resources.

Back to the original question of @gobengo: we made a decision in the past of relying on externally defined terms whenever possible, hence our usage of several schema.org terms in our data model (instead of redefining them locally). Technically, we could decide not to use a schema.org term but, instead, map the value to the activity stream version (i.e., switch to choice (2) above); that would affect the VCDM only superficially. Also, taking into account the fact that this term has been introduced into the model relatively recently, I do not think it has any backward compatibility issue with deployed applications. On the other hand, doing so would bring some inconsistency in the vocabulary; that is why, taking into account the widespread usage and importance of schema.org, I am a little bit reluctant to do the switch. But the choice is not mine, but the WG's; if the WG decides to switch, I am happy to accept.
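For reviewers less familiar with the @context mechanism, the two choices can be sketched roughly as follows. This is a toy illustration, not the actual VCDM or AS2 context files: `expand_term` is a hypothetical helper standing in for real JSON-LD expansion, and the AS2 IRI is assumed to be the standard Activity Streams namespace plus `mediaType`.

```python
# Illustrative only: two one-term contexts mirroring choices (1) and (2).
SCHEMA_ENCODING_FORMAT = "https://schema.org/encodingFormat"
# Assumption: the AS2 term lives in the usual Activity Streams namespace.
AS2_MEDIA_TYPE = "https://www.w3.org/ns/activitystreams#mediaType"

# Choice (1): the JSON term "mediaType" is mapped to the schema.org property.
CONTEXT_CHOICE_1 = {"mediaType": {"@id": SCHEMA_ENCODING_FORMAT}}
# Choice (2): the same JSON term is mapped to the Activity Streams property.
CONTEXT_CHOICE_2 = {"mediaType": {"@id": AS2_MEDIA_TYPE}}


def expand_term(context: dict, term: str) -> str:
    """Return the IRI a term maps to (toy stand-in for JSON-LD expansion)."""
    definition = context[term]
    # A term definition may be a plain IRI string or an object with "@id".
    return definition if isinstance(definition, str) else definition["@id"]


if __name__ == "__main__":
    print(expand_term(CONTEXT_CHOICE_1, "mediaType"))
    print(expand_term(CONTEXT_CHOICE_2, "mediaType"))
```

The JSON that authors write is identical under both choices; only the IRI behind the term differs, which is why the switch would affect the VCDM only superficially.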

However. Making one step backward here for a moment: I wonder whether this may not be a more general issue that we may want to look at first. The use case of @gobengo is interesting in general: combining a well established Linked Data vocabulary with the VCDM. The fact of having a clash with the term names is bound to happen again. While the case of mediaType is relatively easy to solve (it is only a matter of a small change in the @context file, as shown in the straw man proposal), this may not be true for other cases which directly clash with the core vocabulary. I am curious to see whether experts would give an answer to https://github.com/w3c/json-ld-syntax/issues/424 but, if there is no easy solution, I wonder whether the systematic usage of @protected will not become a general problem in the future. Maybe we will be forced to consider some alternatives. But I would leave this discussion to those who are much more versed in the intricacies of @context mappings, like @dlongley, @BigBlueHat, @pchampin, or @gkellogg…
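To make the clash concrete, here is a heavily simplified model of how a protected term blocks a later redefinition. It is a sketch under stated assumptions, not a JSON-LD processor: `apply_context` and `ProtectedTermRedefinition` are hypothetical names, contexts are reduced to flat term-to-IRI maps, and only the IRI is compared when deciding whether a redefinition is identical.

```python
class ProtectedTermRedefinition(ValueError):
    """Raised when a later context tries to change a protected term."""


def apply_context(active: dict, incoming: dict) -> dict:
    """Merge `incoming` into `active`, honouring a context-wide @protected flag.

    Simplification: term definitions are plain IRI strings, and a
    redefinition is allowed only when it maps the term to the same IRI.
    """
    result = dict(active)
    protect_all = bool(incoming.get("@protected", False))
    for term, iri in incoming.items():
        if term.startswith("@"):
            continue  # skip keywords such as @protected
        previous = result.get(term)
        if previous is not None and previous["protected"] and previous["iri"] != iri:
            raise ProtectedTermRedefinition(f"cannot redefine protected term {term!r}")
        keep_protected = previous is not None and previous["protected"]
        result[term] = {"iri": iri, "protected": protect_all or keep_protected}
    return result


# Hypothetical, cut-down contexts for illustration.
VC_LIKE_CONTEXT = {"@protected": True, "mediaType": "https://schema.org/encodingFormat"}
AS2_LIKE_CONTEXT = {"mediaType": "https://www.w3.org/ns/activitystreams#mediaType"}

if __name__ == "__main__":
    active = apply_context({}, VC_LIKE_CONTEXT)
    try:
        apply_context(active, AS2_LIKE_CONTEXT)
    except ProtectedTermRedefinition as error:
        print("clash:", error)
```

In this model the order matters: loading the unprotected AS2-like context first and the VC-like context second succeeds, while the reverse order fails.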

dlongley commented 10 months ago

@iherman,

I provided an answer at the original issue.

My view regarding "wider questions" is: the fact that some contexts that predate the usage of @protected exist and are often used with unauthenticated data should not drive best security practices for authenticated linked data. Rather, new contexts should be created that follow best practices (e.g., use @protected and scoped contexts) -- especially if the data is intended to be secured (e.g., using VCs) and used by "type-specific" consumers that require particular contexts and JSON structure.

dlongley commented 10 months ago

Note: If we feel strongly about avoiding conflicts in particular with the mediaType definition we provide in the VC v2 context, we could type-scope it and require a type of ExternalResource (or similar) to bring mediaType into scope. This, IMO, is the right way to avoid conflicts -- if we feel that this one in particular is something we should be avoiding.
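A rough sketch of what type-scoping could look like, assuming a hypothetical `ExternalResource` type with a made-up `example.org` IRI. `resolve_term` is an illustrative helper, not part of any JSON-LD library, and it ignores most of the real processing rules (propagation, protection, nested contexts).

```python
# Hypothetical context: "mediaType" is only defined inside the scoped
# context attached to the ExternalResource type.
CONTEXT = {
    "image": "https://schema.org/image",
    "ExternalResource": {
        "@id": "https://example.org/vocab#ExternalResource",  # made-up IRI
        "@context": {"mediaType": "https://schema.org/encodingFormat"},
    },
}


def resolve_term(context: dict, node: dict, term: str):
    """Resolve `term` for `node`, applying the scoped contexts of its types."""
    types = node.get("type", [])
    if isinstance(types, str):
        types = [types]
    scoped = {}
    for type_name in types:
        definition = context.get(type_name)
        if isinstance(definition, dict):
            scoped.update(definition.get("@context", {}))
    # Type-scoped definitions take precedence over top-level ones.
    definition = scoped.get(term, context.get(term))
    if definition is None:
        return None
    return definition if isinstance(definition, str) else definition["@id"]


if __name__ == "__main__":
    typed = {"type": "ExternalResource", "mediaType": "image/png"}
    untyped = {"mediaType": "image/png"}
    print(resolve_term(CONTEXT, typed, "mediaType"))
    print(resolve_term(CONTEXT, untyped, "mediaType"))
```

Without the type, `mediaType` is simply undefined at the top level, so another context (for example AS2) would be free to define it there.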

iherman commented 10 months ago

Thanks, @dlongley. If your solutions work and are acceptable to @gobengo, then we can indeed close this current issue.

Two remarks, however:

[…] new contexts should be created that follow best practices (e.g., use @protected and scoped contexts) -- especially if the data is intended to be secured (e.g., using VCs) and used by "type-specific" consumers that require particular contexts and JSON structure.

We have to acknowledge that this is non-trivial for designers, and it requires a more elaborate knowledge of how @context works. I wonder whether we will have some place where this implementation advice could be documented. I think it would be important to do so. (cc this to @brentzundel to possibly come back to this issue during CR phase. Should we open a separate issue for this?)

If we feel strongly about avoiding conflicts in particular with the mediaType definition we provide in the VC v2 context, we could type-scope it and require a type of ExternalResource (or similar) to bring mediaType into scope.

I am not sure how this would be possible. We deliberately chose to keep the domain of digestSRI open (and mediaType is the "adjacent" term). If we went down the line you propose, it may become difficult to properly use constructions like Example 27: I presume it would require doing something with the image property in the @context file. But, I presume, that is an application-specific term...

(But, again, you may have some extra @context trick up your sleeve…😀)

msporny commented 10 months ago

@dlongley wrote:

I provided an answer at the original issue.

Here's the link to the answer: https://github.com/w3c/json-ld-syntax/issues/424#issuecomment-1874406508

Is anyone super attached to defining mediaType in terms of https://schema.org/encodingFormat?

As Ivan noted, the use of mediaType is "new" so changing it at this point wouldn't be a big deal, but changing it for the reasons proposed above might be a bad idea.

Could we entertain the notion of defining it by re-using as2 mediaType?

The question in the group will become: "Why is Activity Streams, which has far less usage than schema.org, defining mediaType? Why isn't Activity Streams just re-using schema.org encodingFormat, instead?"

Another option could be that the AS3 (the next version of the context) would change/update to use schema.org.

I don't have a strong opinion one way or the other on this issue, but I am hesitant to change things w/o more implementer feedback. How many implementers are planning on using VCDM v2.0 w/ AS2? If this is an experiment at this point, with no production-track implementations doing the above, then it's going to be a hard sell to the WG.

We are attempting to go into CR this coming week with VCDM v2.0, so any kind of "hail mary" type change at this point would probably be frowned upon. I will note that we can change the context file during CR (we explicitly mention that we might do that in the spec).

So, the options open to us, in "least disruptive" to "most disruptive" order, are:

  1. AS2 doesn't work w/ VC2, we need a new AS3 context for that to work, which will start using schema.org's encodingFormat.
  2. AS2 works w/ VC2 because we update the VC2 context to use AS2's mediaType URL.
  3. AS2 works w/ VC2 because VC2 and AS2 aggressively scope the mediaType property in the AS2 context.
  4. We rename VC2 mediaType to encodingFormat.

I'm marking this as post-CR since we can "fix" this in CR and will need more time to discuss on the proper path forward here given that we're trying to transition VCDM v2 into CR this coming week.

gobengo commented 10 months ago

@msporny sounds right. options 3 or 4 would be ideal. Moreso even than option 2 which is what my original post suggests only because I didn't think of options 3 or 4.

iherman commented 10 months ago

I presume that the easiest solution for all of us is option (4), unless AS2 does that change for the mediaType property. I would be fine with it (it may reduce future clashes, because the term "media type" may be more widespread).

I am happy to come up with a PR if we have a general agreement. Do we?

msporny commented 10 months ago

I presume that the easiest solution for all of us is option (4), unless AS2 does that change for the mediaType property. I would be fine with it (it may reduce future clashes, because the term "media type" may be more widespread).

Unfortunately, I don't think any of the options are "easy". The problem is that both AS2 and schema.org have claimed two different ways of expressing media type and now VC2 is in the unenviable position of having to choose one or the other, or do something weird (like start using the "encodingFormat" term, which is not readily used in the industry).

In VC2, we really do mean media type w/ our property... so I don't think renaming it to "encodingFormat" is the right way to go.

I am happy to come up with a PR if we have a general agreement. Do we?

No, unfortunately, I don't think we do. I think we need to be very careful about what we do here, none of the options are as straightforward as they seem. Yes, some of them are "easy" in that they don't require us to think too hard about the repercussions, but there will be repercussions.

I think the next step is to list "Pros" and "Cons" for each approach so we can do a full analysis w/ the VCWG engaged. Let's not do something hasty here as mediaType is fairly critical to a variety of VC2 use cases.

iherman commented 10 months ago

I presume that the easiest solution for all of us is option (4), unless AS2 does that change for the mediaType property. I would be fine with it (it may reduce future clashes, because the term "media type" may be more widespread).

Unfortunately, I don't think any of the options are "easy". The problem is that both AS2 and schema.org have claimed two different ways of expressing media type and now VC2 is in the unenviable position of having to choose one or the other, or do something weird (like start using the "encodingFormat" term, which is not readily used in the industry).

Just for the sake of arguments: for some reason (maybe being faced with a similar dilemma), the schema.org vocabulary creators decided to use the encodingFormat term, and industry survived by using this. After all, what our @context file does is to map the term mediaType to schema's encodingFormat, i.e., your option (4) in https://github.com/w3c/vc-data-model/issues/1408#issuecomment-1879778435 is simply to say that we fully embrace the usage pattern of schema.org instead of hiding it (which is what we do now).

I do agree that this is a VCWG's choice, though. A PR can come before doing that (to facilitate the decision) or after it.

TallTed commented 10 months ago

schema.org has redefined a great many things, or at least ignored common use and understanding, without much if any apparent concern for any repercussions. Something about being backed by multiple 800lb Gorilla organizations seems to encourage this.

Several IANA media types may be encoded in multiple ways. For instance, text/plain; charset=UTF-8 is a UTF-8 encoding of the text/plain media type. As you might guess, there is also text/plain; charset=UTF-32, among others.
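As a small illustration of the distinction between the media type itself and its parameters, the Python standard library can split the two. `split_media_type` is just a hypothetical helper name; it returns the bare type and the `charset` parameter, if any.

```python
from email.message import Message


def split_media_type(value: str):
    """Split a media type string into (type/subtype, charset parameter or None)."""
    message = Message()
    message["Content-Type"] = value
    # get_content_type() drops parameters; get_param() reads a single one.
    return message.get_content_type(), message.get_param("charset")


if __name__ == "__main__":
    print(split_media_type("text/plain; charset=UTF-8"))
    print(split_media_type("text/plain; charset=UTF-32"))
```

Both inputs share the media type `text/plain` and differ only in the `charset` parameter.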

schema:encodingFormat, formerly schema:fileFormat, lacks any strong assurance that it's not going to change again, and while its description cites IANA and MDN, it largely ignores the information found there (and the second cite itself ignores many aspects of the authoritative content of the first).

Maybe we might consider ianaMediaType and not map it to anything in AS2, AS3, schema.org, or elsewhere....

iherman commented 9 months ago

The issue was discussed in a meeting on 2024-02-14

#### 2.6. reconsider `@id` for `mediaType` term (issue vc-data-model#1408)

_See github issue [vc-data-model#1408](https://github.com/w3c/vc-data-model/issues/1408)._

**Brent Zundel:** reconsider `@id` for mediaType term. looks like this was addressed?

**Ivan Herman:** my impression is that this was solved, we can close it. we are sticking here to what schema.org does.

**Brent Zundel:** proposal to close with no action?

**Manu Sporny:** currently there is a conflict b/w activity streams and verifiable credentials. activity pub wants to use VCs. they will be blocked/have a bad experience with things as-is. we have the easiest ability to change this. shouldn't just close. we should do something about it. … easiest for us to rename the term from mediatype to ianamediatype or similar. then a question of whether we should re-use the schema.org term for it. … or create our own. then it gets complex.

**Ivan Herman:** don't want to get into the weeds of defining/creating types. we can change the term we use to map to schema.org, but would leave this to schema.org, they are already doing this. we can use whichever term we want in our vocab.

**Manu Sporny:** let's rename our term to 'IANAMediaType' and keep the schema.org URL the same. … we should do this and then be done, and we will be fine.

**Ivan Herman:** change to be made is in the spec text and context file, right?

**Manu Sporny:** yes.

**Ivan Herman:** never touched this but should be able to do it.

**Dave Longley:** do we want to use encoding format as the term? since we're using it as the schema.org property? avoid inventing a new term.

> *Manu Sporny:* I'm fine w/ "encodingFormat".

**Ivan Herman:** perfectly fine.

**Brent Zundel:** any opposition?

> *Manu Sporny:* Only argument against would be: "People don't call these things "encoding format" usually? They say media type?

> *Dave Longley:* they also don't say "ianaMediaType".

> *Manu Sporny:* true story.

**Brent Zundel:** none heard. Ivan is assigned. that is our meeting for today. … please jump on PRs that need feedback in vc-jose-cose and others. if you are assigned in the data model please get a PR in for that.

> *Manu Sporny:* "iana" -> "I Am Not A" Media Type.

**Brent Zundel:** getting to the point where we could mark things as 'future' ... people will need to be assigned to address them. … thank you everyone, we could not do this without you. see you next week.

iherman commented 9 months ago

PR #1440 has been raised to solve the issue. The issue should be closed if the PR is merged.

davidlehn commented 9 months ago

I think I agree with many of the points here including:

Questions:

Important note:

My opinion:

iherman commented 9 months ago

Hi @davidlehn,

I think we are (all) in wild agreement that none of the solutions are perfect. There are bad and worse options...

Just to reflect on your questions:

  • Does anyone know the reasoning, if any, schema.org chose encodingFormat? (I have looked beyond the commits where they changed to it.)

Personally, I have no idea. Let me refer back to @TallTed's remark in https://github.com/w3c/vc-data-model/issues/1408#issuecomment-1884992592

  • Are there any other well known URIs that are more aligned with "media type"? Is that an option when considering benefits of schema.org alignment? (as:mediaType is defined on as:Link and as:Object so that doesn't seem general enough)

Not that I know of.

  • Is working with schema.org to rename to or mint https://schema.org/mediaType a realistic option?

I do not think it is realistic. Any change on schema.org requires a lot of time because it is a longish process (which, in view of its wide usage, is understandable). I do not think any of us would want to invest into something like that.

In fact, we have several, orthogonal issues:

  1. What should the term be? It can be whatever we decide, but the general approach in JSON-LD context files of mapping terms directly to URLs (as opposed to using prefixes much more widely) is bound to lead to name clashes (as you yourself say in your comment). This is the problem we have with AS. Solving this would require a major reengineering of our approach to contexts which, at this point, we cannot do.

    Funnily, the fact that encodingFormat is a strange term may be a good thing, because it reduces the chances for a name clash...

  2. What should the term be mapped on, i.e., what is the URL of the property. We could decide to map it onto our own space, i.e., define the term to be part of the DI vocabulary. In general, this is frowned upon in the Linked Data community, the emphasis is reuse when possible (hence our usage of schema.org). But, in view of the vagueness of the term and the surrounding problems, maybe we can decide to do that, and forget about schema.org
  3. If we choose to keep in our vocabulary, we have to decide what the range should be. It can be "string" datatype, with some general handwaving on what the string should express, but that is not ideal. We should then probably define a new datatype (like we did for multibase) provided there is somewhere a normative definition on what those values can be (a description that we can refer to normatively, that is).
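As a sketch of what a dedicated datatype could check, the following validates only the `type/subtype` shape, using the restricted-name grammar from RFC 6838 as the assumed normative basis. `looks_like_media_type` is a hypothetical name; parameters such as `charset` and registration in the IANA registry are deliberately out of scope.

```python
import re

# restricted-name from RFC 6838, section 4.2: a letter or digit, followed by
# up to 126 further characters from a limited set.
_RESTRICTED_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}"
_MEDIA_TYPE = re.compile(_RESTRICTED_NAME + "/" + _RESTRICTED_NAME)


def looks_like_media_type(value: str) -> bool:
    """Return True if `value` has the lexical form type/subtype."""
    return _MEDIA_TYPE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("application/ld+json", "text/plain", "text", "/json"):
        print(candidate, looks_like_media_type(candidate))
```

A check like this only constrains the lexical form; whether the value names a registered media type would still need a lookup against the IANA registry.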

As far as I am concerned:

iherman commented 9 months ago

The issue was discussed in a meeting on 2024-02-21

#### 3.2. Changed the term `mediaType` to `encodingFormat` (pr vc-data-model#1440)

_See github pull request [vc-data-model#1440](https://github.com/w3c/vc-data-model/pull/1440)._

**Ivan Herman:** We had a call on this issue. We decided to change the term "Media Type" to "Encoding Format".

_See github issue [vc-data-model#1408](https://github.com/w3c/vc-data-model/issues/1408)._

**Ivan Herman:** Discussions are ongoing. … We shouldn't spend that much time on this.

**Manu Sporny:** The goal is for activity streams to be able to use this without changes. … We may just be incompatible with the activity streams context. … If changing this one thing doesn't fix it, then we shouldn't make the change, since the problem wouldn't be addressed. … The way activity streams and schema.org define the context are neither right. … We probably don't want to do this. … Maybe we should define IANA media type and have it refer to the IANA registry. … I'm leaning towards that being my preference. … The downside is that we're creating yet another term.

**Ivan Herman:** Note that the two things you mentioned are orthogonal to one another. … What term should we use? … Should we define it ourselves?

> *Ted Thibodeau Jr.:* ianaMediaType was my idea, fwiw. its domain & range remain vital.

**Ivan Herman:** The definition of a data type for media types can be added. … It's not a huge deal. … I question altogether whether we should do.

> *Dave Longley:* another option is to go with `encodingFormat` today and then potentially add `ianaMediaType` or `mediaType` in a future WG.

**Michael Jones:** It would be strange to change from a term that is well known "media type" to "encoding format", which we'd be entirely making up.

**Ivan Herman:** We are not making it up, schema.org defined it.

**Michael Jones:** That's not authoritative for us.

**Ivan Herman:** That's debatable.

**Manu Sporny:** Dmitri is on the call and is chairing the Social Web Community Group.

**Dmitri Zagidulin:** We're the ones shepherding the activity streams formats. … We want to be able to sign activity streams objects.

**Manu Sporny:** We shouldn't use schema.org. … We shouldn't use activity pub. … We should point to IETF and IANA and get this right once and for all. … It probably shouldn't go in our vocabulary. … It could go in our security vocabulary. … We should call it something that people understand.

> *Ivan Herman:* See [IANA pointer](https://www.iana.org/assignments/media-types/media-types.xhtml).

**Ivan Herman:** I have put this pointer into the minutes. … From an RDF point of view, would the pointer be the URL of the property? … I don't really like that. … Instead we can define a media type for RDF. … This is where the string format is defined. … We define a property in one of our vocabularies.

**Manu Sporny:** I wouldn't object to that. … But we'd be repeating what the social web and schema.org did and we'd be creating another property.

**Ivan Herman:** I don't know exactly how activity streams defined it. … If its compatible with IANA, we could use it. … If Dmitri gives me a pointer to the definition, I could look at it.

> *Manu Sporny:* It's defined here: [https://www.w3.org/TR/activitystreams-vocabulary/#dfn-mediatype](https://www.w3.org/TR/activitystreams-vocabulary/#dfn-mediatype).

**Dmitri Zagidulin:** What's the objection to reusing the mediaType definition?

**Manu Sporny:** They made it too specific to activity streams (by defining domain restrictions).

**Dmitri Zagidulin:** That could be changed so it can be applied to any domain.

**Ivan Herman:** That would be perfect.

**Dmitri Zagidulin:** We could change that.

**Michael Jones:** It's not clear to me, are we taking a dependence on an externally defined vocabulary?

**Manu Sporny:** We already point to a bunch of externally defined vocabularies. … We'd be reusing the URL they use for the definition. … This would be more correct than using the schema.org encoding. … We could actually call this media type.

**David Lehn:** ?Question? … Equivalency checking.

> *Manu Sporny:* agree with Dmitri, I don't think this is an issue to re-use AS as long as it's aligned.

**David Lehn:** How much do people do full RDF processing?

**Dmitri Zagidulin:** Zero.

**Brent Zundel:** The proposal to raise an issue on the activity streams repository sounds right. … For our PR, the consensus is to not merge that PR.

**Ivan Herman:** I'm happy to close it. … Who has the action to raise the PR in the right place?

**Brent Zundel:** I'm willing to do it but I'm not sure I could accurately reflect what we want.

**Ivan Herman:** I'm willing to do it.

> *Michael Jones:* The activity streams repository is [https://github.com/w3c/activitystreams/issues/](https://github.com/w3c/activitystreams/issues/).

gobengo commented 8 months ago

If the primary reasoning to change the mediaType name is for the AS fix, and that's only a partial solution, maybe more thought is needed on the best approach here.

I've learned a lot since I originally filed this issue. I think I shouldn't have been so quick to suggest that the VC vocab re-use the as2 media type. That alone wouldn't really be a big improvement. In practice, I don't have an immediate need for putting as2 into a VC in a way where I can't also rewrite the as2 into some JSON-LD that does play nice with VCs (e.g. prefixing all as2 terms with 'as2:' to remove any chance of collision with a VC term).
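A minimal sketch of that rewriting idea, assuming the set of AS2 term names is supplied by the caller. `prefix_as2_terms` is a hypothetical helper, and the result would additionally need a context entry mapping the `as2` prefix to the Activity Streams namespace.

```python
# Assumed namespace for the "as2" prefix.
AS2_NAMESPACE = "https://www.w3.org/ns/activitystreams#"


def prefix_as2_terms(value, as2_terms, prefix="as2"):
    """Recursively rename keys listed in `as2_terms` to `prefix:key`."""
    if isinstance(value, dict):
        return {
            (f"{prefix}:{key}" if key in as2_terms else key):
                prefix_as2_terms(item, as2_terms, prefix)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [prefix_as2_terms(item, as2_terms, prefix) for item in value]
    return value  # strings, numbers, booleans, None are left untouched


if __name__ == "__main__":
    note = {
        "id": "https://example.org/notes/1",
        "mediaType": "text/html",
        "attachment": [{"mediaType": "image/png"}],
    }
    rewritten = prefix_as2_terms(note, {"mediaType", "attachment"})
    # The prefix itself still has to be declared in the document's context.
    rewritten["@context"] = {"as2": AS2_NAMESPACE}
    print(rewritten)
```

After the rewrite, `as2:mediaType` expands through the prefix and no longer competes with the bare `mediaType` term defined by the VC context.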

I don't think this issue as I originally wrote it is the best articulation of where vc-data-model may want to go from here, but some possible good next steps for other issues are:

I'm closing this so it's clear I no longer have any issue with vc-data-model, but if anyone wants to drive a particular change forward, do make another issue and link to it from here.