Closed gobengo closed 8 months ago
Looks like this was added in this PR https://github.com/w3c/vc-data-model/pull/1296 by @iherman. wdyt @iherman ?
Just to make it clear for other reviewers in the WG: we are talking about making a choice, in the VCDM vocabulary, between two externally defined terms for our internal use, namely:
1. the schema.org term encodingFormat; or
2. the Activity Streams term mediaType
Our usage of the term is defined in the VCDM spec that uses the term mediaType, mapped (via the @context mechanism) to the schema.org term. In other words, so far our choice was (1) above. The usage of the term within the VCDM is related to the integrity control of related resources.
Back to the original question of @gobengo: we made a decision in the past of relying on externally defined terms whenever possible, hence our usage of several schema.org terms in our data model (instead of redefining them locally). Technically, we could decide not to use a schema.org term but, instead, map the value to the activity stream version (i.e., switch to choice (2) above); that would affect the VCDM only superficially. Also, taking into account the fact that this term has been introduced into the model relatively recently, I do not think it has any backward compatibility issue with deployed applications. On the other hand, doing so would bring some inconsistency in the vocabulary; that is why, taking into account the widespread usage and importance of schema.org, I am a little bit reluctant to do the switch. But the choice is not mine, but the WG's; if the WG decides to switch, I am happy to accept.
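To make the two choices concrete, here is a minimal Python sketch of how the term mediaType would resolve under each mapping. The two dictionaries are simplified stand-ins for the relevant entries of the VC v2 and Activity Streams 2.0 contexts, and expand_term is a hypothetical helper written for illustration; it is not a JSON-LD processor.

```python
# Simplified, illustrative context excerpts (assumptions, not the full context files).
SCHEMA_ORG_CHOICE = {"mediaType": "https://schema.org/encodingFormat"}  # choice (1)
ACTIVITY_STREAMS_CHOICE = {
    "mediaType": "https://www.w3.org/ns/activitystreams#mediaType"  # choice (2)
}


def expand_term(term: str, context: dict) -> str:
    """Return the IRI that a term maps to in a flat term-to-IRI context."""
    if term not in context:
        raise KeyError(f"term {term!r} is not defined in this context")
    return context[term]
```

Either way the JSON key stays mediaType; only the IRI behind it changes, which is why the switch would affect the VCDM only superficially.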
However. Making one step backward here for a moment: I wonder whether this may not be a more general issue that we may want to look at first. The use case of @gobengo is interesting in general: combining a well established Linked Data vocabulary with the VCDM. The fact of having a clash with the term names is bound to happen again. While the case of mediaType is relatively easy to solve (it is only a matter of a small change in the @context file, as shown in the straw man proposal), this may not be true for other cases which directly clash with the core vocabulary. I am curious to see whether experts would give an answer to https://github.com/w3c/json-ld-syntax/issues/424 but, if there is no easy solution, I wonder whether the systematic usage of @protected will not become a general problem in the future. Maybe we will be forced to consider some alternatives. But I would leave this discussion to those who are much more versed in the intricacies of @context mappings, like @dlongley, @BigBlueHat, @pchampin, or @gkellogg…
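As a rough illustration of why @protected produces the clash, the sketch below models term definitions as small dictionaries and rejects a later context that redefines a protected term with a different IRI. It is a toy model under stated assumptions: apply_context, ProtectedTermError, and the two excerpts are hypothetical names made up for this example, and the real JSON-LD 1.1 processing rules are more involved.

```python
class ProtectedTermError(ValueError):
    """Raised when a later context tries to change a protected term."""


# Illustrative excerpts only; each entry is {"id": <IRI>, "protected": <bool>}.
VC_V2_EXCERPT = {
    "mediaType": {"id": "https://schema.org/encodingFormat", "protected": True}
}
AS2_EXCERPT = {
    "mediaType": {
        "id": "https://www.w3.org/ns/activitystreams#mediaType",
        "protected": False,
    }
}


def apply_context(active: dict, incoming: dict) -> dict:
    """Merge incoming term definitions into the active ones (toy model)."""
    merged = dict(active)
    for term, definition in incoming.items():
        existing = merged.get(term)
        if existing is not None and existing["protected"]:
            if existing["id"] != definition["id"]:
                raise ProtectedTermError(f"cannot redefine protected term {term!r}")
            continue  # identical redefinition: keep the protected entry
        merged[term] = definition
    return merged
```

In this model, applying the AS2 excerpt after the VC v2 excerpt fails, while the reverse order succeeds but silently changes what mediaType means for the AS2 data.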
@iherman,
I provided an answer at the original issue.
My view regarding "wider questions" is: the fact that some contexts that predate the usage of @protected exist and are often used with unauthenticated data should not drive best security practices for authenticated linked data. Rather, new contexts should be created that follow best practices (e.g., use @protected and scoped contexts) -- especially if the data is intended to be secured (e.g., using VCs) and used by "type-specific" consumers that require particular contexts and JSON structure.
Note: If we feel strongly about avoiding conflicts in particular with the mediaType definition we provide in the VC v2 context, we could type-scope it and require a type of ExternalResource (or similar) to bring mediaType into scope. This, IMO, is the right way to avoid conflicts -- if we feel that this one in particular is something we should be avoiding.
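A sketch of what type-scoping could look like, written as a Python dictionary plus a toy resolver. Everything here is an assumption for illustration: the ExternalResource IRI under example.org is made up, HYPOTHETICAL_CONTEXT is not the VC v2 context, and terms_in_scope only mimics the idea that a type brings its scoped terms into play.

```python
# Hypothetical context: mediaType is only defined inside the scoped
# context attached to the ExternalResource type.
HYPOTHETICAL_CONTEXT = {
    "@protected": True,
    "ExternalResource": {
        "@id": "https://example.org/vocab#ExternalResource",  # made-up IRI
        "@context": {"mediaType": "https://schema.org/encodingFormat"},
    },
}


def terms_in_scope(node: dict, context: dict) -> dict:
    """Return the term-to-IRI map that the node's types bring into scope."""
    types = node.get("type", [])
    if isinstance(types, str):
        types = [types]
    scope = {}
    for type_name in types:
        definition = context.get(type_name)
        if isinstance(definition, dict):
            scope.update(definition.get("@context", {}))
    return scope
```

With this shape, a node typed ExternalResource sees mediaType, while any other node is free to take mediaType from another vocabulary.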
Thanks, @dlongley. If your solutions work and are acceptable to @gobengo, then we can indeed close this current issue.
Two remarks, however:
[…] new contexts should be created that follow best practices (e.g., use @protected and scoped contexts) -- especially if the data is intended to be secured (e.g., using VCs) and used by "type-specific" consumers that require particular contexts and JSON structure.
We have to acknowledge that this is non-trivial for designers, and it requires a more elaborate knowledge of how @context works. I wonder whether we will have some place where this implementation advice could be documented. I think it would be important to do so. (cc this to @brentzundel to possibly come back to this issue during CR phase. Should we open a separate issue with this?)
If we feel strongly about avoiding conflicts in particular with the mediaType definition we provide in the VC v2 context, we could type-scope it and require a type of ExternalResource (or similar) to bring mediaType into scope.
I am not sure how this would be possible. We deliberately chose to keep the domain of digestSRI open (and mediaType is the "adjacent" term). If we went down the line you propose, it may become difficult to properly use constructions like Example 27: I presume it would require doing something with the image property in the @context file. But, I presume, that is an application-specific term...
(But, again, you may have some extra @context trick up your sleeve…😀)
@dlongley wrote:
I provided an answer at the original issue.
Here's the link to the answer: https://github.com/w3c/json-ld-syntax/issues/424#issuecomment-1874406508
Is anyone super attached to defining mediaType in terms of https://schema.org/encodingFormat?
As Ivan noted, the use of mediaType is "new" so changing it at this point wouldn't be a big deal, but changing it for the reasons proposed above might be a bad idea.
Could we entertain the notion of defining it by re-using as2 mediaType?
The question in the group will become: "Why is Activity Streams, which has far less usage than schema.org, defining mediaType? Why isn't Activity Streams just re-using schema.org encodingFormat, instead?"
Another option could be that the AS3 (the next version of the context) would change/update to use schema.org.
I don't have a strong opinion one way or the other on this issue, but I am hesitant to change things w/o more implementer feedback. How many implementers are planning on using VCDM v2.0 w/ AS2? If this is an experiment at this point, with no production-track implementations doing the above, then it's going to be a hard sell to the WG.
We are attempting to go into CR this coming week with VCDM v2.0, so any kind of "hail mary" type change at this point would probably be frowned upon. I will note that we can change the context file during CR (we explicitly mention that we might do that in the spec).
So, the options open to us in "least disruptive" to "most disruptive" order are:
1. […]
2. […] mediaType URL.
3. […] mediaType property in the AS2 context.
4. […] mediaType to encodingFormat.
I'm marking this as post-CR since we can "fix" this in CR and will need more time to discuss the proper path forward here given that we're trying to transition VCDM v2 into CR this coming week.
@msporny sounds right. options 3 or 4 would be ideal. Moreso even than option 2 which is what my original post suggests only because I didn't think of options 3 or 4.
I presume that the easiest solution for all of us is option (4), unless AS2 does that change for the mediaType property. I would be fine with it (it may reduce future clashes, because the term "media type" may be more widespread).
I am happy to come up with a PR if we have a general agreement. Do we?
I presume that the easiest solution for all of us is option (4), unless AS2 does that change for the mediaType property. I would be fine with it (it may reduce future clashes, because the term "media type" may be more widespread).
Unfortunately, I don't think any of the options are "easy". The problem is that both AS2 and schema.org have claimed two different ways of expressing media type and now VC2 is in the unenviable position of having to choose one or the other, or do something weird (like start using the "encodingFormat" term, which is not readily used in the industry).
In VC2, we really do mean media type w/ our property... so I don't think renaming it to "encodingFormat" is the right way to go.
I am happy to come up with a PR if we have a general agreement. Do we?
No, unfortunately, I don't think we do. I think we need to be very careful about what we do here, none of the options are as straightforward as they seem. Yes, some of them are "easy" in that they don't require us to think too hard about the repercussions, but there will be repercussions.
I think the next step is to list "Pros" and "Cons" for each approach so we can do a full analysis w/ the VCWG engaged. Let's not do something hasty here as mediaType is fairly critical to a variety of VC2 use cases.
I presume that the easiest solution for all of us is option (4), unless AS2 does that change for the mediaType property. I would be fine with it (it may reduce future clashes, because the term "media type" may be more widespread).
Unfortunately, I don't think any of the options are "easy". The problem is that both AS2 and schema.org have claimed two different ways of expressing media type and now VC2 is in the unenviable position of having to choose one or the other, or do something weird (like start using the "encodingFormat" term, which is not readily used in the industry).
Just for the sake of arguments: for some reason (maybe being faced with a similar dilemma), the schema.org vocabulary creators decided to use the encodingFormat term, and industry survived by using this. After all, what our @context file does is to map the term mediaType to schema's encodingFormat, i.e., your option (4) in https://github.com/w3c/vc-data-model/issues/1408#issuecomment-1879778435 is simply to say that we fully embrace the usage pattern of schema.org instead of hiding it (which is what we do now).
I do agree that this is a VCWG's choice, though. A PR can come before doing that (to facilitate the decision) or after it.
schema.org has redefined a great many things, or at least ignored common use and understanding, without much if any apparent concern for any repercussions. Something about being backed by multiple 800lb Gorilla organizations seems to encourage this.
Several IANA media types may be encoded in multiple ways. For instance, text/plain; charset=UTF-8 is a UTF-8 encoding of the text/plain media type. As you might guess, there is also text/plain; charset=UTF-32, among others.
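The distinction between a media type and its encoding parameter can be seen by splitting the two apart. The sketch below leans on Python's standard email.message.Message to parse the value as if it were a Content-Type header; split_media_type is a hypothetical helper name, and treating a VC mediaType value this way is an assumption made only for illustration.

```python
from email.message import Message
from typing import Optional, Tuple


def split_media_type(value: str) -> Tuple[str, Optional[str]]:
    """Split a media type string into (base type, charset parameter)."""
    message = Message()
    message["Content-Type"] = value
    base_type = message.get_content_type()  # e.g. "text/plain"
    charset = message.get_param("charset")  # None when no charset is given
    return base_type, charset
```

Both text/plain; charset=UTF-8 and text/plain; charset=UTF-32 yield the same base type, which is the sense in which one media type may be encoded in several ways.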
schema:encodingFormat, formerly schema:fileFormat, lacks any strong assurance that it's not going to change again, and while its description cites IANA and MDN, it largely ignores the information found there (and the second cite itself ignores many aspects of the authoritative content of the first).
Maybe we might consider ianaMediaType and not map it to anything in AS2, AS3, schema.org, or elsewhere....
The issue was discussed in a meeting on 2024-02-14
I think I agree with many of the points here including:
- encodingFormat does not seem like the best name for a media type property.
- […] encodingFormat and mapping to the schema.org property.
Questions:
- Does anyone know the reasoning, if any, schema.org chose encodingFormat? (I have looked beyond the commits where they changed to it.)
- Are there any other well known URIs that are more aligned with "media type"? Is that an option when considering benefits of schema.org alignment? (as:mediaType is defined on as:Link and as:Object so that doesn't seem general enough)
- Is working with schema.org to rename to or mint https://schema.org/mediaType a realistic option?
Important note:
- […] mediaType to encodingFormat in the VC context will not alone solve the ActivityStreams issue. That context also defines name as as:name, which conflicts with the VC definition as schema:name. Changing that is a more difficult problem.
- […] name errors. I think those are the only two conflicts.
My opinion:
- […] encodingFormat. It feels like it's perpetuating what is not the best naming choice at schema.org.
- If the primary reasoning to change the mediaType name is for the AS fix, and that's only a partial solution, maybe more thought is needed on the best approach here.
Hi @davidlehn,
I think we are (all) in a wild agreement that none of the solutions are perfect. There are bad and worse options...
Just to reflect on your questions:
- Does anyone know the reasoning, if any, schema.org chose encodingFormat? (I have looked beyond the commits where they changed to it.)
Personally, I have no idea. Let me refer back to @TallTed's remark in https://github.com/w3c/vc-data-model/issues/1408#issuecomment-1884992592
- Are there any other well known URIs that are more aligned with "media type"? Is that an option when considering benefits of schema.org alignment? (as:mediaType is defined on as:Link and as:Object so that doesn't seem general enough)
Not that I know of.
- Is working with schema.org to rename to or mint https://schema.org/mediaType a realistic option?
I do not think it is realistic. Any change on schema.org requires a lot of time because it is a longish process (which, in view of its wide usage, is understandable). I do not think any of us would want to invest into something like that.
In fact, we have several, orthogonal issues:
1. What should the term be. It can be whatever we decide, but the general approach in JSON-LD context files to map terms directly to URL-s (as opposed to using prefixes much more widely) is bound to lead to name clashes (as you yourself say in your comment). This is the problem we have with AS. Solving this would require a major reengineering of our approach to contexts which, at this point, we cannot do.
Funnily, the fact that encodingFormat is a strange term may be a good thing, because it reduces the chances for a name clash...
[…] multibase) provided there is somewhere a normative definition on what those values can be (a description that we can refer to normatively, that is).
As far as I am concerned:
- If we can do (3) properly, then our cleanest option may be to define things of our own. This still leaves issue (1) to be solved.
- Maybe keeping the encodingFormat as a term name because, cynically, it is so bad that there is a low probability of a name clash 😀 (except with schema.org...)
- […] encodingFormat and mapping to the schema.org property" and move on. It may not be worth spending too much time and energy on this…
The issue was discussed in a meeting on 2024-02-21
If the primary reasoning to change the mediaType name is for the AS fix, and that's only a partial solution, maybe more thought is needed on the best approach here.
I've learned a lot since I originally filed this issue. I think I shouldn't have been so quick to suggest that VC vocab re-use as2 media type. That alone wouldn't really be a big improvement. In practice, I don't have an immediate need for putting as2 into a VC in a way where I can't also rewrite the as2 into some JSON-LD that does play nice with (e.g. prefixing all as2 terms with 'as2:' to remove any chance of collision with a VC term).
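A minimal sketch of that rewriting idea, assuming the surrounding @context defines an as2 prefix for the Activity Streams namespace. The helper prefix_as2_terms is hypothetical; it only renames property keys, leaves id, type, and @-keywords alone (assumed to be keyword aliases), and does not touch type values such as Note, which a real rewrite would also need to handle.

```python
AS2_PREFIX = "as2:"
UNTOUCHED_KEYS = {"id", "type"}  # assumed aliases of @id and @type


def prefix_as2_terms(value):
    """Recursively prefix plain property keys with 'as2:' (toy rewrite)."""
    if isinstance(value, dict):
        rewritten = {}
        for key, item in value.items():
            keep = key.startswith("@") or key in UNTOUCHED_KEYS or ":" in key
            new_key = key if keep else AS2_PREFIX + key
            rewritten[new_key] = prefix_as2_terms(item)
        return rewritten
    if isinstance(value, list):
        return [prefix_as2_terms(item) for item in value]
    return value
```

After the rewrite, keys such as as2:mediaType and as2:name can no longer collide with the bare mediaType and name terms of the VC context.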
I don't think this issue as I originally wrote it is the best articulation of where vc-data-model may want to go from here, but some possible good next steps for other issues are:
- […] mediaType with @protected and just plan on people with this use case of defining several vocabs needing to use some advanced JSON-LD features to work around it. If you want to make that easier, maybe an appendix in some doc could explain how to do it with this mediaType example or some other contrived example.
I'm closing this so it's clear I no longer have any issue with vc-data-model, but if anyone wants to drive a particular change forward, do make another issue and link to it from here.
Context:
- […] credentialSubject of a VC.
- […] mediaType term conflicts with the @protected vcdm2 mediaType term
- […] as:mediaType.
Questions
- Is anyone super attached to defining mediaType in terms of https://schema.org/encodingFormat?
Straw Man