w3c-ccg / traceability-vocab

A traceability vocabulary for describing relevant Verifiable Credentials and their contents.
https://w3id.org/traceability

use CURIEs in $linkedData #285

Closed VladimirAlexiev closed 1 year ago

VladimirAlexiev commented 2 years ago

Mistakes like #268 will be avoided if you define prefixes in the JSONLD context:

{
  "uncl1153":"https://service.unece.org/trade/uncefact/vocabulary/uncl1153#"
}

As you can see here: https://service.unece.org/trade/uncefact/vocabulary/uncl1153.jsonld.

And then use CURIEs in $linkedData instead of full URLs, eg

  "@id": "uncl1153:Consignment_identifier_carrier_assigned"

Is that possible in the JSON Schema that you use?
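To make the proposal concrete, here is a minimal sketch of how a CURIE is expanded against a prefix map. It is not the JSON-LD expansion algorithm, only the core idea; the function name `expand_curie` is hypothetical, and the prefix map reuses the `uncl1153` definition from above.

```python
def expand_curie(value: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:suffix' to a full IRI when the prefix is defined."""
    prefix, sep, suffix = value.partition(":")
    # Leave the value untouched if it has no colon, already looks like
    # 'scheme://...', or uses a prefix that the map does not define.
    if not sep or suffix.startswith("//") or prefix not in prefixes:
        return value
    return prefixes[prefix] + suffix


# Prefix definition taken from the comment above.
prefixes = {
    "uncl1153": "https://service.unece.org/trade/uncefact/vocabulary/uncl1153#"
}

expanded = expand_curie("uncl1153:Consignment_identifier_carrier_assigned", prefixes)
print(expanded)
```

With the prefix defined once, the long namespace URL only has to be typed correctly in one place.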

VladimirAlexiev commented 2 years ago

Googling "$linkedData" JSON schema suggests that is your invention.

@OR13 @nissimsan Who should I ping about this?

OR13 commented 2 years ago

@VladimirAlexiev indeed, we built a tiny library that generates a JSON-LD context file from a directory of JSON Schemas:

@transmute/jsonld-schema

This allows us to "look at the types" through 2 different lenses.

https://w3id.org/traceability/openapi/

https://w3id.org/traceability/v1
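As a rough illustration of the idea (not the actual `@transmute/jsonld-schema` implementation), a context could be derived from schemas by collecting every `$linkedData` annotation. The sketch below assumes each annotation carries a `term` key alongside `@id`; the `term` key, the `build_context` function, and the `https://example.org/vocab#` IRI are assumptions made for this example.

```python
def build_context(schemas: list[dict]) -> dict:
    """Collect term -> IRI mappings from '$linkedData' annotations."""
    terms: dict[str, str] = {}
    for schema in schemas:
        stack = [schema]
        while stack:
            node = stack.pop()
            if isinstance(node, dict):
                linked = node.get("$linkedData")
                # Assumed annotation shape: {"term": ..., "@id": ...}
                if isinstance(linked, dict) and "term" in linked and "@id" in linked:
                    term, iri = linked["term"], linked["@id"]
                    # Refuse to map one term to two different IRIs.
                    if terms.get(term, iri) != iri:
                        raise ValueError(f"conflicting definitions for {term!r}")
                    terms[term] = iri
                stack.extend(node.values())
            elif isinstance(node, list):
                stack.extend(node)
    return {"@context": terms}


# Hypothetical schema; only the uncl1153 IRI comes from this thread.
schema = {
    "title": "Consignment",
    "$linkedData": {
        "term": "Consignment",
        "@id": "https://example.org/vocab#Consignment",
    },
    "properties": {
        "carrierAssignedId": {
            "type": "string",
            "$linkedData": {
                "term": "carrierAssignedId",
                "@id": "https://service.unece.org/trade/uncefact/vocabulary/uncl1153#Consignment_identifier_carrier_assigned",
            },
        }
    },
}

generated = build_context([schema])
print(sorted(generated["@context"]))
```

A generator like this only ever emits plain term-to-IRI mappings, which is one way a context builder can deliberately support less than the full JSON-LD feature set.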

VladimirAlexiev commented 2 years ago

@nissimsan @OR13 can you use CURIEs (shortened URLs) in the schemas and generated context?

OR13 commented 2 years ago

I'm -1 on using them. I have found that they cause problems, just as a global namespace causes problems. I suggest we close this issue as "won't do"... possibly after documenting the decision.

OR13 commented 2 years ago

@nissimsan @BenjaminMoe please comment.

VladimirAlexiev commented 2 years ago

@OR13 Not sure what you mean. Do you mean your schemas could grow so large and diverse that two different people would want to use prefix uncl1153: with two different meanings??

OR13 commented 2 years ago

https://github.com/w3c-ccg/security-vocab/issues/57

my comment was based on this and other discussions I have had on the security vocab... I can see both sides I suppose.

VladimirAlexiev commented 2 years ago

@OR13 on May 10:

https://www.w3.org/TR/cooluris/ Let's make a concrete proposal, or close this issue.

"Cool URIs don't change", we all know that principle (though in practice it's violated too often).

But for URIs to be usable, they need to be shortened.

#268 proves that people are not very good at handling long URIs, even experienced data engineers like those in this group.

https://github.com/w3c-ccg/security-vocab/issues/57: @dlongley

prefixes generally cause trouble. They can be "ok" if appropriately scoped and well thought out, but that's asking too much of authors most of the time

#268 proves the disadvantage of not using CURIEs. I think that using long URIs is asking too much of authors.

JSON-LD does not have a mechanism for making prefixes only meaningful inside @context

Do you mean that someone might accidentally use "uncl1153:Consignment_identifier_carrier_assigned" as the value of an object property, and it would be unexpectedly converted to https://service.unece.org/trade/uncefact/vocabulary/uncl1153#Consignment_identifier_carrier_assigned ? Sorry, but I doubt that very much.

they can spill out into the data, which can lead to undesirable or unexpected behavior.

Context definitions are designed to "spill out" into the data.

In 10 years of ontology work, I've only seen one case of unintended conflation:

Let's not throw out the baby with the bathwater, ok? Let's evaluate the benefits against the dangers in a realistic way.

dlongley commented 2 years ago

@VladimirAlexiev,

Do you mean that someone might accidentally use "uncl1153:Consignment_identifier_carrier_assigned" as the value of an object property, and it would be unexpectedly converted to https://service.unece.org/trade/uncefact/vocabulary/uncl1153#Consignment_identifier_carrier_assigned ? Sorry, but I doubt that very much.

No, it is much more likely that they will use "uncl1153:Consignment_identifier_carrier_assigned" as the value of an object property and it will be unexpectedly (not) converted to "uncl1153:Consignment_identifier_carrier_assigned", i.e., the prefix uncl1153 should have been defined but was not. This is even more likely with popular vocabulary reuse (e.g., schema:name from schema.org).
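The failure mode described here can be sketched as follows: when the prefix is missing from the prefix map, expansion leaves the value unchanged and raises no error, so the mistake passes silently. The helper names `expand` and `undeclared_prefixes`, and the `KNOWN_SCHEMES` allow-list, are hypothetical and only illustrate one way a strict check might catch it.

```python
# Assumption: schemes treated as genuine absolute IRIs rather than prefixes.
KNOWN_SCHEMES = {"http", "https", "urn", "did", "mailto"}


def expand(value: str, prefixes: dict[str, str]) -> str:
    """Expand a CURIE; values with an undefined prefix pass through as-is."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + suffix
    return value  # silently unchanged: no warning, no error


def undeclared_prefixes(values: list[str], prefixes: dict[str, str]) -> list[str]:
    """Report prefixes that look like CURIE prefixes but are not defined."""
    missing = set()
    for value in values:
        prefix, sep, _ = value.partition(":")
        if sep and prefix not in prefixes and prefix not in KNOWN_SCHEMES:
            missing.add(prefix)
    return sorted(missing)


prefixes: dict[str, str] = {}  # the author forgot to define 'uncl1153'
value = "uncl1153:Consignment_identifier_carrier_assigned"

result = expand(value, prefixes)
missing = undeclared_prefixes([value, "https://schema.org/name", "schema:name"], prefixes)
print(result, missing)
```

The unexpanded string still looks like a valid identifier, which is why this kind of mistake is easy to miss without an explicit check.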

Context definitions are designed to "spill out" into the data.

Yes, I believe this to be a mistake. It should have been possible to isolate them. Even then, people will make mistakes when using type-scoped contexts -- as this mistake was made in a W3C REC (https://github.com/w3c/vc-data-model/issues/778).

OR13 commented 2 years ago

At least in the context of this work item, we are not using CURIEs... we discussed this on several calls.

We are not hand-crafting the JSON-LD context; it's built from JSON Schema... the context builder does not support all the features of JSON-LD, and we think that's a good thing.

That being said, if you have another context where you want to use CURIEs, I think there are cases where maybe that's OK... if the group knows what they are doing and agrees that's what they want to do.

VladimirAlexiev commented 2 years ago

@dlongley Your reasoning is akin to this:

@OR13 You use no prefixes at all in instance data. That's a valid strategy because it makes it easier for people. We did the same in EPCIS: optimized as much, for people to be able to use plain values. However:

My (heated) arguments above are for the benefit of end users.

dlongley commented 2 years ago

@VladimirAlexiev,

@dlongley Your reasoning is akin to this:

  • Condoms are a good thing because they are a cheap way to do planned parenthood
  • No, they are a bad thing because people may forget to put them on. They should be outlawed

Other ways of looking at this:

Of course people could still use A, but it's more complex so it's harder to get right, so we shouldn't recommend it. Now, maybe you say that's not quite fair here because perhaps the "same use cases" clause doesn't apply. Here's a case where the same use cases aren't solved either, but for good reason:

When you're creating a standard, you need to be able to defend certain choices in light of the trade offs. Maybe there's a combination that's safe that you can't make with B that you can make with A. But going with B is better because there are also many unsafe combinations that you can make with A. Method A is often called a "foot gun".

You could also look at this like programming language trade offs -- do we want a "language" that is like Rust or one that is like C? The target constituencies (and security outcomes) may be different.

I don't think the pros for CURIEs outweigh the cons. I don't think the implementation complexity, security problems, or user confusion they create are worth it. I think there are either other ways to solve what they are solving that are of greater benefit to users or that the use case to be solved doesn't reach the threshold required to overcome the implementation concerns. But, at the end of the day, I'm just one person and if I can't convince the community of my position, then we'll wind up going in another direction.

Some thoughts on your responses to @OR13:

this strategy won't scale to infinity: at one point you'll have too many ontologies and too many terms, and will need to use namespaces.

Using namespaces isn't the issue here (at least, I didn't think it was). CURIEs are. You can still use namespaces without CURIEs. If you meant "CURIEs" here instead of "namespaces", why do you think you'll necessarily have too many ontologies and too many terms at once -- such that -- you will need to use CURIEs? Why would CURIEs be the only solution?

It makes it harder to reuse ontologies in bigger ways (nearly "wholesale"): you always need to worry about term conflicts and come up with conflict-free aliases. Currently you reuse ontologies in a "pick and choose" fashion, and reuse them freely (without regard to domain/range), but that may change

You can either worry about CURIEs or worry about term conflicts. You may find that worrying about term conflicts actually produces better looking data -- and I'd expect this to be the case for JSON developers. I believe the priority of constituencies should also place consumers of data before authors. If you more strongly type your data and use type-scoped contexts to determine term definitions you'll make the developers that have to work with your data happier, IMO. I think this approach is more idiomatic for JSON developers than CURIEs -- and more natural for many other developers as well. I don't think CURIEs are the best solution to this problem for JSON-LD.

It also makes it harder for people to use ontologies that they already know, because they need to find your alias for the ontology term they know

This may be true. If ontology authors publish type-scoped contexts, however, I would expect it to reduce the concerns without having to involve CURIEs. Even if ontology authors don't do this, anyone else can craft their own contexts that do -- and perhaps share those with the community.

My (heated) arguments above are for the benefit of end users.

Mine too. (Though for my argument's sake ... I hope they don't come across as "heated"!)

BenjaminMoe commented 1 year ago

@OR13 @dlongley @VladimirAlexiev is there an action item on this issue?

OR13 commented 1 year ago

I don't think so; we have decided not to use CURIEs... I suggest we close this issue.

nissimsan commented 1 year ago

Decision is to not do this. Closing.