json-ld / yaml-ld

CG specification for YAML-LD and UCR
https://json-ld.github.io/yaml-ld/spec

Replace $-keywords with @-keywords #11

Closed anatoly-scherbakov closed 2 years ago

anatoly-scherbakov commented 2 years ago

As an author of YAML-LD files, I want the ability to type keywords without quotes, so that my authoring experience is better.

Motivation

I believe the primary purpose of having a Linked Data format based on YAML is to simplify manual authoring of the linked data documents. This means that, in an information system, we could ask domain experts to write YAML documents to describe their knowledge.

YAML is much easier to write manually than JSON because it does not require as much syntactic noise. Normally, keys can be written without quoting at all:

date: 2022-05-29
title: This is my latest blog article

However, sooner or later the document author will have to define @type, @context, @language, or any other JSON-LD keyword; and then they have to remember that @ is a reserved character and that in such cases quoting is mandatory.

The potential author of YAML-LD documents is not necessarily a programmer; they might be a history student, an anthropologist, a biologist, a physicist.

Let's not make their life harder than it has to be.

Potential risks

Possible implementations

Proposal

Let us replace $ with @ (and vice versa) only for the particular keywords. For instance,

$schema: boo
$context: foo

will be converted into

{
  "$schema": "boo",
  "@context": "foo"
}

because @context is a JSON-LD keyword and @schema is not.

Thus, we will minimize the possibilities for conflict while still getting rid of the nasty quotes.
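A minimal Python sketch of the substitution proposed above, assuming the YAML has already been parsed into plain dicts and lists. The function name `dollar_to_at` and the choice to rewrite keys only (not values) are assumptions made for illustration; the keyword list follows JSON-LD 1.1.

```python
# JSON-LD 1.1 keywords, written without the leading "@".
JSONLD_KEYWORDS = frozenset({
    "base", "container", "context", "direction", "graph", "id", "import",
    "included", "index", "json", "language", "list", "nest", "none",
    "prefix", "propagate", "protected", "reverse", "set", "type", "value",
    "version", "vocab",
})


def dollar_to_at(node):
    """Recursively rename "$keyword" keys to "@keyword", for JSON-LD keywords only.

    Hypothetical helper: keys such as "$schema" that are not JSON-LD
    keywords are left untouched, and values are never rewritten.
    """
    if isinstance(node, dict):
        converted = {}
        for key, value in node.items():
            if isinstance(key, str) and key.startswith("$") and key[1:] in JSONLD_KEYWORDS:
                key = "@" + key[1:]
            converted[key] = dollar_to_at(value)
        return converted
    if isinstance(node, list):
        return [dollar_to_at(item) for item in node]
    return node


document = {"$schema": "boo", "$context": "foo"}
result = dollar_to_at(document)
print(result)  # {'$schema': 'boo', '@context': 'foo'}
```

The reverse direction (@ to $ when serializing to YAML) would be the mirror image of this function.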

gkellogg commented 2 years ago

I think we need to analyze the potential for a YAML-LD file including some well-known context which defines $ equivalents for @ keywords. This would work with the existing algorithms and allows for all keywords other than @context to use a $ form. It also round-trips through compaction.

We could create a standard context at, say https://www.w3.org/ns/json-ld/yaml-ld.jsonld containing these keyword definitions which a YAML-LD source file could reference. This doesn't work for the expanded form, but I don't see that as an issue.

Note that many contexts already provide bare-word equivalents for keywords, such as id => @id, type => @type, so including such a context might not be necessary.

pchampin commented 2 years ago

We could create a standard context at, say https://www.w3.org/ns/json-ld/yaml-ld.jsonld containing these keyword definitions which a YAML-LD source file could reference. This doesn't work for the expanded form, but I don't see that as an issue.

Yes, I thought of that. It would work... except for @context, which can not be aliased (https://www.w3.org/TR/json-ld11/#aliasing-keywords).
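To make the idea concrete, here is a rough Python sketch of what such a context could contain. The set of aliased keywords and the helper name `build_convenience_context` are assumptions for illustration only; nothing like this has been published at the URL suggested above. Note that `@context` is left out, since it cannot be aliased.

```python
import json

# Assumption: an illustrative subset of keywords. "context" is deliberately
# absent because JSON-LD 1.1 does not allow @context to be aliased.
ALIASED_KEYWORDS = ("id", "type", "value", "language", "graph", "list", "set", "reverse")


def build_convenience_context(prefix="$"):
    """Build a context whose terms are prefix+name aliases for @name keywords."""
    return {"@context": {prefix + name: "@" + name for name in ALIASED_KEYWORDS}}


convenience_context = build_convenience_context()

# A document that could be authored in YAML with unquoted "$id" / "$type" keys.
# "@context" itself still has to be spelled (and quoted) the usual way.
document = {
    "@context": [convenience_context["@context"], {"name": "http://schema.org/name"}],
    "$id": "http://example.org/foo",
    "$type": "http://schema.org/Person",
    "name": "Foo",
}

print(json.dumps(document, indent=2))
```

Because the aliases are ordinary term definitions, an unmodified JSON-LD processor should be able to expand such a document without any YAML-LD-specific logic.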

anatoly-scherbakov commented 2 years ago

Yes indeed, this is quite an interesting idea.

BigBlueHat commented 2 years ago

This is a lot of language change just to avoid some quotation marks. It's come up before in the JavaScript community (and others), which frequently gets annoyed by having to write doc['@context'] in JS vs. the "ideal" doc.@context (which isn't valid, of course).

I'd think that @gkellogg's approach of providing an alias mapping would provide a good workaround for 99% of the keywords, and @context's uniqueness would continue to (as it has now for years) provide an in-document "signal" that one is dealing with JSON-LD (just as $schema does for JSON Schema).

That seems the simplest approach with the fewest possible confusions and/or conflicts with other communities who may use $ as a prefix.

juusoautiosalo commented 2 years ago

I think this "convenience context" as a best practice is a great idea! I have experienced the pain of writing quotations in YAML (and JavaScript) myself and this still enables staying compatible enough.

I would like to raise a question though: Is the $ character the best choice? It may very well be, but I think it would make sense to look at possible options for a while, as it might save hassle in the future. E.g. it might make sense to support as many language quirks as possible, similar to avoiding the doc['@context'] vs. doc.@context issue in JavaScript mentioned by @BigBlueHat.

Do you think alternative characters is a topic worth discussing or is $ a clear choice?

Here is a list of special ASCII characters, some of which are already reserved for special purposes: ! " # $ % & ' * + , - . / : ; < = > ? @ \ ^ _ | ~

ioggstream commented 2 years ago

My experience in interoperability suggests being very cautious with clever solutions that can be easily replaced by IDE features.

Anyway, if you are interested in following that path, @juusoautiosalo's question is correct: pick another character instead of "$" to avoid overlaps with JSON Schema.

anatoly-scherbakov commented 2 years ago

My current opinion is this:

This will make the JSON-LD → YAML-LD → JSON-LD round-trip workable and reduce the complexity of the whole system because we delegate it to existing JSON-LD mechanism.

ioggstream commented 2 years ago
  1. ok that using a specific context is the way to go if an author wants to do it
  2. proponents should identify a different char for that: € and £ are valid alternatives to $ to me :P
  3. I won't recommend this practice since it paves the way to security issues, but environments are different and I can just talk for mine. Surely $ is to be avoided for clashes with other specs.

My 2¢

anatoly-scherbakov commented 2 years ago
  1. proponents should identify a different char for that: € and £ are valid alternatives to $ to me :P

$ is a part of the most standard ASCII layout. This is not a very good argument since I had mentioned myself that the user might use any mapping they want; perhaps they want to map @id to 🧸 in their context, why not? Looks cute.

But still, perhaps it would be a good thing for the standard Convenience Context to stick to ASCII.

  1. I won't recommend this practice since it paves the way to security issues, but environments are different and I can just talk for mine. Surely $ is to be avoided for clashes with other specs.

Could you illustrate with an example use case where $ can be harmful? This could greatly contribute to the discussion.

I provided some reasoning behind character choice at https://github.com/json-ld/yaml-ld/issues/55

ioggstream commented 2 years ago

@anatoly-scherbakov

TL;DR

The most interoperable way to avoid quoting "@" is to engage with YAML community until "@" is un-reserved. Another solution is to implement a feature in IDE YAML plugins that automatically adds quotes when writing "@".

security

JSON-LD can be processed after the conversion to RDF: this means that the context is capable of modifying how a message is processed https://github.com/web-payments/web-payments.org/issues/21

For example, a malicious agent could exploit some leaky checks and be able to replace @id, thus creating confusion in an entry. IMHO this approach is therefore not advised when you want to enforce syntactic validation (e.g. via a WAF or similar tool). In a closed ecosystem, people who do not have security constraints and do not need to interoperate with external entities may still like it.

In the following document, the "£id" entry is legitimate JSON on its own but breaks once the mapping context is added:

{
    "@context": {
        "£id": "@id"
    },
    "@id": "http://example.org/foo",
    "£id": "http://example.org/foo",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object"
}

One great benefit of YAML is that it does not allow duplicate keys (which is a long-standing security issue for JSON). This approach re-introduces duplicate keys in a different form: an organization should probably make a security assessment before making such a decision.
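The concern can be illustrated with a small Python sketch that flags keys which resolve to the same keyword once the inline context is taken into account. This is a deliberately simplified check, not a JSON-LD processor: the function name `find_keyword_collisions` is hypothetical, and it only looks at top-level keys and a single inline context object.

```python
def find_keyword_collisions(document):
    """Return {keyword: [keys]} for keywords reachable via more than one key.

    Simplified, hypothetical check: only string aliases in an inline
    "@context" dict are considered, and only top-level keys are inspected.
    """
    context = document.get("@context", {})
    aliases = {}
    if isinstance(context, dict):
        for term, target in context.items():
            if isinstance(target, str) and target.startswith("@"):
                aliases[term] = target

    seen = {}
    for key in document:
        if key == "@context":
            continue
        keyword = aliases.get(key, key)
        if keyword.startswith("@"):
            seen.setdefault(keyword, []).append(key)
    return {keyword: keys for keyword, keys in seen.items() if len(keys) > 1}


document = {
    "@context": {"£id": "@id"},
    "@id": "http://example.org/foo",
    "£id": "http://example.org/foo",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
}

collisions = find_keyword_collisions(document)
print(collisions)  # {'@id': ['@id', '£id']}
```

A syntactic validator that only looks for a literal duplicate "@id" key would not notice this pair, which is the gap described above.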

interoperability with others

Let's call $-LD-files the ones using $-keywords. When you need to interoperate in the web you might have issues with:

Those actors' files will not be JSON-interoperable with $LD files and will need to normalize them before using.

interoperability with JSON-LD

The $-context (or £-context) is defined to be specific to YAML-LD, but will spill over to JSON-LD because of all the implementations that use "dumb" JSON/YAML libraries to process these files.

TallTed commented 2 years ago

@ioggstream -- Please edit your https://github.com/json-ld/yaml-ld/issues/11#issuecomment-1206308330 and put a code fence around @id (whether or not you then retain the double-quote wrapper), so that GitHub user doesn't continue to get pings on every update to this issue, in which they are probably not interested.

gkellogg commented 2 years ago
  • we can provide a default context that maps @ to $ or some other character;

Why not just map to plain words (i.e., @id => id)? That is the practice followed by many other contexts. Maybe there should be a couple of suggested contexts for authors to use.

The most interoperable way to avoid quoting "@" is to engage with YAML community until "@" is un-reserved. Another solution is to implement a feature in IDE YAML plugins that automatically adds quotes when writing "@".

Although I think it would be great for the YAML community to consider this, it will be a while before it is widely deployed, so I don't think we can rely on any change to YAML to deal with our @ keywords.

JSON-LD can be processed after the conversion to RDF: this means that the context is capable of modifying how a message is processed web-payments/web-payments.org#21

JSON-LD has the concept of Protected Term Definitions and restrictions on how scoped contexts can be applied specifically to deal with this issue. Alongside Proofs/Signatures, these help maintain the integrity of JSON-LD documents. YAML-LD benefits from this practice.
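As a rough illustration of what protection is meant to prevent, the Python sketch below compares a protected context with a later one and lists the terms the later context tries to change. It is a simplification under stated assumptions: the helper name `redefined_protected_terms` is hypothetical, only a context-wide `@protected` flag is handled, and the exceptions the specification allows (such as property-scoped contexts) are ignored. A real JSON-LD 1.1 processor is expected to reject such a redefinition with an error rather than return a list.

```python
def redefined_protected_terms(base_context, later_context):
    """List terms from a protected base context that a later context changes.

    Hypothetical helper: handles only a context-wide "@protected": true flag
    and treats an identical redefinition as harmless.
    """
    if not base_context.get("@protected", False):
        return []
    return sorted(
        term
        for term, definition in later_context.items()
        if not term.startswith("@")
        and term in base_context
        and base_context[term] != definition
    )


# Assumed example contexts, not taken from any published vocabulary.
base = {"@protected": True, "$id": "@id", "$type": "@type"}
later = {"$id": "http://evil.example/not-an-id", "name": "http://schema.org/name"}

violations = redefined_protected_terms(base, later)
print(violations)  # ['$id']
```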

  1. ok that using a specific context is the way to go if an author wants to do it
  2. proponents should identify a different char for that: € and £ are valid alternatives to $ to me :P
  3. I won't recommend this practice since it paves the way to security issues, but environments are different and I can just talk for mine. Surely $ is to be avoided for clashes with other specs.

I'm generally opposed to introducing $ keyword alternatives, other than by defining them within a context document. I think it's getting out too far in front of a problem that may not really exist, and it introduces general complication to the processing model (and interoperability) that is not worth the change.

gkellogg commented 2 years ago

@ioggstream -- Please edit your #11 (comment) and put a code fence around @id (whether or not you then retain the double-quote wrapper), so that GitHub user doesn't continue to get pings on every update to this issue, in which they are probably not interested.

I took care of it again. Everyone should try to remember this when adding comments. (Note, I found that COMMAND-e when highlighting some text does this easily on the Mac).

anatoly-scherbakov commented 2 years ago

$ is an ASCII character; the British pound and euro characters are not.

If we do not restrict ourselves to ASCII, then why is 🔸id worse than €id and £id? I do not believe it is.

I do not insist on including the convenience context into the specification: I will be able to use it anyway :)

TallTed commented 2 years ago

@anatoly-scherbakov --

I know how to type $ (shift-4 on any US or ASCII keyboard).

I can quickly locate £ (opt-3), € (shift-opt-2), ¢ (shift-opt-4), and various others on my Mac, and those key-chords would become rote quickly enough.

Your suggested "small orange diamond" 🔸 does not appear to be available through a simple key-chord, and while GitHub finds it via :small_orange_diamond: it requires a minimum of :small_o to differentiate it from all other available Unicode/emoji characters. I doubt it's always even as easily accessed as that.

To my mind, that makes 🔸 substantially worse (read: less convenient) than € or £.

ioggstream commented 2 years ago

Why not just map to plain words (i.e., @id => id). That is the practice followed by many other contexts. Maybe there should be a couple of suggested contexts for authors to use.

Having a context specific for YAML file impacts on interoperability. The fact that schema.org already maps id -> @id means that the $-context will somewhat collide with schema.org context...

introducing $ keyword alternatives, ..introduces general complication to the processing model (and interoperability) that is not worth the change

:+1:

JSON-LD has the concept of Protected Term Definitions and restrictions on how scoped contexts can be applied specifically to deal with this issue

Agree, but I'd happily avoid testing buggy implementations :)

TallTed commented 2 years ago

@ioggstream -- Please note that GitHub doesn't always preserve code fences in quoted lines, and you always need to review copied-and-pasted text ... so (i.e., @id => id) in the first line of the first quote in your latest comment, https://github.com/json-ld/yaml-ld/issues/11#issuecomment-1206863165, includes no code fence, and that user is getting pinged again.

ioggstream commented 2 years ago

@TallTed done, thanks. It seemed such a good idea to register the @id nick :P

gkellogg commented 2 years ago

Discussed in the TPAC F2F and resolved to close as won't fix.