json-ld / yaml-ld

CG specification for YAML-LD and UCR
https://json-ld.github.io/yaml-ld/spec

Work on YAML-LD #3

Closed gkellogg closed 2 years ago

gkellogg commented 2 years ago

In the w3c/json-ld-syntax repository, w3c/json-ld-syntax#389 proposes advancing work on YAML-LD. I was not able to transfer the issue to this repository, but further discussion and votes in support of starting the initiative in the JSON-LD CG should be voiced here. Given sufficient support, we'll create a yaml-ld repository for work to proceed.

gkellogg commented 2 years ago

With support, I'll set up a new repo with a template ReSpec document and pr-preview.

I support starting such an initiative.

pchampin commented 2 years ago

Quoting @anatoly-scherbakov from https://github.com/w3c/json-ld-syntax/issues/389#issuecomment-1127070677

I would propose the following grounds for the @$ replacement: while JSON is machine readable and writable, it is not very human readable and — especially — writable. YAML is much friendlier in that regard, due to much lower syntactic noise. The replacement of these characters helps to further reduce the said syntactic noise, making the data files therefore faster to type.

In general, I'd name manually writable semantic data the main purpose for YAML-LD.

I will be happy to participate in the standardization process if one is to be initiated, and to assist however I can.

I see your point about easing the manual editing of YAML-LD. It is a valid point.

But on the other hand, the principle of least surprise is important. Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

What I guess I could live with is for "native YAML-LD" processors to accept $-keywords in addition to @-keywords, with a flag (named e.g. strict-keywords) to disable this behaviour (in case someone wants to use $-prefixed names for another purpose).

gkellogg commented 2 years ago

What I guess I could live with is for "native YAML-LD" processors to accept $-keywords in addition to @-keywords, with a flag (named e.g. strict-keywords) to disable this behaviour (in case someone wants to use $-prefixed names for another purpose).

Yes, and I think a similar flag for using $ instead of @ when serializing. Trying to preserve the original form using $ or @ is probably too complicated.

Note that using the $ "namespace" will overlap with other existing uses, at least from JSON. For example, JSON Schema has $schema, $vocabulary and other keywords that would not otherwise overlap with the JSON-LD keyword namespace.

tetron commented 2 years ago

A formal YAML-LD variant that is a minimum-surprise syntax variant of JSON-LD is a good idea, and I don't want to get in the way of that discussion.

However, I just wanted to link my comment from the other issue:

https://github.com/w3c/json-ld-syntax/issues/389#issuecomment-1127716287

JSON-LD has some limitations when working with certain idiomatic JSON structures. Schema Salad is a schema language for YAML and JSON documents which describes validation and transformation from idiomatic YAML structures to JSON-LD structures and then on to RDF. We have used it extensively for our own use case (describing the Common Workflow Language), but I wanted to mention it here to see if there's interest in generalizing it, separately from this YAML-LD discussion.

anatoly-scherbakov commented 2 years ago

@pchampin @gkellogg I agree with using @-keywords as a default, unless a flag is supplied to the processor to enable $-keywords. The YAML-LD preprocessor which converts $ to @ might also take care to convert only the keywords which are reserved by JSON-LD. This will also resolve the possible conflict with JSON Schema (I guess the two standards have different sets of keywords).
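A minimal sketch of such a preprocessor, assuming the YAML has already been parsed into plain Python dicts and lists. The function name `dollar_to_at` and the `strict_keywords` parameter are illustrative only and not part of any specification or library. The keyword set is the JSON-LD 1.1 keyword list as best recalled and should be checked against the specification. Only keys are rewritten; keyword values such as `$id` inside a context are not handled here.

```python
# Hypothetical sketch; names are illustrative, not taken from any YAML-LD spec.
# Keyword names (without the "@") as listed in JSON-LD 1.1; verify against the spec.
JSONLD_KEYWORDS = frozenset({
    "base", "container", "context", "direction", "graph", "id", "import",
    "included", "index", "json", "language", "list", "nest", "none",
    "prefix", "propagate", "protected", "reverse", "set", "type",
    "value", "version", "vocab",
})


def dollar_to_at(node, strict_keywords=False):
    """Recursively rewrite `$keyword` keys to `@keyword`.

    Only keys whose name is a JSON-LD keyword are rewritten, so a key such
    as `$schema` passes through untouched. With strict_keywords=True the
    input is returned unchanged.
    """
    if strict_keywords:
        return node
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if (isinstance(key, str) and key.startswith("$")
                    and key[1:] in JSONLD_KEYWORDS):
                key = "@" + key[1:]
            result[key] = dollar_to_at(value)
        return result
    if isinstance(node, list):
        return [dollar_to_at(item) for item in node]
    return node


doc = {"$id": "http://example.org", "$schema": "left alone"}
print(dollar_to_at(doc))
```

Restricting the rewrite to the reserved keyword set is what keeps JSON Schema keys such as `$schema` out of the way; a document that contains both `$id` and `@id` on the same node would still need a tie-breaking rule that this sketch does not define.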

@tetron Thank you for the information, will have a look to compare Schema Salad with plain YAML-LD that I am currently using.

VladimirAlexiev commented 2 years ago

@tetron There is no dispute that JSON-LD is no schema language and is often complemented with JSON Schema.

JSON-LD has some limitations when working with certain idiomatic JSON structures.

But can you elaborate on this point using examples? I think that JSON-LD 1.1 is very flexible, e.g. local term definitions, adding/removing auxiliary keys that have no reflection in RDF, etc.

VladimirAlexiev commented 2 years ago

@OR13 @nicholascar please vote for YAML-LD CG workgroup above

OR13 commented 2 years ago

I'm not sure I have the cycles to help much with YAML-LD, but I think it's a good idea. We use OAS / YAML with JSON-LD and JSON Schema often.

In particular, I like the idea of controlling both semantics and data shape at the same time, using only 1 file.

tetron commented 2 years ago

But can you elaborate on this point using examples? I think that JSON-LD 1.1 is very flexible, e.g. local term definitions, adding/removing auxiliary keys that have no reflection in RDF, etc.

So, Schema Salad was created in response to JSON-LD 1.0 (and roughly 5 years before JSON-LD 1.1). The main motivations were:

Examples of the last two:

steps:
  # this is an identifier map, the key is the id
  step1:
    in:
      # another identifier map,
      # the identifier is syntactically scoped so it gets appended to the 
      # enclosing id
      # the value is assigned to a default predicate of "source" since it's a scalar
      input_parameter: source_parameter_uri
    out: [output_parameter]

This is translated to JSON-LD 1.0 that looks something like this:

{
  "steps": [{
    "@id": "step1",
    "in": {
      "@id": "step1/input_parameter",
      "source": "source_parameter_uri"
    },
    "out": [{
      "@id": "step1/output_parameter"
    }]
  }]
}
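
The identifier-map translation above can be sketched roughly as follows. This is an illustration of the idea only, not Schema Salad's actual implementation or API: the helper `expand_id_map` and its parameters are hypothetical, and unlike the JSON shown above it always emits a list of nodes, even for a single entry.

```python
# Hypothetical sketch of identifier-map expansion; not Schema Salad's API.
def expand_id_map(id_map, parent_id=None, default_predicate=None):
    """Turn {identifier: value} into a list of nodes carrying "@id".

    Identifiers are scoped under parent_id when one is given. A scalar
    value is assigned to default_predicate; a mapping is merged as-is.
    """
    nodes = []
    for key, value in id_map.items():
        node_id = f"{parent_id}/{key}" if parent_id else key
        if isinstance(value, dict):
            nodes.append({"@id": node_id, **value})
        elif default_predicate is not None:
            nodes.append({"@id": node_id, default_predicate: value})
        else:
            raise ValueError(f"scalar value for {key!r} needs a default predicate")
    return nodes


steps = {
    "step1": {
        "in": {"input_parameter": "source_parameter_uri"},
        "out": ["output_parameter"],
    }
}

expanded = []
for step in expand_id_map(steps):
    # "in" is itself an identifier map, scoped under the step's id.
    step["in"] = expand_id_map(step["in"], step["@id"], "source")
    # "out" is a plain list of names that become scoped identifiers.
    step["out"] = [{"@id": f"{step['@id']}/{name}"} for name in step["out"]]
    expanded.append(step)

print(expanded)
```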

We've also implemented code generators for Python and Java. It would probably also be straightforward to write a translator to express the schema as SHACL.

VladimirAlexiev commented 2 years ago

I'll start a "YAML-LD UCRs" issue and include "polyglot modeling"

nichtich commented 2 years ago

Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

I fully agree. To avoid confusion I'd vote to

  1. either limit the specification of YAML-LD to half a page, essentially stating that YAML-LD is JSON-LD expressed in YAML, so any standard conversion YAML <-> JSON (without any exceptions such as $ / @ keys) can do

  2. or to make clear that YAML-LD is not "JSON-LD in YAML" but an independent data format, loosely based on JSON-LD. Even if it's 99% like JSON-LD, you cannot use standard tools but need a specific YAML-LD aware software library to avoid running into edge cases.

Option 1 can be done as part of JSON-LD, option 2 is an independent project with unknown outcome and adoption.

anatoly-scherbakov commented 2 years ago

In my opinion, writability is the primary reason to even have YAML-LD, and it is also my feeling that the @$ replacement has greatly improved my own writing experience when authoring YAML-LD data files and contexts.

I believe the -LD suffix can be generally used to describe different data formats somehow augmented with Linked Data. For instance,

This is not an argument, but I am also using Markdown with YAML-LD front matter metadata, and call it Markdown-LD :)

Thus, if I have to choose among the options @nichtich has proposed, I would vote for (2). I still believe the specification can be half a page to describe the exact logic of the conversion from YAML-LD to JSON-LD and vice versa, but I believe that the potential space of -LD data formats is vast. Each of them should be equipped with its own tools, even though the meaning of those formats can probably be derived in most cases from the JSON-LD 1.1 specification, and we could use JSON-LD contexts to interpret the data files.

I have written up a paper about YAML-LD recently and I would be happy to get feedback from the community, but the conference I submitted the paper to requires double blind review so I am uncertain whether I am at liberty to share the draft. Guess I would have to wait till the organizers' decision about whether they're going to publish it or not.

pchampin commented 2 years ago

@nichtich I believe that your option 2 above would be far too confusing for many people. I would rather make YAML-LD a superset of JSON-LD in YAML, i.e. adding some specific idioms / patterns that "native" YAML-LD processors would understand (e.g. the $-keywords replacement). But basically those specific idioms would be pre-processed for producing valid JSON-LD, which could then be fully compliant.

And I would make it easy to recognize/require YAML-LD documents that are strictly JSON-LD in YAML (i.e. do not require the pre-processing), like a media-type parameter (e.g. text/ld+yaml;profile=strict).
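
A small sketch of how a processor might read such a parameter, using only the Python standard library. The media type `text/ld+yaml` and the `profile=strict` parameter are the hypothetical values suggested above, not registered names, and `needs_preprocessing` is an illustrative helper.

```python
from email.message import Message

# Hypothetical media type from the suggestion above; not a registered type.
YAML_LD_TYPE = "text/ld+yaml"


def needs_preprocessing(content_type):
    """Return False when the document declares itself strict JSON-LD in YAML."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != YAML_LD_TYPE:
        raise ValueError(f"not a YAML-LD media type: {content_type!r}")
    # get_param returns None when the parameter is absent.
    return msg.get_param("profile") != "strict"


print(needs_preprocessing("text/ld+yaml"))
print(needs_preprocessing("text/ld+yaml;profile=strict"))
```

With this shape, a document without the parameter goes through the $-keyword pre-processing step, while a strict document can be handed to a YAML parser and a JSON-LD processor directly.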

VladimirAlexiev commented 2 years ago

Folks, please contribute requirements in https://github.com/json-ld/yaml-ld/issues/2

VladimirAlexiev commented 2 years ago

@pchampin @nichtich Sorry, I find your "will be confusing" argument unconvincing. Let me play devil's advocate and apply your argument to other situations:

As a data architect, I want to be able to write shortcut notation like this, and get proper RDF (see https://github.com/json-ld/yaml-ld/issues/2 Shortcuts), and even the other way around. The example below is shorthand Turtle, but I'd like a similar spirit in YAML.

:Person a rdf:Class;
:born a rdfs:Property; <- :Person; -> xsd:date.
Doc1234 :creator ~tobyink .
~tobyink a :Person; :born 1980-01-01 .
`Example-Distribution 0.001 cpan:TOBYINK` issued 2012-06-18 .

OR13 commented 2 years ago

I left an example of our use of OAS (Open API Specification) and JSON-LD here: https://github.com/json-ld/yaml-ld/issues/2

TLDR; OAS supports JSON Schema represented in YAML, we tweaked the JSON Schema to support JSON-LD terms, so now we can present RDF types and JSON Schema types in a single YAML file.

gkellogg commented 2 years ago

Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

I fully agree. To avoid confusion I'd vote to

  1. either limit the specification of YAML-LD to half a page, essentially stating that YAML-LD is JSON-LD expressed in YAML, so any standard conversion YAML <-> JSON (without any exceptions such as $ / @ keys) can do
  2. or to make clear that YAML-LD is not "JSON-LD in YAML" but an independent data format, loosely based on JSON-LD. Even if it's 99% like JSON-LD, you cannot use standard tools but need a specific YAML-LD aware software library to avoid running into edge cases.

Option 1 can be done as part of JSON-LD, option 2 is an independent project with unknown outcome and adoption.

JSON-LD 1.1 explicitly chose to use an intermediate representation to allow for other formats to map easily into that representation. While your option 1 can't realistically be done in 1/2 page, IMO any spec should confine itself to this level of interoperability. (This is consistent with allowing @ to be optionally replaced with $, I believe.)

The challenges of getting JSON-LD standardized in the first place should be a cautionary tale for trying to do something similar but different, so I don't see option 2 as viable.

cmungall commented 2 years ago

I was asked by @VladimirAlexiev to vote on this, so I did, but to be clear I am voting that there should be some official position on how to represent JSON-LD in YAML.

I am equally convinced by the two schools here:

  1. Avoid surprises and use identical keywords
  2. optimize for writability and use more yaml-friendly characters

I am personally biased towards 1 as I rarely author contexts directly, instead I autogenerate these from a yaml-native "polyglot" language (LinkML), but I am sympathetic to those who do author the contexts directly.

pchampin commented 2 years ago

@VladimirAlexiev regarding your examples: Turtle (respectively N3) is a strict superset of N-Triples (respectively Turtle), so I don't see this as confusing. Shacl-C is a completely different syntax from Shacl in Turtle, so I don't see this as confusing. I agree that the co-existence of JSON-LD and RDF/JSON might be confusing, and notice that the RDF WG decided at the time to recommend only one of them (RDF/JSON is only a note).

@nichtich's proposition 2 is "an independent data format, loosely based on JSON-LD", where "loosely" could mean up to "99%". The closer this YAML-LD is to JSON-LD, the more surprising it will be for users when they hit a difference. Furthermore, "you cannot use standard tools but need a specific YAML-LD aware software" sounds a lot like reinventing the wheel.

I guess what I really don't want is to define something from scratch. If YAML-LD is to be based (even loosely) on JSON-LD, then I would strongly advise that YAML-LD processors are based on a JSON-LD processor under the hood. Experience shows that developing a bug-free JSON-LD processor is tricky. Requiring a similar, but different, effort for YAML-LD, seems like a bad idea.

nichtich commented 2 years ago

If YAML-LD is going to be more than a simple application of YAML syntax to express JSON-LD documents (proposition 1), it may help to reduce references to JSON-LD to avoid mixing levels of description (YAML, JSON-LD, RDF...). The specification could be limited to rules on how to transform a YAML-LD document into a JSON document (e.g. replace defined use of $ with @, expand YAML tags...) without detailed knowledge of JSON-LD and RDF.

For instance the specification could explain how to transform these YAML documents

$id: "http://example.org"
---
$id: 1

into JSON documents {"@id": "http://example.org"} and {"@id": 1} respectively. The latter is not valid JSON-LD but this would be irrelevant to transformation rules from YAML-LD to JSON.

The final sentence of the specification would be a requirement that JSON encoded by YAML-LD transformation rules must be valid JSON-LD. This can (and should) be checked with existing JSON-LD parsers.
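
To illustrate that second stage on the two outputs above, here is a deliberately tiny stand-in check in Python. In practice this step would be delegated to an existing JSON-LD processor; the helper `find_invalid_ids` is hypothetical and tests only one rule (an `@id` on a node must be a string), skipping `@context` entirely because term definitions follow different rules.

```python
# Hypothetical stand-in for JSON-LD validation; checks a single rule only.
def find_invalid_ids(node, path=""):
    """Return the paths of "@id" entries whose value is not a string."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # contexts are not inspected in this sketch
            child = f"{path}/{key}"
            if key == "@id" and not isinstance(value, str):
                problems.append(child)
            problems.extend(find_invalid_ids(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_invalid_ids(item, f"{path}/{index}"))
    return problems


# The two JSON documents produced by the transformation described above.
print(find_invalid_ids({"@id": "http://example.org"}))
print(find_invalid_ids({"@id": 1}))
```

Keeping the check separate from the transformation mirrors the split proposed here: the transformation rules stay purely syntactic, and validity is decided afterwards on the resulting JSON.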

anatoly-scherbakov commented 2 years ago

@nichtich I'd agree. I tried to formalize this as follows.

A document D is designated as a valid YAML-LD document if, and only if:

pchampin commented 2 years ago

Idea: instead of $-keywords, why not define a YAML tag for each JSON-LD keyword (i.e. !context, !type, ...)? These tags would only expect an empty string, so they should be used "on their own", e.g.

!context :
  !vocab : http://example.com/ns/
!id : #test
!type : Foo
bar: baz

PROS: while $context and other $-keywords could (in theory) be intended as regular JSON keys, tags have no direct JSON interpretation, so tag-keywords are unambiguous.

CONS: from a short experiment, it seems you need a space between the tag and the colon (:) (while you don't need it with "regular" keys), which is error-prone...

gkellogg commented 2 years ago

A couple of thoughts on how to specify YAML-LD as an extension of JSON-LD API:

gkellogg commented 2 years ago

There is certainly enough interest to start an activity for YAML-LD. I'll create and set up a repo for this purpose. I think both the spec and the UCR documents can be in the same repo, but PR Preview will only use one of them for nicely formatted PRs.

I'll put out a proposal for a CG call (maybe @pchampin can help with a Zoom setup), which would be useful for more than just YAML-LD.

gkellogg commented 2 years ago

Repo has been set up at https://github.com/json-ld/yaml-ld. If you would like to contribute, and are a member of the JSON-LD Community Group, I can add you to the contributors team. Please create an issue (or respond to an already existing issue) to be added to the team.

gkellogg commented 2 years ago

Moving this issue to the yaml-ld repo.

anatoly-scherbakov commented 2 years ago

@pchampin this is an interesting idea but I'd say that the required space character is a great irregularity introduced to the syntax, and is a potential source of mistakes.

It might be interesting to use YAML tags for something in YAML-LD context, but I have no idea how at present. $type seems to work fine to assign RDF types to nodes. I do not have any other ideas.

pchampin commented 2 years ago

@anatoly-scherbakov

@pchampin this is an interesting idea but I'd say that the required space character is a great irregularity introduced to the syntax, and is a potential source of mistakes.

I agree, unfortunately.

It might be interesting to use YAML tags for something in YAML-LD context, but I have no idea how at present.

I created a dedicated issue for this: #6.

VladimirAlexiev commented 2 years ago

Closing this. If you've made important remarks above, please post them as separate issues. In particular @gkellogg https://github.com/json-ld/yaml-ld/issues/3#issuecomment-1137630441, and maybe @pchampin ?