Work on YAML-LD

gkellogg commented 2 years ago

w3c/json-ld-syntax#389, filed in the w3c/json-ld-syntax repository, proposes advancing work on YAML-LD. I was not able to transfer the issue to this repository, but further discussion and votes in support of starting the initiative in the JSON-LD CG should be voiced here. Given sufficient support, we'll create a yaml-ld repository for work to proceed.

gkellogg commented 2 years ago

With support, I'll set up a new repo with a template ReSpec document and pr-preview.

I support starting such an initiative.

pchampin commented 2 years ago

Quoting @anatoly-scherbakov from https://github.com/w3c/json-ld-syntax/issues/389#issuecomment-1127070677

I would propose the following grounds for the @ → $ replacement: while JSON is machine readable and writable, it is not very human readable and — especially — writable. YAML is much friendlier in that regard, due to much lower syntactic noise. The replacement of these characters helps to further reduce the said syntactic noise, making the data files therefore faster to type.

In general, I'd name manually writable semantic data the main purpose for YAML-LD.

I will be happy to participate in the standardization process if one is to be initiated, and to assist however I can.

I see your point about easing the manual editing of YAML-LD. It is a valid point.

But on the other hand, the principle of least surprise is important. Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

What I guess I could live with is for "native YAML-LD" processors to accept $-keywords in addition to @-keywords, with a flag (named e.g. strict-keywords) to disable this behaviour (in case someone wants to use $-prefixed names for another purpose).

gkellogg commented 2 years ago

What I guess I could live with is for "native YAML-LD" processors to accept $-keywords in addition to @-keywords, with a flag (named e.g. strict-keywords) to disable this behaviour (in case someone wants to use $-prefixed names for another purpose).

Yes, and I think a similar flag for using $ instead of @ when serializing. Trying to preserve the original form using $ or @ is probably too complicated.

Note that using the $ "namespace" will overlap with other existing uses, at least from JSON. For example, JSON Schema has $schema, $vocabulary and other keywords that would not otherwise overlap with the JSON-LD keyword namespace.

tetron commented 2 years ago

A formal YAML-LD variant that is a minimum-surprise syntax variant of JSON-LD is a good idea, and I don't want to get in the way of that discussion.

However, I just wanted to link my comment from the other issue:

https://github.com/w3c/json-ld-syntax/issues/389#issuecomment-1127716287

JSON-LD has some limitations when working with certain idiomatic JSON structures. Schema Salad is a schema language for YAML and JSON documents which describes validation and transformation from idiomatic YAML structures to JSON-LD structures and then on to RDF. We have used it extensively for our own use case (describing the Common Workflow Language), but I wanted to mention it here to see if there's interest in generalizing it, separately from this YAML-LD discussion.

anatoly-scherbakov commented 2 years ago

@pchampin @gkellogg I agree with using @-keywords as a default, unless a flag is supplied to the processor to enable $-keywords. The YAML-LD preprocessor which converts $ to @ might also take care to convert only the keywords which are reserved by JSON-LD. This will also resolve the possible conflict with JSON Schema (I guess the two standards have different sets of keywords).

@tetron Thank you for the information, will have a look to compare Schema Salad with plain YAML-LD that I am currently using.

VladimirAlexiev commented 2 years ago

@tetron There is no dispute that JSON-LD is no schema language and is often complemented with JSON Schema.

JSON-LD has some limitations when working with certain idiomatic JSON structures.

But can you elaborate on this point using examples? I think that JSON-LD 1.1 is very flexible, eg local term definitions, add/ remove auxiliary keys that have no reflection in RDF, etc

VladimirAlexiev commented 2 years ago

@OR13 @nicholascar please vote for YAML-LD CG workgroup above

OR13 commented 2 years ago

I'm not sure I have the cycles to help much with YAML-LD, but I think it's a good idea... we use OAS / YAML with JSON-LD and JSON Schema often.

In particular, I like the idea of controlling both semantics and data shape at the same time, using only 1 file.

tetron commented 2 years ago

But can you elaborate on this point using examples? I think that JSON-LD 1.1 is very flexible, eg local term definitions, add/ remove auxiliary keys that have no reflection in RDF, etc

So, schema salad was created in response to JSON-LD 1.0 (and roughly 5 years before JSON-LD 1.1), the main motivations were:

Use of YAML because it has comments and multi-line strings
Desire to be able to express the subject in the "key" part of an object. This is basically id maps of JSON-LD 1.1
Similarly, an equivalent to type maps
Default predicate assignment when a value is a scalar instead of an object
Relative identifiers that are syntactically scoped, e.g.

Examples of the last two:

steps:
  # this is an identifier map, the key is the id
  step1:
    in:
      # another identifier map,
      # the identifier is syntactically scoped so it gets appended to the 
      # enclosing id
      # the value is assigned to a default predicate of "source" since it's a scalar
      input_parameter: source_parameter_uri
    out: [output_parameter]

This is translated to json-ld 1.0 that looks something like this:

{
  "steps": [{
    "@id": "step1",
    "in": {
      "@id": "step1/input_parameter",
      "source": "source_parameter_uri"
    },
    "out": [{
      "@id": "step1/output_parameter"
    }]
  }]
}

Validation, but using Avro schema instead of JSON Schema. One of the reasons was that you could hoist Avro schema on top of JSON-LD (so the schema itself could be valid JSON-LD); at the time it seemed impossible to write a JSON-LD context for JSON Schema structures.
Having a single file that describes the JSON data shape/validation, can generate the JSON-LD context, produce RDFS, and produce documentation (for the low, low price of inventing and maintaining a new data definition system).
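
The identifier-map example above (key as id, syntactic scoping, default predicate for scalars) can be sketched in a few lines of Python. This is only an illustration of the idea and not Schema Salad's actual implementation: the helper names are hypothetical, and the knowledge that "steps" and "in" are identifier maps, that "out" is a list of scoped identifiers, and that "source" is the default predicate is hard-coded here, whereas Schema Salad would take it from the schema. The input is assumed to be the YAML already parsed into plain dicts and lists.

```python
def scoped_id(parent_id, local_id):
    """Append a local identifier to the enclosing id, if there is one."""
    return f"{parent_id}/{local_id}" if parent_id else local_id


def expand_id_map(id_map, parent_id=None, default_predicate=None):
    """Turn {key: value} into a list of nodes whose "@id" is the scoped key."""
    nodes = []
    for key, value in id_map.items():
        node = {"@id": scoped_id(parent_id, key)}
        if isinstance(value, dict):
            node.update(value)
        elif default_predicate is not None:
            # A scalar value is assigned to the default predicate.
            node[default_predicate] = value
        else:
            raise ValueError(f"scalar value for {key!r} but no default predicate")
        nodes.append(node)
    return nodes


def expand_steps(doc):
    """Expand the "steps" structure from the example (field names hard-coded)."""
    steps = []
    for step in expand_id_map(doc["steps"]):
        step_id = step["@id"]
        step["in"] = expand_id_map(
            step.get("in", {}), step_id, default_predicate="source"
        )
        step["out"] = [
            {"@id": scoped_id(step_id, name)} for name in step.get("out", [])
        ]
        steps.append(step)
    return {"steps": steps}


doc = {
    "steps": {
        "step1": {
            "in": {"input_parameter": "source_parameter_uri"},
            "out": ["output_parameter"],
        }
    }
}
expanded = expand_steps(doc)
```

Unlike the hand-written JSON above, this sketch always emits a list for an identifier map, so "in" comes out as a one-element list and not as a bare object.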

We've also implemented code generators for Python and Java. It would probably also be straightforward to write a translator to express the schema as SHACL.

VladimirAlexiev commented 2 years ago

@OR13 what is OAS?
@tetron I call this "polyglot modeling", of which there are more examples (see a couple at https://drive.google.com/file/d/15RuCfyresjmc0JWoNl8Jpjpbf_O65UkD/view)
@cmungall Please vote on top for this issue, as LinkML is based on YAML.

I'll start a "YAML-LD UCRs" issue and include "polyglot modeling"

nichtich commented 2 years ago

Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

I fully agree. To avoid confusion I'd vote to

either limit the specification of YAML-LD to half a page, essentially stating that YAML-LD is JSON-LD expressed in YAML, so any standard conversion YAML <-> JSON (without any exceptions such as $ / @ keys) can do
or to make clear that YAML-LD is not "JSON-LD in YAML" but an independent data format, loosely based on JSON-LD. Even if it's 99% like JSON-LD, you cannot use standard tools but need a specific YAML-LD aware software library to avoid running into edge cases.

Option 1 can be done as part of JSON-LD, option 2 is an independent project with unknown outcome and adoption.

anatoly-scherbakov commented 2 years ago

In my opinion, writeability is the primary reason to even have YAML-LD, and it is also my feeling that the @ → $ replacement has greatly improved my own writing experience when authoring YAML-LD data files and contexts.

I believe the -LD suffix can be generally used to describe different data formats somehow augmented with Linked Data. For instance,

CSV-LD had been mentioned before (though I am uncertain what exactly it is and how it relates to CSVW);
IPLD is a medium to describe data in a distributed peer to peer way, and is a foundation for IPFS;
One can easily imagine TOML-LD, TSV-LD, Parquet-LD, Protobuf-LD, perhaps even Excel-LD.

This is not an argument but I am also using Markdown with YAML-LD front matter meta data, and call it Markdown-LD :)

Thus, if I had to choose from the options @nichtich has proposed, I would vote for (2). I still believe the specification can be half a page to describe the exact logic of the conversion from YAML-LD to JSON-LD and vice versa, but I believe that the potential space of -LD data formats is vast. Each of them should be equipped with its own tools, even though the meaning of those formats can probably be in most cases derived from the JSON-LD 1.1 specification, and we could use JSON-LD contexts to interpret the data files.

I have written up a paper about YAML-LD recently and I would be happy to get feedback from the community, but the conference I submitted the paper to requires double blind review so I am uncertain whether I am at liberty to share the draft. Guess I would have to wait till the organizers' decision about whether they're going to publish it or not.

pchampin commented 2 years ago

@nichtich I believe that your option 2 above would be far too confusing for many people. I would rather make YAML-LD a superset of JSON-LD in YAML, i.e. adding some specific idioms / patterns that "native" YAML-LD processors would understand (e.g. the $-keywords replacement). But basically those specific idioms would be pre-processed for producing valid JSON-LD, which could then be fully compliant.

And I would make it easy to recognize/require YAML-LD documents that are strictly JSON-LD in YAML (i.e. do not require the pre-processing), like a media-type parameter (e.g. text/ld+yaml;profile=strict).
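
A rough sketch of how a processor might act on such a media-type parameter is given below. The media type text/ld+yaml and the profile value strict are just the examples from this comment, not registered values, and the function names are hypothetical. The parsing is deliberately simplistic (it does not handle quoted strings containing semicolons).

```python
def parse_media_type(value):
    """Split 'type/subtype; key=value' into (media_type, params)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if "=" in raw:
            key, _, val = raw.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


def needs_preprocessing(content_type):
    """Return False when the document declares itself strictly JSON-LD in YAML."""
    media_type, params = parse_media_type(content_type)
    if media_type != "text/ld+yaml":  # example media type from the comment above
        raise ValueError(f"not a YAML-LD media type: {media_type}")
    return params.get("profile") != "strict"
```

With this, a document served as text/ld+yaml;profile=strict could be handed to a plain YAML-to-JSON conversion, while anything else would go through the pre-processing step first.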

VladimirAlexiev commented 2 years ago

Folks, please contribute requirements in https://github.com/json-ld/yaml-ld/issues/2

VladimirAlexiev commented 2 years ago

@pchampin @nichtich Sorry, I find your "will be confusing" argument unconvincing. Let me play devil's advocate and apply your argument to other situations:

"why do we need Turtle shortenings like s p o1, o2 and a: these are too confusing, let's stick with ntriples"
"why do we need N3 short forms like -> for log:infers and {graph1} -> {graph2} for inference. Too confusing, let's stick with Turtle". I think @josd and TimBL may have an issue with that ;-)
"why do we need JSON-LD given that RDF JSON can express any RDF? Will be too confusing for the hordes of novices who won't read the (excellent) JSON-LD spec". (I hope I won't get a beating from the folks in this repo for these words)
"why do we need SHACL-Compact, just write SHACL. People don't want to learn a second language, and it would be too confusing". But I find SHACLC a major feature for parity against SHEX. (And the argument even goes into details such as what does the grammar allow, see https://github.com/w3c/shacl/issues/12#issuecomment-1110358866)

As a data architect, I want to be able to write shortcut notation like this, and get proper RDF (see https://github.com/json-ld/yaml-ld/issues/2 Shortcuts), and even the other way around. This below is shorthand turtle, but I'd like a similar spirit in YAML.

:Person a rdf:Class;
:born a rdfs:Property; <- :Person; -> xsd:date.
Doc1234 :creator ~tobyink .
~tobyink a :Person; :born 1980-01-01 .
`Example-Distribution 0.001 cpan:TOBYINK` issued 2012-06-18 .

OR13 commented 2 years ago

I left an example of our use of OAS (Open API Specification) and JSON-LD here: https://github.com/json-ld/yaml-ld/issues/2

TLDR; OAS supports JSON Schema represented in YAML, we tweaked the JSON Schema to support JSON-LD terms, so now we can present RDF types and JSON Schema types in a single YAML file.

gkellogg commented 2 years ago

Having YAML-LD behave differently from YAML2JSON + JSON-LD sounds like a very bad idea.

I fully agree. To avoid confusion I'd vote to

either limit the specification of YAML-LD to half a page, essentially stating that YAML-LD is JSON-LD expressed in YAML, so any standard conversion YAML <-> JSON (without any exceptions such as $ / @ keys) can do

or to make clear that YAML-LD is not "JSON-LD in YAML" but an independent data format, loosely based on JSON-LD. Even if it's 99% like JSON-LD, you cannot use standard tools but need a specific YAML-LD aware software library to avoid running into edge cases.

Option 1 can be done as part of JSON-LD, option 2 is an independent project with unknown outcome and adoption.

JSON-LD 1.1 explicitly chose to use an intermediate representation to allow for other formats to map easily into that representation. While your option 1 can't realistically be done in 1/2 page, IMO any spec should confine itself to this level of interoperability. (This is consistent with allowing @ to be optionally replaced with $, I believe).

The challenges of getting JSON-LD standardized in the first place should be a cautionary tale for trying to do something similar but different, so I don't see option 2 as viable.

cmungall commented 2 years ago

I was asked by @VladimirAlexiev to vote on this, so I did, but to be clear I am voting that there should be some official position on how to represent JSON-LD in YAML.

I am equally convinced by the two schools here:

Avoid surprises and use identical keywords
optimize for writability and use more yaml-friendly characters

I am personally biased towards 1 as I rarely author contexts directly, instead I autogenerate these from a yaml-native "polyglot" language (LinkML), but I am sympathetic to those who do author the contexts directly.

pchampin commented 2 years ago

@VladimirAlexiev regarding your examples: Turtle (respectively N3) is a strict superset of N-Triples (respectively Turtle) so I don't see this as confusing. SHACL-C is a completely different syntax from SHACL in Turtle, so I don't see this as confusing. I agree that the co-existence of JSON-LD and RDF/JSON might be confusing, and notice that the RDF WG decided at the time to recommend only one of them (RDF/JSON is only a note).

@nichtich's proposition 2 is "an independent data format, loosely based on JSON-LD", where "loosely" could mean up to "99%". The closer this YAML-LD is to JSON-LD, the more surprising it will be for users when they hit a difference. Furthermore "you cannot use standard tools but need a specific YAML-LD aware software" sounds a lot like reinventing the wheel.

I guess what I really don't want is to define something from scratch. If YAML-LD is to be based (even loosely) on JSON-LD, then I would strongly advise that YAML-LD processors are based on a JSON-LD processor under the hood. Experience shows that developing a bug-free JSON-LD processor is tricky. Requiring a similar, but different, effort for YAML-LD, seems like a bad idea.

nichtich commented 2 years ago

If YAML-LD is going to be more than a simple application of YAML syntax to express JSON-LD documents (proposition 1), it may help to reduce references to JSON-LD to avoid mixing levels of description (YAML, JSON-LD, RDF...). The specification could be limited to rules how to transform a YAML-LD document into a JSON document (e.g. replace defined use of $ with @, expand YAML tags...) without detailed knowledge of JSON-LD and RDF.

For instance the specification could explain how to transform these YAML documents

$id: "http://example.org"
---
$id: 1

into JSON documents {"@id": "http://example.org"} and {"@id": 1} respectively. The latter is not valid JSON-LD but this would be irrelevant to transformation rules from YAML-LD to JSON.

The final sentence of the specification would be a requirement that JSON encoded by YAML-LD transformation rules must be valid JSON-LD. This can (and should) be checked with existing JSON-LD parsers.

anatoly-scherbakov commented 2 years ago

@nichtich I'd agree. I tried to formalize this as follows.

A document D is designated as a valid YAML-LD document if, and only if:

D is a valid YAML document;
The following transformation converts D into a valid JSON-LD-document:
- convert to JSON,
- for every key and value that is a string and starts with the $ character, replace $ with @ if and only if the resulting string is a JSON-LD reserved keyword as per its specification.
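
The second transformation step could look roughly like the following Python sketch, which works on data that has already been converted from YAML into plain dicts, lists and scalars. The function names are hypothetical, and the keyword set is transcribed from the JSON-LD 1.1 syntax specification as an assumption; it should be checked against the spec before being relied on.

```python
# Keywords as listed in JSON-LD 1.1 (assumed complete; verify against the spec).
JSONLD_KEYWORDS = frozenset({
    "@base", "@container", "@context", "@direction", "@graph", "@id",
    "@import", "@included", "@index", "@json", "@language", "@list",
    "@nest", "@none", "@prefix", "@propagate", "@protected", "@reverse",
    "@set", "@type", "@value", "@version", "@vocab",
})


def convert_string(text):
    """Replace a leading $ with @ only if that yields a JSON-LD keyword."""
    if text.startswith("$"):
        candidate = "@" + text[1:]
        if candidate in JSONLD_KEYWORDS:
            return candidate
    return text


def dollar_to_at(data):
    """Recursively apply the $ -> @ replacement to string keys and string values."""
    if isinstance(data, dict):
        return {
            (convert_string(key) if isinstance(key, str) else key): dollar_to_at(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [dollar_to_at(item) for item in data]
    if isinstance(data, str):
        return convert_string(data)
    return data
```

Because only strings that map onto a reserved keyword are touched, a key such as $schema from JSON Schema passes through unchanged, while {"$id": 1} becomes {"@id": 1} regardless of whether the result is valid JSON-LD. Validity would still have to be checked by a JSON-LD processor afterwards.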

pchampin commented 2 years ago

Idea: instead of $-keywords, why not define a YAML tag for each JSON-LD keyword (i.e. !context, !type, ...). These tags would only expect an empty string, so they should be used "on their own", e.g.

!context :
  !vocab : http://example.com/ns/
# quoted, because an unquoted #test after a space would be read as a YAML comment
!id : "#test"
!type : Foo
bar: baz

PROS: while $context and other $-keywords could (in theory) be intended as regular JSON keys, tags have no direct JSON interpretation, so tag-keywords are unambiguous.

CONS: from a short experiment, it seems you need a space between the tag and the colon (:) (while you don't need it with "regular" keys), which is error-prone...

gkellogg commented 2 years ago

A couple of thoughts on how to specify YAML-LD as an extension of JSON-LD API:

All JSON is retrieved using the document loader. A YAML-LD spec can describe the requirements for an alternative loader.
Some consideration (for or against) an equivalent for finding a context using an HTTP Link Header.
YAML-LD and JSON-LD documents and contexts (and embedded HTML variations) should probably be intermixable, which wouldn't really require much extra to support.
Most API entry points end with words including "Resolve the promise with flattened output transforming flattened output from the internal representation to a JSON serialization, if necessary." (There may be some more points internally). Abstracting this to allow the output format to be specified with a new API option would yield more benefits beyond YAML-LD.
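
As a rough illustration of the alternative-loader idea, the sketch below builds a document loader that picks a parser from the response's media type. Everything here is an assumption made for illustration: the factory name, the injected fetch and parse_yaml callables (so that no particular HTTP or YAML library is implied), and the use of text/ld+yaml as the YAML-LD media type, which is only the example value mentioned earlier in this thread. The returned dict loosely follows the shape of the JSON-LD API's RemoteDocument.

```python
import json


def make_document_loader(fetch, parse_yaml):
    """Build a loader that chooses a parser based on the media type.

    fetch(url) is assumed to return (content_type, body_text);
    parse_yaml(text) is assumed to return plain dicts, lists and scalars.
    """
    parsers = {
        "application/ld+json": json.loads,
        "application/json": json.loads,
        "text/ld+yaml": parse_yaml,  # hypothetical YAML-LD media type
    }

    def load_document(url):
        content_type, body = fetch(url)
        # Drop parameters such as charset or profile before the lookup.
        media_type = content_type.split(";", 1)[0].strip().lower()
        try:
            parse = parsers[media_type]
        except KeyError:
            raise ValueError(f"unsupported media type: {media_type}") from None
        return {
            "documentUrl": url,
            "contentType": media_type,
            "document": parse(body),
        }

    return load_document
```

Since both parsers produce the same kind of in-memory structure, JSON-LD and YAML-LD documents and contexts could be mixed freely behind such a loader, which is the point made in the third item above.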

gkellogg commented 2 years ago

There is certainly enough interest to start an activity for YAML-LD. I'll create and set up a repo for this purpose. I think both the spec and the UCR documents can be in the same repo, but PR Preview will only use one of them for nicely formatted PRs.

I'll put out a proposal for a CG call (maybe @pchampin can help with a Zoom setup), which would be useful for more than just YAML-LD.

gkellogg commented 2 years ago

Repo has been set up at https://github.com/json-ld/yaml-ld. If you would like to contribute, and are a member of the JSON-LD Community Group, I can add you to the contributors team. Please create an issue (or respond to an already existing issue) to be added to the team.

gkellogg commented 2 years ago

Moving this issue to the yaml-ld repo.

anatoly-scherbakov commented 2 years ago

@pchampin this is an interesting idea but I'd say that the required space character is a great irregularity introduced to the syntax, and is a potential source of mistakes.

It might be interesting to use YAML tags for something in YAML-LD context, but I have no idea how at present. $type seems to work fine to assign RDF types to nodes. I do not have any other ideas.

pchampin commented 2 years ago

@anatoly-scherbakov

@pchampin this is an interesting idea but I'd say that the required space character is a great irregularity introduced to the syntax, and is a potential source of mistakes.

I agree, unfortunately.

It might be interesting to use YAML tags for something in YAML-LD context, but I have no idea how at present.

I created a dedicated issue for this: #6.

VladimirAlexiev commented 2 years ago

Closing this. If you've made important remarks above, please post them as separate issues. In particular @gkellogg https://github.com/json-ld/yaml-ld/issues/3#issuecomment-1137630441, and maybe @pchampin ?

json-ld / yaml-ld

Work on YAML-LD #3