json-ld / yaml-ld

CG specification for YAML-LD and UCR
https://json-ld.github.io/yaml-ld/spec

Multiple documents in YAML #46

Closed VladimirAlexiev closed 2 years ago

VladimirAlexiev commented 2 years ago

Should YAML-LD allow or prohibit multiple documents in YAML?

PLEASE VOTE with :+1: or :-1: , thanks!


Eg1: duplicate keys within a single mapping are forbidden by YAML linters, but the same key may appear in different documents of one stream. Example by @ioggstream from https://github.com/json-ld/yaml-ld/issues/42#issuecomment-1173646556:

---
a: 1
...
---
a: 2
...

Eg2: YAML metadata followed by a markdown textual body is widely used in some blog/content management systems:

---
created: 2022-07-03
published: 2022-07-04
title: Frobnification
author: A. U. Thor
...
Frobnification was invented in prehistoric times.
It's a useful meta-process wherein...

As an information architect, I want to be able to use multiple documents in YAML-LD, so that I can transmit several closely related documents (graphs) together.

ioggstream commented 2 years ago

Some notes:

a. Theoretically speaking

  1. a YAML stream includes one or more documents
  2. a stream can be transmitted on the net or archived in a file

In Python, to parse a stream containing multiple documents you need to use yaml.safe_load_all instead of yaml.safe_load.

b. I'm not sure the Eg2 provided above is valid YAML.

ioggstream commented 2 years ago

Which YAML parsers support multiple documents?

In Python, to parse a stream containing multiple documents you need to use yaml.safe_load_all instead of yaml.safe_load.
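
A minimal sketch of that difference, assuming PyYAML is installed and reusing the Eg1 stream from the top of the thread (the variable names are illustrative only):

```python
import yaml  # PyYAML, assumed to be installed

STREAM = """\
---
a: 1
...
---
a: 2
...
"""

# safe_load_all returns a generator with one Python object per YAML document.
docs = list(yaml.safe_load_all(STREAM))
print(docs)

# safe_load expects exactly one document and raises on a multi-document stream.
try:
    yaml.safe_load(STREAM)
    single_load_failed = False
except yaml.YAMLError:
    single_load_failed = True
print(single_load_failed)
```

The key `a` appears twice in the stream, but each occurrence lives in its own document, so the loader yields two separate mappings.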

What are useful examples of using multiple documents?

In Kubernetes, multiple YAML documents are bundled together to describe related deployment units.

Another example could be bundling different related datasets that should be imported together in a single file, e.g. (metadata, data) or (ontology, dataset).
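
As a rough illustration of such a bundle, assuming PyYAML is installed, two related documents can be written into one stream and read back. The `metadata` and `data` contents below are made up for the example:

```python
import yaml  # PyYAML, assumed to be installed

# Hypothetical contents; any two related documents would do.
metadata = {"title": "Frobnification", "author": "A. U. Thor"}
data = {"items": [1, 2, 3]}

# explicit_start=True writes a "---" marker in front of every document.
bundle = yaml.safe_dump_all([metadata, data], explicit_start=True)
print(bundle)

# Reading the stream back gives the documents in their original order.
restored = list(yaml.safe_load_all(bundle))
print(restored == [metadata, data])
```

Nothing in the stream itself says how the two documents relate; that would have to come from their content or from a convention such as document order.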

If we decide to use them in YAML-LD, how should they be represented?

As different JSON-LD documents related between them

As RDF graphs?

Aren't they always RDF graphs?

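import json
import yaml
from rdflib import Graph

g = Graph()
with open("docs.yamlld") as stream:
    # Treat each YAML document as one JSON-LD document and merge all into g.
    for document in yaml.safe_load_all(stream):
        g.parse(data=json.dumps(document), format="json-ld")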

Below I formulate a positive use case, but I'm not quite certain we want this because of its complexity

I see it more as a bundling method. The complexity lies inside each document.

WDYT?

VladimirAlexiev commented 2 years ago

@ioggstream

not sure the eg2 provided above is valid yaml

Does it look better now?

As different JSON-LD documents related between them

But how can we relate documents?

Aren't they always RDF graphs?

I agree they should be graphs. Then we need:

Eg this

{"@context": {"@base": "http://example.org", "@vocab":"http://example.org/",
              "spouse":{"@type":"@id"},"statedIn":{"@type":"@id"}},
 "@id": "#bart", "spouse": "#marge", "statedIn": ""}

results in these triples (not quads)

<http://example.org#bart> <http://example.org/spouse> <http://example.org#marge> .
<http://example.org#bart> <http://example.org/statedIn> <http://example.org> .
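
The IRIs in those triples follow from resolving the relative references against `@base`. A small check using Python's `urljoin`, taken here as a stand-in for the RFC 3986 resolution that a JSON-LD processor performs (the variable names are illustrative):

```python
from urllib.parse import urljoin

BASE = "http://example.org"  # the @base value from the context above

# A fragment-only reference keeps the base and appends the fragment.
subject = urljoin(BASE, "#bart")
spouse = urljoin(BASE, "#marge")

# An empty reference resolves to the base itself, i.e. the document IRI.
stated_in = urljoin(BASE, "")

print(subject, spouse, stated_in)
```

So `"statedIn": ""` points at the document itself, which is one way for a node to say which document it was stated in.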

anatoly-scherbakov commented 2 years ago

My two cents about eg2. This form of writing is often known as front matter, originally proposed by Jekyll. Syntax:

---
title: My Cat
tags:
    - article
    - pets
---

My cat is the most handsome cat in the whole world.

A few examples of software that supports YAML front matter for Markdown documents:

I am using this format to source YAML-LD from the front matter.

However, this is not valid YAML and thus I do not believe it applies to the question at hand. Does it?
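
For reference, front matter is typically pulled out by splitting on the `---` delimiter lines before any YAML parsing happens. A minimal sketch of that kind of extraction, not the code of any particular tool (`split_front_matter` is a hypothetical helper):

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Return (front_matter, body); front_matter is "" if none is present."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the delimiters is YAML, the rest is the body.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat everything as body


page = """---
title: My Cat
tags:
    - article
    - pets
---

My cat is the most handsome cat in the whole world.
"""
front, body = split_front_matter(page)
print(front)
print(body)
```

With this approach only the `front` part is ever handed to a YAML parser, so the Markdown body does not need to be valid YAML.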

gkellogg commented 2 years ago

The JSON-LD API has some options and descriptions for processing multiple script elements within an HTML document (see extractAllScripts), which would seem relevant here.

VladimirAlexiev commented 2 years ago

@anatoly-scherbakov This is also used by pandoc.

I thought the second doc consists of one long string? But that would require some quoting or escaping, else colons and dashes at BOL will throw it off. Agreed, strike eg2

ioggstream commented 2 years ago

@VladimirAlexiev @gkellogg this will be mainly addressed in https://github.com/ietf-wg-httpapi/mediatypes/pull/55

Thanks for this issue: without this the YAML media type would have missed this piece.

@anatoly-scherbakov the document in your example is valid YAML, as @VladimirAlexiev said: it is a stream of two documents.

import yaml

s = """---
title: My Cat
tags:
    - article
    - pets
---

My cat is the most handsome cat in the whole world.
"""
# Prints the mapping first, then the body as a single plain string.
for d in yaml.safe_load_all(s):
    print(d)