YAML-LD Primer

Motivation

As was discussed at the August 17th meeting, a YAML-LD Primer document would be useful to introduce new users to this technology. In this issue, I am going to propose a few thoughts about what that document might look like.

At this point, a draft of the Primer is going to be a part of the main YAML-LD spec document. Later, it might be refactored into a separate treatise.

Target Audience: meet Lydia

Lydia just entered a university as a biology student. Her native tongue is Greek. She likes reading non-fiction, plays jazz in a local band, goes clubbing from time to time, and has a boyfriend, but her keenest interest is the ancient prehistoric reptiles of the Permian period.

In her dreams, she watches an edmontosaurus sleeping in the primordial swamp, and over him, cutting the hot vibrant air with iridescent wings, fly the enormously big butterflies, and the giant dragonflies who hunt them.

Lydia dreams of discovering more about that long-forgotten, extinct world, to further her science and to enrich humanity's knowledge of the planet's distant past, dimmed by the mist of eons.

And to do that, she relies upon technology.

Lydia needs technology

Biology manages a lot of structured data. Species, their relations to each other, their roles in ecosystems, their unique characteristics all might be described in a semantic way. The volume of data accumulated by the discipline is humongous. That might explain why biology is one of the areas where Linked Data tech has penetrated the most.

Examination and study of one single prehistoric skeleton yields great amounts of data: scans, measurements, and verbal descriptions of every, however minuscule, detail of the bone structure. Every detail is important: it might be a basis for better understanding of the life and struggle of the animal.

This is a lot of tedious work with a microscope and an open Microsoft Word window.

Why Linked Data?

Luckily, Linked Data technology provides a few [CITATION NEEDED] vocabularies to describe species, their relationships and their characteristics. Instead of writing tedious documents, Lydia could concisely describe her specimen in a structured form.

YAML-LD is one way to do that.

Using a text editor, she can describe the origin of the bone she's looking at and a few of its distinct characteristics, using terms supported by [CITATION NEEDED] well-known biological vocabularies.

She might produce a large and lengthy document as a result of her research.

This information will later be published as a paper, put out on the Web, and might be used to write systematic reviews, be referenced by other authors' publications, and otherwise reused.

What does it mean for us?

Assuming that there are systems where Lydia can export her YAML-LD description to be verified, analysed, and published, she is IMHO a great model user for YAML-LD.

She's not technical beyond the normal advanced-computer-user level. She knows how to edit text in an editor; she can learn YAML because it is so easy. We can't expect her to be enthusiastic about reading a ton of documentation about RDF or JSON-LD though.

She's only 19, after all. Why would we want to torment a child so?

Henceforth go…

…a few proposals for the YAML-LD Primer

The document should not assume that the user knows much beyond how to edit a document in a plain text editor. We need to be very mindful of the language and terminology we use so as not to scare the user away.

However, after going through our tutorial, the user should be capable of writing YAML-LD documents which will produce valid RDF graphs. The user should be able to reuse typical vocabularies such as foaf, schema.org, dbpedia, wikidata, etc., and have a clear idea why reusing those is beneficial.

In a tutorial form, we should present the motivation (producing structured data) and guide the user through the learning process.

YAML syntax should be introduced. People who're already familiar with YAML can skip it.
A notion of an RDF graph should be illustrated. It should be shown how a simple YAML document is converted to a graph.
The notion of IRIs needs to be introduced and it must be shown how one can follow the links to learn more about properties and classes from vocabularies.
The concept of a Linked Data vocabulary must be illustrated.
The concept of a Context must be explained. The focus is not on writing contexts, but the user should know what a Context is.
Perhaps, an idea of how systems do logical reasoning and how the information can be enriched from different sources should be provided.
Links for further learning from different domains should be present.
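
To make the "simple YAML document is converted to a graph" step concrete, here is a minimal sketch in Python. It is only a toy, not the real conversion: actual YAML-LD processing follows the JSON-LD algorithms, which handle far more than this. The dictionary stands in for what a YAML parser would return, the example.org IRIs are made up for illustration, and the `to_ntriples` helper is hypothetical.

```python
# Toy sketch only: real YAML-LD processing follows the JSON-LD algorithms.
# The example.org IRIs are hypothetical; the foaf IRIs are real vocabulary terms.
#
# The YAML-LD source this dict stands in for:
#
#   "@context":
#     name: http://xmlns.com/foaf/0.1/name
#     knows: http://xmlns.com/foaf/0.1/knows
#   "@id": http://example.org/lydia
#   name: Lydia
#   knows:
#     "@id": http://example.org/alexis
document = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "knows": "http://xmlns.com/foaf/0.1/knows",
    },
    "@id": "http://example.org/lydia",
    "name": "Lydia",
    "knows": {"@id": "http://example.org/alexis"},
}


def to_ntriples(doc):
    """Turn one flat node into N-Triples-style lines (no escaping, no nesting)."""
    context = doc.get("@context", {})
    subject = doc["@id"]
    lines = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # keywords such as @context and @id are not properties
        predicate = context[key]  # the context maps a short name to an IRI
        if isinstance(value, dict) and "@id" in value:
            obj = f"<{value['@id']}>"  # a link (edge) to another node
        else:
            obj = f'"{value}"'  # a plain text value
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


for line in to_ntriples(document):
    print(line)
```

Each printed line is one edge of the graph: a subject, a property IRI taken from the context, and either another node or a text value. Following the property IRIs in a browser is how a reader could learn what `name` and `knows` mean.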

I believe that the main focus of the tutorial would be to illustrate how YAML-LD converts to graphs where nodes and edges are resolvable and traversable links, and how these pieces of content can be used in conjunction with other people's contributions, for everyone's benefit.

Notes

My ideas of how the Primer should look are very vague; I am still thinking about it. I need to do some research about the pre-existing vocabularies that Lydia might rely upon.

However, I have no other idea but to make YAML-LD → RDF conversion one of the main topics of the tutorial, because what else can we do with the document? I do not see any non-technical motivations for expanding, compacting or framing a YAML-LD document, but converting it to a visual graph and combining it with other graphs seems promising to me.
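
As a rough illustration of why combining graphs is attractive, the sketch below treats each graph as a plain set of (subject, predicate, object) triples and merges two of them with a set union. Everything in it is an assumption made for the example: the IRIs, the `foundAt` and `locatedIn` properties, and the helper functions are all invented, and the sketch ignores blank nodes, which need extra care in a real RDF merge.

```python
# Toy sketch: graphs as sets of (subject, predicate, object) tuples.
# All IRIs and property names below are hypothetical.
FOUND_AT = "http://example.org/vocab/foundAt"
LOCATED_IN = "http://example.org/vocab/locatedIn"

# What Lydia wrote about her specimen.
lydia_notes = {
    ("http://example.org/specimen/42", FOUND_AT, "http://example.org/site/7"),
}

# What someone else published about the excavation site.
site_catalogue = {
    ("http://example.org/site/7", LOCATED_IN, "http://example.org/region/north"),
}


def objects(graph, subject, predicate):
    """All objects reachable from subject over one predicate edge."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def regions_of(graph, specimen):
    """Follow foundAt, then locatedIn, to find where a specimen comes from."""
    regions = set()
    for site in objects(graph, specimen, FOUND_AT):
        regions |= objects(graph, site, LOCATED_IN)
    return regions


# Because both sources use the same IRI for the site, a union joins them.
combined = lydia_notes | site_catalogue

print(regions_of(lydia_notes, "http://example.org/specimen/42"))
print(regions_of(combined, "http://example.org/specimen/42"))
```

Neither source alone can answer "which region does specimen 42 come from?"; the combined graph can, because the shared site IRI links the two descriptions.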

Will be happy to hear feedback.

Discussed at TPAC F2F

Some thought that Primers aren't used, and may not be worth the effort. Keep some introductory information in the spec and focus on tools like yaml-ld.org (or a section of json-ld.org).

Pierre-Antoine Champin: https://github.com/json-ld/yaml-ld-primer/issues/1

Gregg Kellogg: There seems to be a desire not to overburden the specification with extra language, and a desire for a Primer. We will create a yamlld-primer, and a yamlld-bp for best practices; we can then focus on each one.

Ivan Herman: We are fighting in another group to try and figure out a way to do multi-publication, let's not bring that here.

Gregg Kellogg: Does anyone have anything specific they'd like to have in a primer?

Gregg Kellogg: Convenience contexts?

Gregg Kellogg is scribing.

Manu Sporny: I'd argue against a Primer. I'm not sure if we have data on how much they're read.

... If people want to know about it, they typically look at the spec, or Best Practices.

Manu Sporny: I don't know how often people read primers

... I'm not sure we have enough data to validate the effort needed to do this work. Instead, keep it in the spec.

... That content might be more useful at the beginning of the document.

... If you can't explain it in three n pages, then you probably haven't done a good enough job.

Ivan Herman: The OWL Primer was well received, but it was difficult work to boil those concepts down. There are cases where it works.

Ivan Herman: We are heading for a case where VCs will be several specifications, that might be a place where a Primer might work well. OWL is complex because there are 2-3 core specifications plus a bunch of additional things.

Ivan Herman: In this case, maybe not: YAML-LD might only need to be one specification. Then it becomes a matter of personal style.

Mike Prorock: Something in between, what is the audience, there are normal webdev folks, the other side is the broader linked data developer... when you're coming in from normal enterprise stuff and they come into this, we've gotta figure out how to bridge these concepts back here... there is not another resource to introduce people to this.

Mike Prorock: Who is the Primer for? Normal webdev folks for HTML end user facing content... vs. low level developers using this stuff.

Phil Archer: The first W3C spec I read was the RDF primer, which helped me understand RDF... suggestion might be, with YAML-LD, could add to RDF 1.1 primer, there is a whole ecosystem here, LD always needs primers... why would you do this? Why is it better?

Mike Prorock: Yes, especially for people that were not around in the beginning, dealing with a new breed of developers learning Python/Java... they don't have the context, they don't even know what IRC is.

Ivan Herman: One thing I like in the OWL primer is how it has the nice feature of 5 serializations of the same concept, not all of them are RDF specific, but what they did is the same trick as the VC spec, you can choose which syntax you want to see... having a Linked Data primer which puts together YAML/JSON-LD for example, can choose and compare like in the VC spec.

Ivan Herman: I think that's very helpful, not to do a YAML primer, but a Linked Data for the masses primer, that would make a lot of sense.

Gregg Kellogg: We need to come back to json-ld.org, it's easy to add resources to there, the playground is invaluable, thank you Dave Lehn for keeping this thing going.

Gregg Kellogg: The efforts are appreciated.

Gregg Kellogg: Should we have a yaml-ld.org domain? Could have its own fork of JSON-LD Playground... some of upcoming work on playground might make it more suitable to do that... basic profile of YAML, to get that into something that works w/ jsonld.js -- given there's only so much developer time, having online resources might be better use of our time.

Pierre-Antoine Champin: Reconsider if we can put the thing in the front of the spec, unless there is a large amount of specs... we should consider the audience. It was pointed out how JSON-LD was deliberately made easy for end users even if it's more complex for developers. There are different parts of the spec, more for end user, JSON-LD API is more for end developers... that's one thing we should consider, Primer could address a different audience than the spec itself.

Gregg Kellogg: These things are solved by people that step forward to do the work.
