solid / specification

Solid Technical Reports
https://solidproject.org/TR/

Spec authoring in plain HTML+RDFa #6

Open csarven opened 5 years ago

csarven commented 5 years ago

I find spec generator workflows unnecessarily complicated and their outputs constrained.

We need to have full control over the output if we want it machine-readable to the highest degree.

The LDN spec managed this just fine and there were no compromises.

So, can we stick to plain HTML+RDFa editing .. and along the way open the possibility to use Solid-centric tools?

RubenVerborgh commented 5 years ago

No strong opinion, was following @elf-pavlik's advice. I care only about the end result and the pace at which we achieve it.

That said, two quick points:

We need to have full control over the output if we want it machine-readable to the highest degree.

Not necessarily a contradiction; it's markdown, which is an HTML superset.

So, can we stick to plain HTML+RDFa editing

Might not be as accessible to the wide audience that Solid is aiming at. HTML, sure. HTML+RDFa and getting everything right, maybe not.

csarven commented 5 years ago

I know you don't mind in the end but I'll say it for the general record:

Markdown nevertheless requires additional machinery.

HTML+RDFa is accessible enough to spec authors and editors. I think it'd be simpler to work through RDFa updates on an as-needed basis than to update the generators' output in the end - pretty much rewriting at that point.

The spec generator workflow doesn't fit into the "Solid way of doing things" (tm) :)

I'd like to aim at authoring/editing the spec resting on a Solid server using our favourite applications :) We should embrace these kinds of opportunities as they present themselves.

elf-pavlik commented 5 years ago

I recall TimBL suggesting at the Social WG F2F in Paris that, instead of HTML+RDFa, we use script tags to embed RDF in HTML. I really doubt that many people will actually edit RDFa.

I'd like to aim at authoring/editing the spec resting on a Solid server using our favourite applications :)

I haven't seen one supporting git PRs yet... nor any notion of version control in general for Solid

csarven commented 5 years ago

use script tags to embed RDF in HTML.

Like manually? or with existing tooling? Which? In which serialisation(s)? How would duplicate information be managed? ... Did anyone actually go through that process for spec authoring/editing, capturing the spec's content in full, along with things that test suites and implementation reports can latch on to,... or is that all hypothetical? URL?

There are cases in which it is fine to include RDF through script, but I don't think this is one of them. There is a perfectly obvious candidate here: RDFa.

I really doubt that many people will actually edit the specs, irrespective of the format, let alone RDFa involved or not. I think this lies mostly on the authors and editors - or loosely, contributors. Moreover, not everyone is required to do any RDFa. They can stick to HTML. I'm happy to volunteer to take on full RDFa responsibility in specs. Will this resolve the doubt on "many people will actually edit RDFa"?

I haven't seen one supporting git PRs yet... nor any notion of version control in general for Solid

We could interpret those things as weaknesses or shortcomings - they may very well be - but I see opportunities. We definitely don't have all that stuff working, but we can set a direction to make it happen. Why would an application supporting git PRs be a requirement? Is that a showstopper? Version control would indeed be nice; however, it doesn't necessarily mean that the workload is all on the server. Applications can handle it as well. FWIW, dokieli already has the notion and features to handle the original resource, a sense of (im)mutable versions, with visible and machine-readable relations just like in W3C specs that you are familiar with, and acts to read/write Memento Timemap data itself. None of that is expected to be super ready for the masses, but IMO it is a reasonable starting point. With actual dogfooding, we can make this stuff happen.

The fundamental point about this issue is that on one hand we could fall back on some pipeline which doesn't particularly fit into the Solid workflow, and on the other hand, we have an opportunity to experiment and improve stuff along the way. The net win is not only for the sake of the specs but for everything else. I definitely prefer the latter world and that's what I'm proposing. :)

kjetilk commented 5 years ago

RDFa is attractive for many reasons, but I have found authoring with it a little cumbersome. Just embedding Turtle could also work; Turtle is, after all, a lot easier to author.

I'm fine either way, but if I was pressed to choose, I'd prefer embedded Turtle.

csarven commented 5 years ago

Embedding Turtle at the very least entails creating, maintaining and publishing duplicate information.

kjetilk commented 5 years ago

Embedding Turtle at the very least entails creating, maintaining and publishing duplicate information.

It does, but if the two copies are in close proximity to each other, it may not be an awful editorial overhead. I think https://www.w3.org/wiki/N3inHTML is the method @timbl was referring to.

csarven commented 5 years ago

I'm aware of non-RDFa embedding options in HTML. Parsing rules are not standardised, so we can't expect tooling out there to interpret it consistently. There is no evidence that it does. There are no CGs/WGs working to resolve these issues as far as I'm aware.

Why would going through the process of handling and committing to duplicate information in an HTML document be attractive? Using N3 or dialects in HTML forces the HTML document to contain two sets of identical content as separate data islands. It introduces complexity (and room for error) into the system and bloats the spec for no particular benefit... we go through that just because a few potential spec contributors find Turtle easier to handcode than RDFa? Is the resulting document pretty and actually usable in the wild?

I can sympathise that RDFa is not attractive to handcode for some. I acknowledge Turtle's appeal for handcoding. The proposal of this issue is to go with HTML+RDFa.. and so in response to the potential inconveniences, we leave the RDFa bits to the fanatics, and in the meantime improve the tooling (server and client applications).

kjetilk commented 5 years ago

But parsing the proposed N3inHTML is trivial, it is all in the script tag. @tobyink has implemented it, and that is good enough for me :-) HTML5 parsing, OTOH, has always come across as pretty scary to me, so that is where the constraints are, I think. But just looking for the script tag, grabbing the content inside CDATA, and then passing that to the Turtle parser is very straightforward. It certainly is a much easier way to get the RDF graph you'd want than RDFa. I don't doubt the code quality of RDFa parsers, but I don't see highly maintained projects in that space; admittedly, I'm not following it too closely.

I'm not convinced it would represent duplicate content, depending on what we actually want to encode in RDF. I currently do not see great value of encoding long natural language passages in RDF, it is mostly very specific stuff.

Actually, I think we should just start out with something, and see where it takes us.
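Editorial illustration (not part of the original comment): the extraction step described above, finding script elements of type text/turtle and handing their content to a Turtle parser, can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not Toby's implementation and not a standardised algorithm; the names `TurtleScriptExtractor`, `extract_turtle` and `sample` are hypothetical. It matches the `type` attribute exactly and ignores base URIs and media type parameters, which are the kind of choices independent tools could make differently.

```python
from html.parser import HTMLParser


class TurtleScriptExtractor(HTMLParser):
    """Collect the raw text of every <script type="text/turtle"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # one string per matching script element
        self._inside = False   # True while inside a matching script element

    def handle_starttag(self, tag, attrs):
        # Assumption: exact match on the type attribute, no parameters.
        if tag == "script" and dict(attrs).get("type") == "text/turtle":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content arrives as raw text, possibly in several chunks.
        if self._inside:
            self.blocks[-1] += data


def extract_turtle(html):
    """Return the Turtle source of each text/turtle script in an HTML string."""
    parser = TurtleScriptExtractor()
    parser.feed(html)
    parser.close()
    return [block.strip() for block in parser.blocks]


# Hypothetical sample document, using the CDATA wrapper style from this thread.
sample = """<p>Visible text.</p>
<script type="text/turtle"># //<![CDATA[
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
<#c> a skos:Concept .
# //]]></script>"""

print(extract_turtle(sample)[0])
```

The wrapper lines begin with `#`, so they are ordinary Turtle comments and the extracted block can be passed to a Turtle parser unchanged. The sketch says nothing about how other tools treat multiple script blocks or relative IRIs.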

csarven commented 5 years ago

I'm not talking about whether a tooling is capable but the very fact that different and independently built tools are not consistent in their handling of it. Nor is it a simple case of finding and extracting what's in script. Not to mention the tooling behind it has to have both markup and Turtle parsing know-how. Again, this is not standardised. As much as I love Toby's work, we are not going to ask potential consumers of the spec to use Toby's work. "This website is best viewed in Netscape Navigator 3 Gold"?

Take the LDN spec as a guide FWIW. Take the "what we actually want" from there and re-serialise to Turtle and throw it in script. So, given for example:

<span about="#test-consumer-header-discovery" id="test-consumer-header-discovery" property="skos:definition" rel="skos:topConceptOf" resource="#ldn-tests-consumer" typeof="skos:Concept">make an HTTP <code>HEAD</code> or <code>GET</code> request on the target URL, and use the <code>Link</code> header with a <code>rel</code> value of <code>http://www.w3.org/ns/ldp#inbox</code>.</span>

Strip out the RDFa and throw it into script/Turtle:

<span>make an HTTP <code>HEAD</code> or <code>GET</code> request on the target URL, and use the <code>Link</code> header with a <code>rel</code> value of <code>http://www.w3.org/ns/ldp#inbox</code>.</span>

and

<script type="text/turtle"># //<![CDATA[
@prefix this: <https://www.w3.org/TR/ldn/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

this:test-consumer-header-discovery
  a skos:Concept ;
  skos:topConceptOf this:ldn-tests-consumer ;
  skos:definition "make an HTTP <code>HEAD</code> or <code>GET</code> request on the target URL, and use the <code>Link</code> header with a <code>rel</code> value of <code>http://www.w3.org/ns/ldp#inbox</code>."^^rdf:HTML .
# //]]></script>

I'm not sold on the idea of any content duplication when there is no technical argument for it to exist in the first place. The core argument seems to rely on the fact that some people prefer to handcode Turtle. I beg to differ on the preferences.

I would argue that handcoding is not what we ultimately want to rest on anyway. We want applications to help us with those tasks. We are not going to prove our point about Solid through RDF, SPARQL.. handcoding solutions. That's been tried and... it only caters to a niche community.

So, why take that approach when it is a dead-end, especially when there is another option which has clear opportunities to take advantage of? Let's start out with HTML+RDFa then as the obvious candidate that is designed to handle cases like this... instead of pigeonholing an alternative subpar approach that only happens to be of convenience to some contributors.
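Editorial illustration (not part of the original comment): the annotated span quoted earlier in this comment carries its statements and its human-readable text in one place. The sketch below reads just that attribute pattern (about, typeof, rel with resource, property) and lists the statements it implies. It is deliberately not a conformant RDFa processor: there is no prefix expansion, no nesting or chaining, and no handling of content, datatype or href. The names `AnnotatedElementReader` and `triples` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class AnnotatedElementReader(HTMLParser):
    """Read the attributes and text of the first element carrying @about."""

    def __init__(self):
        super().__init__()
        self.attrs = None   # attributes of the annotated element, once found
        self.depth = 0      # nesting depth inside the annotated element
        self.text = []      # text chunks found inside it

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            found = dict(attrs)
            if "about" in found:
                self.attrs = found
                self.depth = 1
        elif self.depth > 0 and tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.text.append(data)


def triples(html):
    """Return (subject, predicate, object) string tuples for one element."""
    reader = AnnotatedElementReader()
    reader.feed(html)
    reader.close()
    a = reader.attrs
    if a is None:
        return []
    subject = a["about"]
    out = []
    if "typeof" in a:
        out.append((subject, "rdf:type", a["typeof"]))
    if "rel" in a and "resource" in a:
        out.append((subject, a["rel"], a["resource"]))
    if "property" in a:
        # The visible text doubles as the literal value; nothing is repeated.
        out.append((subject, a["property"], "".join(reader.text)))
    return out


span = '''<span about="#test-consumer-header-discovery" id="test-consumer-header-discovery" property="skos:definition" rel="skos:topConceptOf" resource="#ldn-tests-consumer" typeof="skos:Concept">make an HTTP <code>HEAD</code> or <code>GET</code> request on the target URL, and use the <code>Link</code> header with a <code>rel</code> value of <code>http://www.w3.org/ns/ldp#inbox</code>.</span>'''

for triple in triples(span):
    print(triple)
```

A real consumer would use a conformant RDFa parser; the point of the sketch is only that the three statements come from the same markup that renders the sentence.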

elf-pavlik commented 5 years ago

Take the LDN spec as a guide FWIW. Take the "what we actually want" from there and re-serialise to Turtle and throw it in script.

Does anyone actually consume that RDFa from LDN spec? Who do you see as an audience who wants such structured data in Solid specs?

I think keeping edits on GitHub and having the convenience of Markdown should result in more concrete PRs from contributors, which can then get discussed and merged by the few editors (sometimes with amendments). On the other hand, HTML+RDFa may result in more issues prone to 'hand waving' suggestions, and maybe even a lower likelihood of amendments being made, due to the overhead of editing HTML+RDFa.

I've added :-1: on your original comment, others can add another one or counter with :+1: , this way we may get a better read on people's preferences.

kjetilk commented 5 years ago

I'm not talking about whether a tooling is capable but the very fact that different and independently built tools are not consistent in their handling of it.

Do you know that for certain?

Nor is it a simple case of finding and extracting what's in script. Not to mention the tooling behind it has to have both markup and Turtle parsing know-how. Again, this is not standardised. As much as I love Toby's work, we are not going to ask potential consumers of the spec to use Toby's work. "This website is best viewed in Netscape Navigator 3 Gold"?

Hehe, but it is not about Toby's code, it is about looking at Toby's code and realizing it is a lot simpler than RDFa parsing. :-) Nor is the writeup on the W3C wiki just Toby's, @timbl has also expressed a preference.

Take the LDN spec as a guide FWIW. Take the "what we actually want" from there and re-serialise to Turtle and throw it in script.

Ah, but the thing is that what is LDN is not what I want. I want to take it a good step further. I mean, I do see the value of it, but I think the really big value is when tests can be constructed from the spec without further human intervention. So, the skos:definition you have there isn't what I want to do.

If you look at this test, you get an idea of where I'm going, and I'm not sure I'd like to write that in RDFa, but it is perfectly digestible as Turtle. Nor am I sure the structure of the text would make it feasible to have a single expression of it.

So, given for example:

<span about="#test-consumer-header-discovery" id="test-consumer-header-discovery" property="skos:definition" rel="skos:topConceptOf" resource="#ldn-tests-consumer" typeof="skos:Concept">make an HTTP <code>HEAD</code> or <code>GET</code> request on the target URL, and use the <code>Link</code> header with a <code>rel</code> value of <code>http://www.w3.org/ns/ldp#inbox</code>.</span>

So, it would be more like

make an HTTP
<span about="#test-consumer-header-discovery" id="test-consumer-header-discovery" typeof="http:RequestMessage"><span property="http:method">HEAD</span> request on the <span rel="http:requestURI" resource="/public/foobar.ttl">target URL</span>, and use the header <span property="httph:link" content="rel='http://www.w3.org/ns/ldp#inbox'"> value of <code>http://www.w3.org/ns/ldp#inbox</code>.</span></span>

(I don't really remember the RDFa syntax... :-) ). I suppose it could be done, but I'm not sure it would make for either a good spec or a rigorous test.

I'm not sold on the idea of any content duplication when there is no technical argument for it to exist in the first place. The core argument seems to rely on the fact that some people prefer to handcode Turtle. I beg to differ on the preferences.

You may of course! But then, I might just be nasty and challenge you to code the test cases I'm working on as RDFa, while at the same time ensure no content duplication and fair readability :-)

I would argue that handcoding is not what we ultimately want to rest on anyway. We want applications to help us with those tasks. We are not going to prove our point about Solid through RDF, SPARQL.. handcoding solutions. That's been tried and... it only caters to a niche community.

Yeah. But my preference is to handcode this and focus on what consumers can do. I think that it is as much a lack of consumers as a lack of producers. Now, I have a pretty much working consumer that actually runs these tests.

So, why take that approach when it is a dead-end especially when there is another option which has clear opportunities to take advantage of? Let's start out with HTML+RDFa then as the obvious candidate that is designed to handle cases like this...

I'm not so sure it was designed for cases like these. Although it was designed as a generic RDF serialization, I'm not so sure it can cater for non-trivial, human read-and-writeable graphs. So, you may not care about human-writable, but then, we'd have to write the tool to write the tests first, and I don't see that as feasible in the next few months.

acoburn commented 5 years ago

Assuming that the published specification will be HTML, it seems that there are two issues being discussed here.

First, there is the issue of ensuring the documents in this repository are easy to edit by humans. Markdown is certainly easy to edit. I also find HTML easy to edit. HTML+RDFa is slightly more complicated than vanilla HTML, but I have also found that to be reasonably easy to edit by hand -- it is more complex than markdown, but with only a little more effort you have semantically rich HTML. I take it for granted that one goal here is that the base documents are easy to edit; otherwise, maintenance of the specification will be too onerous.

The second issue is whether the published specification, itself, should be machine readable. If this isn't a goal, then there is no reason to consider RDFa; however, if this is a goal, then let's discuss how best to achieve that. RDFa is certainly one way to get there.

Perhaps, rather than discussing whether or not to use RDFa, we should first come to a consensus on whether it is a goal to have the resulting documents be machine readable in a semantically meaningful way. If we have consensus on that, then the issue of RDFa should become much more clear.

RubenVerborgh commented 5 years ago

Another consideration: there are 7 documents. How many people do we have that can and are willing to edit HTML+RDFa, and how many documents do they want to maintain each?

csarven commented 5 years ago

@elf-pavlik

Does anyone actually consume that RDFa from LDN spec? Who do you see as an audience who wants such structured data in Solid specs?

W3C et al have invested time into putting some RDFa into spec generators (eg Bikeshed, ReSpec). Have you looked into that stuff or viewed the source of some specs?

Would you be willing to investigate: Does anyone actually consume any machine-readable data from any of the existing W3C specs? Who do they see as an audience who wants structured data in W3C specs? FWIW, the answer is not going to be categorically different than what you ask about the LDN spec.

This is not even about the big bad scary rocket science RDFa but I don't feel like rehashing at this point because you have not even attempted to answer my questions.

@kjetilk

Do you know that for certain?

Yes, to a good degree. Both HTML's extensibility mechanism and relevant RDF specs talk about it as non-normative. There is no stand-alone spec (or alike) to the best of my knowledge and I have not seen implementations handling it consistently. But, since you seem to be doubting what I'm saying, why don't you show the contrary instead of asking me?

it is about looking at Toby's code and realizing it is a lot simpler than RDFa parsing. :-)

You are looking at this the wrong way.

We can dump data in a gobbledygook format and bash out some regex to pull it back out. As possible as that is, it is completely irrelevant and not what we are after. Just because one tool does it, it doesn't follow that other implementations do it the same way or even correctly. People have been hacking at it and making the best of it, but that's as far as it gets us (unfortunately, today).

timbl..

I'd appreciate a technically rooted argument instead of an argument from authority (https://en.wikipedia.org/wiki/Argument_from_authority).

Ah, but the thing is that what is LDN is not what I want.

"Ah, but the thing is that" I do understand what you are trying to achieve as I've already mentioned extensibility and how to go about it in Gitter chat (https://gitter.im/solid/solid-spec?at=5d0de64bd35d4162a87baedb .. https://gitter.im/solid/solid-spec?at=5d109919d010383639c2ffa3 ). We even had an audio/video call touching on this. So, not sure why you're bringing this up as if none of that occurred.

I didn't have the luxury at the time to completely auto build a test suite from a spec.. I was only trying to assist you in what you were looking into because I have already gone through that thought process. I completely support what you want to achieve.

I'm not sure I'd like to write that in RDFa, but it is perfectly digestible as Turtle.

RDFa is perfectly digestible too. Something about the eye of the beholder?

Nor am I sure the structure of the text would make it feasible to have a single expression of it.

Anything that can be stated in Turtle, can be in RDFa as well. That's a non issue to me.

I might just be nasty and challenge you to code the test cases I'm working on as RDFa, while at the same time ensure no content duplication and fair readability :-)

Do you want to be "nasty" and "challenge" me after I explicitly stated that I volunteer to make it happen ( https://github.com/solid/specification/issues/6#issuecomment-508583897 )?

What you ultimately need is the test suite to consume some RDF from the spec. Not forcing you to write RDFa if you don't want to. We can collaborate and compromise where necessary without getting into nastiness or challenges. You state what you need in the spec,.. consider it done. Is that a reasonable meeting point?

I'm not so sure it was designed for cases like these

The initiative to get RDF in HTML and XML family languages is what ultimately led to RDFa. The approach to embed RDF in HTML through HTML's extension mechanism came much later, and only as non-normative. The rules are not as well defined and each RDF syntax has to figure out its own way of working with the host language, multiple scripts, own scoping rules and so on. If the information needs to be (human- and) machine-readable, that's RDFa in HTML out of the box. If you don't care about the human-readable aspect, then the spec is probably not the ideal place to shove hidden and duplicate data to begin with. You may want to consider alternative approaches in that case.

One can use a hidden HTML list or a table with three columns to make statements in RDFa. Although I don't think that's a good idea, it is still more properly defined than mangling with hidden script blocks.

I'm not so sure it can cater for non-trivial

Completely orthogonal. Here are some use cases https://www.w3.org/TR/xhtml-rdfa-scenarios/ from RDFa early on. I'm sure it is not a complete coverage of things and you can certainly raise that it doesn't outline x,y,z. Why do you feel that the cases you want to cover are non-trivial for RDFa? I certainly don't think that's the case.

You are arguing from the point of using your preference for the test suite in the spec. There is more to the spec than the parts relevant to the test suite. Trying to cover different parts using different methods makes no sense to me, especially when it can be accomplished in a uniform way via RDFa to begin with.

you may not care about human-writable

I don't quite follow. As in Turtle in HTML is but RDFa in HTML is not? Based on what criteria exactly? Citation needed.

Will my commitment to meet test suite's needs in the spec help to resolve your issue?

@acoburn

consensus on whether it is a goal to have the resulting documents be machine readable in a semantically meaningful way

I was under the impression that we have achieved that. At least I hope so :)

@RubenVerborgh

How many people do we have that can and are willing to edit HTML+RDFa, and how many documents do they want to maintain each?

As one possible answer: we only need one dedicated person. I've volunteered to edit/author. The more the merrier.


Folks, I suggest we identify areas to compromise. For example, we don't have to constantly keep HTML or the RDFa sparkling clean - although that'd be nice if we want to reuse at any time without worries. It can be a one time thing.

I ultimately would like to have the significant versions (like along the lines of WD, CR, PR, REC, NOTE..) of the specs to be proper HTML+RDFa documents. I'm okay to compromise on everything else in the meantime because finding some agreement here is more important to me than arguing about all of the details in the process to get there. As the comments show, some of us prefer convenience and familiar practices, while others are okay to explore the uncharted. Happy to hear about how you'd like to cooperate.

acoburn commented 5 years ago

I would volunteer to help edit any RDFa documents.

kjetilk commented 5 years ago

@kjetilk

Do you know that for certain?

Yes, to a good degree. Both HTML's extensibility mechanism and relevant RDF specs talk about it as non-normative. There is no stand-alone spec (or alike) to the best of my knowledge and I have not seen implementations handling it consistently. But, since you seem to be doubting what I'm saying, why don't you show the contrary instead of asking me?

But... But... It is like 10 lines of code... How can that possibly go wrong...? But if you have actually researched it, OK.

it is about looking at Toby's code and realizing it is a lot simpler than RDFa parsing. :-)

You are looking at this the wrong way.

* First off, once you have a conformant parser, the point is irrelevant as it is the machine that's doing the job. So, what simplicity are you referring to in its totality? Time, space, readability, completeness..? Have you compared algorithms?

What algorithms? Checking the DOM for script with a certain attribute?

Ah, but the thing is that what is LDN is not what I want.

"Ah, but the thing is that" I do understand what you are trying to achieve as I've already mentioned extensibility and how to go about it in Gitter chat (https://gitter.im/solid/solid-spec?at=5d0de64bd35d4162a87baedb .. https://gitter.im/solid/solid-spec?at=5d109919d010383639c2ffa3 ). We even had an audio/video call touching on this. So, not sure why you're bringing this up as if none of that occurred.

Because you weren't referring to my test descriptions, I guess for the simple reason that I had not presented them to you at that point. :-)

I didn't have the luxury at the time to completely auto build a test suite from a spec.. I was only trying to assist you in what you were looking into because I have already gone through that thought process. I completely support what you want to achieve.

OK, great! That was not at all clear to me.

I'm not sure I'd like to write that in RDFa, but it is perfectly digestible as Turtle.

RDFa is perfectly digestible too. Something about the eye of the beholder?

Absolutely! But that eye matters, as we are humans needing to write things.

Nor am I sure the structure of the text would make it feasible to have a single expression of it.

Anything that can be stated in Turtle, can be in RDFa as well. That's a non issue to me.

Sure, but it is still a huge issue for me, as I have been unable to find a workflow where I'm nearly as efficient with writing RDFa as I am with Turtle.

I might just be nasty and challenge you to code the test cases I'm working on as RDFa, while at the same time ensure no content duplication and fair readability :-)

Do you want to be "nasty" and "challenge" me after I explicitly stated that I volunteer to make it happen ( https://github.com/solid/specification/issues/6#issuecomment-508583897 )?

I think that is the main misunderstanding between us. I thought you were volunteering to do essentially what you already did for LDN, not to support me writing any weird RDF that may arise from the test suite.

What you ultimately need is the test suite to consume some RDF from the spec. Not forcing you to write RDFa if you don't want to. We can collaborate and compromise where necessary without getting into nastiness or challenges. You state what you need in the spec,.. consider it done. Is that a reasonable meeting point?

Sure! I'm not against writing RDFa, I'm just afraid it will slow us down, and not provide significant value as I don't see how it can prevent duplication to the extent that might justify its use. If you are confident you can do it, then I'm very happy to accept that help.

you may not care about human-writable

I don't quite follow. As in Turtle in HTML is but RDFa in HTML is not? Based on what criteria exactly? Citation needed.

Very limited personal experience. I personally find any non-trivial RDFa extremely hard to read and write. My workflow involves running it through a parser and then serializing it to Turtle to see if the graph is as intended. I'm just really slow with it. I may just not have the right tools for the job, or my mind may just be too wired to Turtle.

Will my commitment to meet test suite's needs in the spec help to resolve your issue?

Well, I just have to take your word for it, but I am prepared to do that if we have consensus.

@acoburn

consensus on whether it is a goal to have the resulting documents be machine readable in a semantically meaningful way

I was under the impression that we have achieved that. At least I hope so :)

So, indeed, I think that it is a good thing to have the spec written so that it can be read and tests generated and run without human intervention. It will provide a single stop for basic compliance with little duplication. I'm certainly for it if we are able to.

kjetilk commented 3 years ago

It seems like this is pretty much settled with the adopted practice. Close?

elf-pavlik commented 3 years ago

Authorization, Authentication and Interop panels are using Bikeshed. I think only this repo is using plain HTML. I understand that each spec's editors simply choose the tooling that fits them.