Open reschke opened 3 years ago
The namespaces for the types need to be the same, and we should promote content-type (media-type-name plus possibly parameters) for formats that have them.
More on artwork vs. source-code in https://mailarchive.ietf.org/arch/msg/rfc-markdown/-TMLuKxSopzSbB3z90ipkPE4AKU
... and we should promote content-type (media-type-name plus possibly parameters) for formats that have them...
:-) ... restoring what RFC 7749 said about it (https://greenbytes.de/tech/webdav/rfc7749.html#element.artwork.attribute.type)
The namespaces for the types need to be the same, and we should promote content-type (media-type-name plus possibly parameters) for formats that have them.
The type attributes of the two elements have completely different meanings, and the permitted values for <artwork> are very constrained (because of the meaning of the attribute, and how it affects the code to deal with the artwork), while the acceptable type values for <sourcecode> aren't constrained at all by tooling, and should not be. I don't see how it makes sense to force them into the same namespace.
because of the meaning of the attribute, and how it affects the code to deal with the artwork
Could you please elaborate a bit on that? (this needs to be understood and discussed...)
FWIW, as long as there's a gray area whether to use \
When you say same namespace, does that mean that the types for the two are different but the names can't overlap, or that you can use any defined type for either? The former seems obvious, the second unworkable.
Mainly the former. Given the fact that the distinction between the uses is not totally clear (see above), overlapping names for different things would be bad. That does not imply that any given type needs to be allowed in both elements.
(this is similar to the names of HTTP transfer codings and content codings, see https://greenbytes.de/tech/webdav/rfc7231.html#rfc.section.8.4.1.p.2)
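A minimal sketch of the "one namespace, but not every type on every element" idea described above. The registry contents are assumptions for illustration only: the type names appear in this thread, but which element each is allowed on is not decided here, and `ALLOWED_ELEMENTS` and `type_permitted` are hypothetical names.

```python
# Hypothetical shared registry: each type name has exactly one meaning,
# but it is only permitted on the elements listed for it.
# The assignments below are illustrative, not taken from any spec.
ALLOWED_ELEMENTS = {
    "svg": {"artwork"},
    "hex-dump": {"artwork"},
    "abnf": {"sourcecode"},
    "cddl": {"sourcecode"},
    "http-message": {"sourcecode"},
}


def type_permitted(element: str, type_name: str) -> bool:
    """Return True if type_name is registered and allowed on element."""
    return element in ALLOWED_ELEMENTS.get(type_name, set())


print(type_permitted("artwork", "svg"))
print(type_permitted("sourcecode", "svg"))
```

Because the names live in a single dictionary, a name cannot be registered twice with different meanings, while the per-name set still restricts where it may be used.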
It's interesting to compare the initial list of types (https://greenbytes.de/tech/webdav/rfc7991.html#element.sourcecode.attribute.type) with https://www.rfc-editor.org/materials/sourcecode-types.txt.
Observations:
Survey of types used by the RFC Production Center: https://gist.github.com/reschke/28318b8499746d211d9cfcfed4149af1
Some of the uses, notably with empty type attribute, are really scary. For instance: https://www.rfc-editor.org/rfc/rfc8783.html#section-3.4, where the sourcecode element essentially carries a definition list.
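For anyone who wants to repeat this kind of survey on their own XML, here is a rough sketch using only the Python standard library. It is not the script behind the gist above; `survey_types` and the `SAMPLE` document are made up for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def survey_types(xml_text: str) -> Counter:
    """Count the type attribute values on artwork and sourcecode elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for tag in ("artwork", "sourcecode"):
        for el in root.iter(tag):
            # A missing attribute and type="" are both reported as "".
            counts[(tag, el.get("type", ""))] += 1
    return counts


# Made-up sample input, loosely modelled on the cases discussed here.
SAMPLE = """<rfc>
  <sourcecode type="abnf">rule = "a"</sourcecode>
  <sourcecode type="">term: definition</sourcecode>
  <sourcecode name="http-message" type="">HTTP/1.1 400 Bad Request</sourcecode>
  <artwork type="hex-dump">00 01</artwork>
</rfc>"""

for (tag, type_name), n in sorted(survey_types(SAMPLE).items()):
    print(f"{tag:10} {type_name!r:12} {n}")
```

Entries with an empty type string are the ones worth a manual look.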
On 2021-02-19, at 17:12, Julian Reschke notifications@github.com wrote:
• there is zero documentation of what a type name is referring to
• there seem to be entries where the use of
appears rather far-fetched ("http-message"? "test-vectors"?)
Again, the sole difference between the elements is whether the content displayed is for human consumption only or for machine consumption. This is intent, somewhat orthogonal to the type of the content.
Clearly, there is an exception for “svg”, which is text (or element content!?) for machine consumption by the tooling, not by the machines set up by the user of the RFC.
test-vectors is used in RFC 8696, RFC 8734, RFC 8891. This appears to be a weird mixture of hexdumps, JSON-like code, and PEM. It would need type information to enable machine processing.
hex-dump (which would be type information) was only ever used for base64 content (in RFC 8688) before we started using it (for annotated hex dumps) in RFC 8949. These are machine-processable, but mainly intended for human consumption, so at the time we opted for artwork.
I don’t think we can claim there is a system to this yet.
Grüße, Carsten
Again, the sole difference between the elements is whether the content displayed is for human consumption only or for machine consumption.
That's your take, not backed by the spec.
Another problem is that this implies that it's always clear what is for "machine consumption". You said yourself that HTTP message examples are not, yet there is scripting that validates things labeled "http-message".
On 2021-02-19, at 18:01, Julian Reschke notifications@github.com wrote:
Again, the sole difference between the elements is whether the content displayed is for human consumption only or for machine consumption.
That's your take, not backed by the spec.
Well, I’m trying to apply common meanings (e.g., [1]) to the otherwise undefined words used in the spec.
Another problem is that this implies that it's always clear what is for "machine consumption". You said yourself that HTTP message examples are not, yet there is scripting that validates things labeled "http-message".
I actually have had scripts that validate (or produce!) the English text in the sections. Machine usage in the production process is not what I meant (the whole XML file is source code!).
If there is an intention that the user of the spec (as opposed to its author or people reviewing it in the adoption process) be able to perform a copy-paste (or a more fancy xpath extraction) and use the result as machine-readable input, it’s source code.
Grüße, Carsten
[1]: https://en.wikipedia.org/wiki/Source_code: In computing, source code is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source code. The source code is often transformed by an assembler or compiler into binary machine code that can be executed by the computer. The machine code might then be stored for execution at a later time. Alternatively, source code may be interpreted and thus immediately executed.
FWIW, RFC 8949 uses hex-dump, but on artwork, not sourcecode.
If there is an intention that the user of the spec (as opposed to its author or people reviewing it in the adoption process) be able to perform a copy-paste (or a more fancy xpath extraction) and use the result as machine-readable input, it’s source code.
And "pseudocode" falls into that category...?
On 2021-02-19, at 18:12, Julian Reschke notifications@github.com wrote:
If there is an intention that the user of the spec (as opposed to its author or people reviewing it in the adoption process) be able to perform a copy-paste (or a more fancy xpath extraction) and use the result as machine-readable input, it’s source code.
And "pseudocode" falls into that category...?
Well, anything with “pseudo” in its name has problems neatly falling into categories :-)
(I have written pseudocode before that is close enough to common programming languages that it is a small matter of massaging to make it machine processable. The intent may very well be to solve the hard problems of coding and leave the distracting programming language ceremony to the users. The code at the top of page 4 of RFC 7396 is almost, but not exactly, Python.)
Grüße, Carsten
And then there's the problem of mislabeled \
Or cases where name and type have been mixed up:
<sourcecode name="http-message" type=""><![CDATA[
HTTP/1.1 400 Bad Request
Content-Language: en-US
Content-Type: application/json
{
"err": "invalid_key",
"description": "Key ID 12345 has been revoked."
}
]]></sourcecode>
@reschke what's that latter example from?
https://www.rfc-editor.org/rfc/rfc8935.html#section-2.3
Looking at that, another issue comes to mind: people indent sourcecode with leading whitespace, but for some "languages", that makes the "code" actually incorrect (like here).
...and they could have just used RFC7807...
On 24. Feb 2021, at 06:10, Julian Reschke notifications@github.com wrote:
https://www.rfc-editor.org/rfc/rfc8935.html#section-2.3 Looking at that, another issue comes to mind: people indent sourcecode with leading whitespace, but for some "languages", that makes the "code" actually incorrect (like here).
Care to elucidate? (A.k.a., I don’t get it.)
Of course, there is the RFC 7386/7396 disaster to remind us that indentation must be preserved properly. (There never should be an ASCII HT, “TAB”, in an RFC.)
The bap tool has some interesting mandates on leading whitespace where I’m not sure how they are rooted in RFC 5234. But wholesale indentation of an entire ABNF spec is not a problem with bap.
Grüße, Carsten
Leading whitespace is forbidden in HTTP/1.1 messages (request line, status line).
In ABNF, consistent leading whitespace is tolerated by BAP, but (AFAIR) not allowed by RFC 5234.
On 24. Feb 2021, at 08:47, Julian Reschke notifications@github.com wrote:
Leading whitespace is forbidden in HTTP/1.1 messages (request line, status line)
I would have expected readers to be able to abstract that out. (The wall to the left of it is not part of the painting in my living room either.)
Grüße, Carsten
Readers yes, tools not necessarily (without tinkering).
This brings us back to the question of whether
On 2021-02-24, at 09:58, Julian Reschke notifications@github.com wrote:
Readers yes, tools not necessarily (without tinkering).
The tools should now work with the XML and get the real sourcecode, not the rendered one. Solved…
(At least partially, @markers is probably not all processing advice that we’ll need.)
Grüße, Carsten
Hm?
Even if you extract the sourcecode from the XML, if it has leading whitespace, and the language does not allow it, processing will fail.
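One possible form of the "tinkering" mentioned above is to remove the common leading indentation after extraction. This is only a sketch of a heuristic, not something the spec or any existing tool is known to mandate; `strip_common_indent` is a hypothetical helper and the message is a shortened version of the example quoted earlier.

```python
import textwrap


def strip_common_indent(code: str) -> str:
    """Remove the indentation shared by all non-blank lines.

    Relative indentation (e.g. inside the JSON body) is preserved.
    """
    return textwrap.dedent(code).strip("\n") + "\n"


# Extracted element content, indented as an author might have written it.
INDENTED = """
    HTTP/1.1 400 Bad Request
    Content-Language: en-US
    Content-Type: application/json

    {
      "err": "invalid_key"
    }
"""

cleaned = strip_common_indent(INDENTED)
print(cleaned, end="")
```

This only helps when the indentation is uniform; it cannot tell deliberate leading whitespace from decoration, so it is not a general answer to the problem.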
It would be great if we could settle the question of what the actual difference between artwork and sourcecode is; I have specs to ship :)
7991 is actually pretty clear:
[sourcecode] is thus useful for source code and formal languages (such as ABNF [RFC5234] or the RNC notation used in this document). (When <sourcecode> is a child of other elements, it flows with the text that surrounds it.) Tab characters (U+0009) inside of this element are prohibited. For artwork such as character-based art, diagrams of message layouts, and so on, use the <artwork> element instead.
That seems to support the RPC's seeming preference for sourcecode over artwork for not only computer languages and ABNF, but anything with a formal, structured syntax (including HTTP messages; we chose http-message IIRC because it was inappropriate to use message/http to denote a partial message, such as a single header field). In this view, artwork is only suitable for things that are free-form and unstructured, like drawings and diagrams.
I'd be more comfortable if I knew how they were practically different -- e.g., are they displayed differently? Does some other software treat them differently? Still, I think this issue could be closed without action, or at most with some editorial work adding more context to sourcecode and artwork about appropriate use.
Well, right now the RPC seems to prefer \
As for differences one might argue that the "keep on single page" requirement probably should be stronger for artwork?
(I agree that this is mostly editorial except maybe for the type attribute issue)
On 2021-03-04, at 07:00, Mark Nottingham notifications@github.com wrote:
That seems to support the RPC's seeming preference for sourcecode over artwork for not only computer languages and ABNF, but anything with a formal, structured syntax (including HTTP messages; we chose http-message IIRC because it was inappropriate to use message/http to denote a partial message, such as a single header field). In this view, artwork is only suitable for things that are free-form and unstructured, like drawings and diagrams.
I actually prefer to use \
I have cooked up types such as cddl;bad for this purpose; would be good to have a convention.
Grüße, Carsten
Well, right now the RPC seems to prefer
even in other cases, see for instance #195 (comment) - so clarification is needed in any case.
I talked with RPC folks about this, and their understanding had been that <artwork> is for something that requires visual presentation, such as a diagram or old-fashioned ASCII art. Everything else is <sourcecode> even if it is not machine-readable or even actual code.
And yes this needs to be clearly documented, communicated, and "enforced", although the distinction is already drawn in RFC 7991 (when <sourcecode> was introduced).
In addition, AIUI the type attribute on <sourcecode> is nice but not necessary, and really no more than a hint. The RPC has added more values for the type attribute over time, as documented at https://www.rfc-editor.org/materials/sourcecode-types.txt - but this list is not intended to be exhaustive.
The point of the sourcecode tag list is mostly to be sure that the tag names are used consistently.
We'd need a lot more than tagging for it to be generally useful to extract sourcecode from the XML. It'd need to say what order to paste pieces together if it mattered, what version of the language (python2 vs python3) and a lot more. I don't think that is worth the effort. A few kinds of automatic extraction are useful, notably ABNF where the order doesn't matter, but that doesn't generalize.
I talked with RPC folks about this, and their understanding had been that <artwork> is for something that requires visual presentation, such as a diagram or old-fashioned ASCII art. Everything else is <sourcecode> even if it is not machine-readable or even actual code.
Good to know - but that is not backed by the spec, right?
Well, 7991 says what it says. Sourcecode is "This element allows the inclusion of source code into the document" and artwork is "For artwork such as character-based art, diagrams of message layouts, and so on, use the <artwork> element instead."
Because there might be a need for further clarification in the spec, I'll add a will-document label to this issue.
FYI, what we did in RFC8990:
The CBOR diagnostic code that is intended for exposition, but not for direct use in extraction, goes into:
<artwork name="grasp-examples.txt" align="left" ...>
The CDDL that is intended for exposition goes into:
<sourcecode type="cddl" name="grasp-fragments.cddl" markers="false" ...>
The CDDL that is also intended for extraction goes into:
<sourcecode name="grasp.cddl" type="cddl" markers="true" ...>
Note that the <CODE BEGINS> for the latter is visible in the renderings, but not in the result of
xpath rfc8990.xml "//sourcecode[@name='grasp.cddl']/text()"
So, for extraction, we have essentially replaced the use of type= (which can still be used for a source code highlighter, except for the artwork cases where that would need to do further inference) with name=. @type='cddl' is still useful for extracting all CDDL and doing a consistency check between the exposition material and the normative CDDL specification.
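The same two selections (by name for extraction, by type for a consistency check) can be sketched with Python's standard library instead of the xpath command line. The element content in `SAMPLE` is placeholder text, not the actual RFC 8990 source, and `by_name` / `by_type` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Stand-in for rfc8990.xml; attribute values follow the comment above,
# element content is placeholder text.
SAMPLE = """<rfc>
  <artwork name="grasp-examples.txt" align="left">exposition only</artwork>
  <sourcecode type="cddl" name="grasp-fragments.cddl" markers="false">fragment = 1</sourcecode>
  <sourcecode name="grasp.cddl" type="cddl" markers="true">normative = 1</sourcecode>
</rfc>"""


def by_name(root: ET.Element, name: str) -> list[str]:
    """Text of every sourcecode element with the given name attribute."""
    path = f".//sourcecode[@name='{name}']"
    return ["".join(el.itertext()) for el in root.iterfind(path)]


def by_type(root: ET.Element, type_name: str) -> list[str]:
    """Text of every sourcecode element with the given type attribute."""
    path = f".//sourcecode[@type='{type_name}']"
    return ["".join(el.itertext()) for el in root.iterfind(path)]


root = ET.fromstring(SAMPLE)
print(by_name(root, "grasp.cddl"))   # the block meant for extraction
print(len(by_type(root, "cddl")))    # all CDDL, for a consistency check
```

As with the xpath invocation, the result is the element content from the XML, so markers added during rendering do not appear in it.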
I'm not sure I can derive more general rules from this result, but it points in a direction that is both compatible with the letter of certain current XML specification documents and still provides some function.
It would be good to have more guidance about the distinction of these, and what categories they apply to.
(Currently their formatting is roughly the same (and I believe we may want to relax the try-to-keep-on-one-page rule for \ as opposed to \.)
In particular, what element should be used for...:
Note that both elements have a "type" attribute. Are the space of values for these really separate? What would it mean for "x" being defined both as an artwork and sourcecode type?