CODE BEGINS in text output

hassanakbar4 / tractive-test

Closed hassanakbar4 closed 2 years ago

hassanakbar4 commented 6 years ago

Labels: component_Version_3_cli_txt, resolution_fixed, type_defect | by miek@miek.nl

https://gist.githubusercontent.com/miekg/0801d3b6aed86a8e1f7bad60bff7f1ae/raw/8d5562afb2223cb94d6b1b4858cc0be4bebc7690/learninggo.txt

Shows a whole lot of '<CODE BEGINS>' and '<CODE ENDS>', not sure why this shows up in the text output:


   In the Go tutorial, you get started with Go in the typical manner:
   printing "Hello World" (Ken Thompson and Dennis Ritchie started this
   when they presented the C language in the 1970s).  That's a great way
   to start, so here it is, "Hello World" in Go.

   <CODE BEGINS>
   package main <1>

   import "fmt" <2> // Implements formatted I/O.

   /* Print something */ <3>
   func main() {         <4>
           fmt.Printf("Hello, world.") <5>
   }
   <CODE ENDS>

Issue migrated from trac:367 at 2021-10-20 18:28:47 +0500


    
            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

This is generated if you use <sourcecode>.  If you don't want those, use <artwork> instead.
            
        
            
            
hassanakbar4 commented 5 years ago

julian.reschke@gmx.de commented:

Why does it include it? Is there anything in the specs that says it should?
            
        
            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

Not in RFC 7991.  The background is that <CODE BEGINS> and <CODE ENDS> informally started to be used to mark code in drafts a number of years ago.  That was followed by this statement from the IETF Trust:

https://trustee.ietf.org/license-info/IETF-TLP-1.htm
and later codified more for YANG documents in https://tools.ietf.org/html/rfc6087#section-3.1

It seems to make sense to support this in 7991, to encourage consistency.
            

        

            
            
hassanakbar4 commented 5 years ago

julian.reschke@gmx.de commented:

That would need to be discussed.
My understanding is that the markers are only to be used if the language is not on a white list (see https://trustee.ietf.org/license-info/Code-Components-List-4-23-09.txt)
See also related discussion in https://github.com/rfc-format/draft-iab-xml2rfc-v3-bis/issues/8 (raised 20 months ago).
            
        
            
            
            
        
            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

Replying to hassanakbar4/tractive-test#367 (comment:4):

> That would need to be discussed.

Ack.

> My understanding is that the markers are only to be used if the language is not on a white list (see https://trustee.ietf.org/license-info/Code-Components-List-4-23-09.txt)

Umm.  That doesn't make sense to me, given that one of the reasons I've seen for standardizing this is to be able to write tools that consistently extract code components from drafts and RFCs.  Several YANG tools use this already in order to extract YANG modules from documents.

> See also related discussion in https://github.com/rfc-format/draft-iab-xml2rfc-v3-bis/issues/8 (raised 20 months ago).

Ack.
            
        
            
            
hassanakbar4 commented 5 years ago

julian.reschke@gmx.de commented:

The reason for the code tags wasn't extraction, but copyright/licensing.
If you want to automate extraction, the answer is to process the XML, not the TXT output.
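
As a rough illustration of what "process the XML" could look like, here is a minimal sketch that pulls every <sourcecode> element out of a v3 document with Python's standard library. The function name `extract_sourcecode` and the `SAMPLE` document are hypothetical and made up for this example; only the `type` and `name` attributes from RFC 7991 are assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for illustration only.
SAMPLE = """<rfc>
  <middle>
    <section>
      <sourcecode type="go" name="hello.go"><![CDATA[
package main

import "fmt"

func main() {
    fmt.Printf("Hello, world.")
}
]]></sourcecode>
    </section>
  </middle>
</rfc>"""


def extract_sourcecode(xml_text):
    """Return a list of (type, name, code) tuples, one per <sourcecode> element."""
    root = ET.fromstring(xml_text)
    blocks = []
    for element in root.iter("sourcecode"):
        # Attributes that are absent come back as None.
        blocks.append((element.get("type"), element.get("name"), element.text or ""))
    return blocks


if __name__ == "__main__":
    for code_type, name, code in extract_sourcecode(SAMPLE):
        print(code_type, name)
        print(code)
```

This only works when the XML source is actually at hand, and it ignores details such as the `src` attribute.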
            
        
            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

Replying to hassanakbar4/tractive-test#367 (comment:6):

> The reason for the code tags wasn't extraction, but copyright/licensing.

No, that's simply not correct.  I was part of the discussions that led to the TLP wording, and pointed at the already established usage of <CODE BEGINS> for extraction.

> If you want to automate extraction, the answer is to process the XML, not the TXT output.

Good answer if xml is available.  Not so good otherwise.
            

        

            
            
            
        
            
            
hassanakbar4 commented 5 years ago

julian.reschke@gmx.de commented:


> No, that's simply not correct. I was part of the discussions that led to the TLP wording, and pointed at the already established usage of <CODE BEGINS> for extraction.

Interesting - I wasn't aware of that.

> Good answer if xml is available. Not so good otherwise.

I thought the implementation of the v3 processor was part of the move to XML as canonical format? Also, what's the point of implementing this in this processor when, by definition, it runs on XML input?
            

        

            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

Replying to hassanakbar4/tractive-test#367 (comment:8):


>> No, that's simply not correct. I was part of the discussions that led to the TLP wording, and pointed at the already established usage of <CODE BEGINS> for extraction.

> Interesting - I wasn't aware of that.

>> Good answer if xml is available. Not so good otherwise.

> I thought the implementation of the v3 processor was part of the move to XML as canonical format? Also, what's the point of implementing this in this processor when, by definition, it runs on XML input?

As an example, let's look at the YANG toolchain.  Both https://yangvalidator.org/ and https://yangcatalog.org/ routinely take in drafts in development, extract YANG modules, and process them (validation, meta-data extraction, dependency tree computation, and more).  In the end, documents may end up as RFCs, but the bulk of the validation etc. is done during the draft development.  The draft text, using the specific formatting defined in RFC 6087, is the common basis of this, no matter if the draft was created using xml2rfc or through some other means.  Given that we have a clearly defined way of delimiting code chunks in drafts and RFCs, what reason could we possibly have for not letting the xml2rfc formatter produce correct delimitation and chunk filename according to RFC 6087?  Forcing people to add the <CODE BEGINS> text manually for text output doesn't make sense to me.

Now, I'm open to doing things differently for formats that can carry metainformation.  In RFC 7992, the rendering of <sourcecode> specifies that 
 should be used -- this would be sufficient for tools to reliably extract code from HTML rendered documents.
If xml is available, (as it will consistently be for RFCs when the RFC-Editor has transitioned to XML as the archival format) the extraction is a no-brainer, exactly as you describe.
But I don't see why we should use that as a reason not to make life easy for people writing drafts where the code might be extracted from the text document for validation or other processing.
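
To make the text-based extraction concrete, here is a minimal sketch of how a tool might pull marked code out of a plain-text draft. It assumes the RFC 6087 convention of `<CODE BEGINS> file "name"` on the opening line; the function name `extract_code_blocks` and the `DRAFT` sample are hypothetical, and this is not how yangvalidator.org or yangcatalog.org are actually implemented.

```python
import re

# Opening marker, optionally followed by:  file "some-name"
BEGIN = re.compile(r'^\s*<CODE BEGINS>(?:\s+file\s+"(?P<name>[^"]+)")?\s*$')
END = re.compile(r'^\s*<CODE ENDS>\s*$')

# Hypothetical draft excerpt, made up for illustration only.
DRAFT = '''   Some introductory prose.

   <CODE BEGINS> file "ietf-foo@2010-01-18.yang"
   module ietf-foo {
   }
   <CODE ENDS>

   More prose.
'''


def extract_code_blocks(text):
    """Return a list of (file_name, code) tuples; file_name is None if absent."""
    blocks = []
    name = None
    lines = None  # None while we are outside a marked block
    for line in text.splitlines():
        if lines is None:
            match = BEGIN.match(line)
            if match:
                name = match.group("name")
                lines = []
        elif END.match(line):
            blocks.append((name, "\n".join(lines)))
            lines = None
        else:
            lines.append(line)
    return blocks


if __name__ == "__main__":
    for file_name, code in extract_code_blocks(DRAFT):
        print(file_name)
        print(code)
```

The sketch keeps the draft's indentation and does nothing about page headers and footers that may fall inside a block; a real extractor would have to handle both.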
            

        

            
            
            
        
            
            
hassanakbar4 commented 5 years ago

julian.reschke@gmx.de commented:

I have no problem with an option to include the code markers (*). Inserting them however when they aren't needed doesn't make sense to me.
FWIW: (*) https://greenbytes.de/tech/webdav/rfc2629xslt/rfc2629xslt.html#ext-rfc2629.artwork
            
        
            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

Ok, so what about this: give <sourcecode> an attribute code-markers with possible values "default" | "true" | "false" where default would insert <CODE BEGINS> etc. in formats without any meta-information carrying capability, such as the plaintext format, but not otherwise?
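
A minimal sketch of the decision logic this proposal implies, purely for illustration. The function names, the format names "text" and "html", and the set of formats treated as unable to carry meta-information are all assumptions made for this example, not xml2rfc code.

```python
# Assumption: only the plaintext format lacks a way to carry meta-information.
FORMATS_WITHOUT_METADATA = {"text"}


def emit_code_markers(code_markers, output_format):
    """Decide whether <CODE BEGINS>/<CODE ENDS> should be written.

    code_markers is the proposed attribute value: "default", "true" or "false".
    """
    if code_markers == "true":
        return True
    if code_markers == "false":
        return False
    if code_markers == "default":
        return output_format in FORMATS_WITHOUT_METADATA
    raise ValueError("unknown code-markers value: %r" % (code_markers,))


def render_sourcecode(code, code_markers="default", output_format="text"):
    """Wrap the code in markers when the attribute and output format call for it."""
    if emit_code_markers(code_markers, output_format):
        return "<CODE BEGINS>\n" + code + "\n<CODE ENDS>"
    return code


if __name__ == "__main__":
    print(render_sourcecode("package main", "default", "text"))
    print(render_sourcecode("package main", "default", "html"))
```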
            
        
            
            
hassanakbar4 commented 5 years ago

julian.reschke@gmx.de commented:

I think an attribute is what we need indeed.
I'm unhappy with new features that will lead to text content and HTML content to differ in anything other than presentation.
The default behavior really should be not to insert the code markers. Let the author override it when they really really need it.
            
        
            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

I'm ok with consistently inserting markers in all formats, or not, depending on attribute settings.
I think your view of what should be the default is coloured by what you would personally use, but I also believe what makes life easiest for most draft authors wanting to insert code in drafts is to have as default to insert the markers.
If this was a general markup language, I would not suggest this.  But as it's geared towards IETF documents where we want people to insert code markers, we should make that happen by default.
            
        
            
            
hassanakbar4 commented 5 years ago

julian.reschke@gmx.de commented:

Actually, we only want people to insert code markers when they are needed.
As far as I can tell, they are only needed for code fragments not covered by the TLP, and for YANG modules. Unless I'm missing something that's an exception, not the rule, right?
(In any case, this discussion probably should take place in https://github.com/rfc-format/draft-iab-xml2rfc-v3-bis/issues/8)
            
        
            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

Replying to hassanakbar4/tractive-test#367 (comment:14):

> Actually, we only want people to insert code markers when they are needed.

I don't agree with this statement.  On what is it based?

> As far as I can tell, they are only needed for code fragments not covered by the TLP, and for YANG modules. Unless I'm missing something that's an exception, not the rule, right?

The IESG and others have expressed a desire to add validation in the datatracker for code fragments other than YANG: basically any formal language we routinely use, such as ABNF.

> (In any case, this discussion probably should take place in https://github.com/rfc-format/draft-iab-xml2rfc-v3-bis/issues/8)

I guess, yes.
            
        
            
            
hassanakbar4 commented 5 years ago

julian.reschke@gmx.de commented:


> I don't agree with this statement. On what is it based?

On the theory that bloat that affects the readability should be avoided.

> The IESG and others have expressed a desire to add validation in the datatracker for code fragments other than YANG: basically any formal language we routinely use, such as ABNF.

We're talking about the future, right? I just don't see how extraction from plain text is of relevance, when, by definition, XML is available (If it wasn't, you wouldn't have anything to run the formatter on...)
            
        
            
            
hassanakbar4 commented 5 years ago

ietf@augustcellars.com commented:

I very much agree that adding this in for text is generally not what I want.  I would question even more if this behavior is desired for the HTML format as I would expect that this would be highlighted by color and not by text.  If there is going to be a default behavior, then it should perhaps depend on the language involved rather than being universal across all languages.  The RFC Editor in consultation w/ the powers that be can then set a default for different languages.
The idea of moving things from sourcecode to artwork would only work if you were to move the 'type' attribute as well.  If you don't do that then extractors based on XML would no longer be successful.  However the whole concept was to separate artwork from code so that does not make any sense. 
From the RFCs that I have normally looked at, the frequency of using the text tagging at the start and end has been zero.  I realize that there are cases where this is very common but it is not by any means universal.
            
        
            
            
hassanakbar4 commented 5 years ago

henrik@levkowetz.com commented:

Replying to hassanakbar4/tractive-test#367 (comment:16):

> We're talking about the future, right? I just don't see how extraction from plain text is of relevance, when, by definition, XML is available (If it wasn't, you wouldn't have anything to run the formatter on...)

I think you miss part of the point here:

- Not all drafts are being submitted with XML.
- The datatracker and toolchains to validate code need to work on all drafts.
- Toolchains have been built and are being built to extract from all drafts using code markers.
- To make it easy for people to produce compliant drafts with this tool, we should make it easy to provide the code markers.

In an ideal future where all drafts were produced from xml2rfc v3, the extraction toolchains would not need to work on text documents.  Till then, building one toolchain that works on all drafts wins over building two, especially when there's not enough hours to do anywhere close to all the work items queued for the datatracker.
And by the way, not nearly all drafts produced from xml source are submitted with xml source.  Out of 11914 drafts from the last 365 days, there were 7137 .xml files.  I believe the percentage actually produced from .xml sources is much higher than 60% (although I'm gratified to find that we've reached 60% .xml availability).
Anyway, I think we agree on having an attribute to control rendering of code markers; the only question is what would be the default setting.
            
        
            
            
hassanakbar4 commented 4 years ago

henrik@levkowetz.com changed status from new to closed
            
        
            
            
hassanakbar4 commented 4 years ago

henrik@levkowetz.com changed resolution from `` to `fixed`
            
        
            
            
hassanakbar4 commented 4 years ago

henrik@levkowetz.com commented:

The v3 vocabulary now has <sourcecode markers="true"> to generate markers, with the default being "false".
            
        
    
    
            

    
        