judepayne / dictim

a two-way d2 transpiler, written in Clojure
MIT License
12 stars 2 forks source link

Accommodate different line return character sequence on Windows #10

Closed judepayne closed 2 weeks ago

judepayne commented 2 weeks ago

On Mac and Linux, line return is \n, whereas on Windows its \r\n

Dictim is not correctly accounting for Windows, and this is leading to a number of tests failing in the d2/parse-test namespace.

e.g.

FAIL in (classes) (parse_test.clj:290)
Software Classes
expected: (= dict (quote (["MyClass" {"shape" "class"} ["field" ""[]string""] ["method(a uint64)" "(x, y int)"]] ["D2 Parser" {"shape" "class"} [:comment "Default visibility is + so no need to specify."] ["+reader" "io.RuneReader"] ["readerPos" "d2ast.Position"] [:comment "Private field."] ["-lookahead" ""[]rune""] [:comment "Protected field."] [:comment "We have to escape the # to prevent the line from being parsed as a comment."] ["\#lookaheadPos" "d2ast.Position"] ["+peek()" "(r rune, eof bool)"] ["rewind()"] ["commit()"] ["\#peekn(n int)" "(s string, eof bool)"]] [""github.com/terrastruct/d2parser.git"" "--" "D2 Parser"])))
actual: (not (= (["MyClass" {"shape" "class\r"} ["field" ""[]string""] ["method(a uint64)" "(x, y int)"]] ["D2 Parser" {"shape" "class\r"} [:comment "Default visibility is + so no need to specify.\r"] ["+reader" "io.RuneReader"] ["readerPos" "d2ast.Position"] [:comment "Private field.\r"] ["-lookahead" ""[]rune""] [:comment "Protected field.\r"] [:comment "We have to escape the # to prevent the line from being parsed as a comment.\r"] ["\#lookaheadPos" "d2ast.Position"] ["+peek()" "(r rune, eof bool)"] ["rewind()"] ["commit()"] ["\#peekn(n int)" "(s string, eof bool)"]] [""github.com/terrastruct/d2parser.git"" "--" "D2 Parser"]) (["MyClass" {"shape" "class"} ["field" ""[]string""] ["method(a uint64)" "(x, y int)"]] ["D2 Parser" {"shape" "class"} [:comment "Default visibility is + so no need to specify."] ["+reader" "io.RuneReader"] ["readerPos" "d2ast.Position"] [:comment "Private field."] ["-lookahead" ""[]rune""] [:comment "Protected field."] [:comment "We have to escape the # to prevent the line from being parsed as a comment."] ["\#lookaheadPos" "d2ast.Position"] ["+peek()" "(r rune, eof bool)"] ["rewind()"] ["commit()"] ["\#peekn(n int)" "(s string, eof bool)"]] [""github.com/terrastruct/d2parser.git"" "--" "D2 Parser"])))
judepayne commented 2 weeks ago

ping @krish777 @eddielao

judepayne commented 2 weeks ago

ping @krish777 @eddielao Eddie, please re-clone the project and re-run clj -X:test and report any errors that remain on Windows. thx!

judepayne commented 2 weeks ago

btw, hope this works :) The parse part of dictim is not for the faint hearted, being based on this EBNF grammar which I had to reverse engineer from the d2 docs. I try to avoid changing it!

["(* high level structure *)"
 "    <D2> = elements"
 "    elements = break* (element break)* element? | break* | element"
 "    <contained> = (element break)* element? | break* | element"
 "    <element> = list | elem"
 ""
 "    <elem> = classes | vars | ctr | attr | comment | conn | conn-ref"
 ""
 "    (* lists and comments *)"
 "    list = (elem <semi>+)+ elem <semi>*"
 "    comment = <s> <hash> lbl"
 ""
 "    (* containers - including shapes *)"
 "    ctr = (<s> ctr-key colon-label? (<curlyo> <break?> contained <s> <curlyc>)?) | composite-ctr"
 "    <composite-ctr> = <s> ctr-key <period> composite-ctr-attr"
 "    composite-ctr-attr = std-attr      "
 "    ctr-key = !classes-lit !hash !vars-lit (ctr-key-part <period>)* ctr-key-part"
 "    ctr-key-part =  !d2-keyword #'((?!(?:<-|->|<->|--))[^.;:\\r\\n{}*&<>])+'"
 ""
 "    (* vars, classes and attrs *)"
 "    (* vars *)"
 "    vars = <s> vars-lit <s> <colon> <s> vars-content"
 "    <vars-content> = <curlyo> <break?> the-vars <s> <break>? <s> <curlyc>"
 "    the-vars = (var <break>)* var"
 "    var = <s> ctr-key <colon> <s> (var-val | vars-content)"
 "    var-val = #'[^*{};\\r\\n]+' | inner-list"
 ""
 "    (* classes *)"
 "    classes = <s> classes-lit <s> <colon> <s> <curlyo> <at-sep*>"
 "              (class <at-sep*>)* <curlyc>"
 "    class = <s> ctr-key <s> <colon> <s> attrs"
 ""
 "    (* attrs *)"
 "    <at-sep> = break | semi"
 "    attr = std-attr | commented-attr"
 "    <std-attr> = (<s> attr-key <s> <colon> <s> (attr-val | attr-label? attrs))"
 "    commented-attr = <s> <hash> std-attr"
 "    attrs = <curlyo> <at-sep*> (attr <at-sep+>)* attr <at-sep*> <s> <curlyc>"
 "    attr-key = ((attr-key-part <period>)* (attr-key-last <period>)* attr-key-last) | globs"
 "    attr-key-part = &glob #'((?!(?:vars))[^.;:\\r\\n{}()])+'"
 "    attr-key-last = d2-keyword | amp d2-keyword"
 "    attr-label = label (* i.e. lbl, block or typescript *)"
 "    <item> = #'[^;\\[\\]\\r\\n]+'"
 "    inner-list = <'['> (item <semi> <s>)* item <']'>"
 "    <av> = !null #'[^*{};\\r\\n]+'"
 "    attr-val = av | substitution | <s> null <s> | inner-list"
 ""
 "    (* labels *)"
 "    <colon-label> = (<colon> <s> | <colon> <s> label)"
 "    label = lbl | block | typescript"
 "    <lbl> = (<s> null <s>) | normal-label | substitution"
 "    <normal-label> = !null #'[^;|{}\\r\\n]+'"
 "    <substitution> =  <s> #'^(?!^\\s*$)([^;${\r"
 "]*\\$\\{[^}]+\\})+[^{}\\r\\n;|]*'"
 "    block = <s> pipe #'[^|]+' pipe <s>"
 "    typescript = <s> ts-open #'[\\s\\S]+?(?=\\|\\|\\||`\\|)' ts-close <s>"
 ""
 "    (* connections *)"
 "    conn = <s> (conn-key dir)+ <s> conn-key colon-label? attrs?"
 "    conn-key = (conn-key-part <period>)* conn-key-part"
 "    conn-key-part = !d2-keyword #'((?!(?:<-|->|<->|--))[^.;:\\r\\n{}<>&()])+' (* greedy regex *)"
 "    dir = <contd?> <s> direction"
 "    contd = #'--\\\\"
 "' | #'--\\\\\r\\n'"
 "    <direction> = '--' | '->' | '<-' | '<->'"
 ""
 "    (* conn-refs - a special type of connection *)"
 "    conn-ref = <s> conn-ref-key conn-ref-val"
 "    conn-ref-key = <'('> <s> crk <s> dir <s> crk <s> <')'> <'['> array-val <']'>"
 "    conn-ref-val = (conn-ref-attr-keys <s> <colon> <s> (attr-val | attrs) | <s> <colon> <s> null)"
 "    crk = #'((?!(?:<-|->|<->|--))[^.;:\\r\\n{}<>&()()])+'"
 "    conn-ref-attr-keys = (<period> d2-keyword)+"
 "    array-val = #'\\d' | globs"
 ""
 "    (* building blocks *)"
 "    <any> = #'.'"
 "    <d2-keyword> =('font-color'|'width'|'height'|'shape'|'class'|'grid-rows'|'tooltip'|'label'|'text-transform'|'stroke-width'|'vertical-gap'|'stroke-dash'|'shadow'|'3d'|'stroke'|'style'|'grid-columns'|'grid-gap'|'direction'|'fill'|'horizontal-gap'|'filled'|'font'|'animated'|'link'|'double-border'|'font-size'|'italic'|'fill-pattern'|'border-radius'|'source-arrowhead'|'target-arrowhead'|'comment'|'bold'|'underline'|'opacity'|'near'|'constraint'|'multiple'|'icon')"
 "    s = #' *'"
 "    globs = glob+"
 "    break = lr+"
 "    lr = #'[^\\S\\r\\n]*\\r?\\n'"
 ""
 "    (* literals *)"
 "    <single-quote> = '\\''   (* can't insert due to clojure/ instaparse escaping diffs *)"
 "    <double-quote> = '\\\"'  (* ditto *)"
 "    <hyphen> = '-'"
 "    colon = ':'"
 "    <hash> = '#'"
 "    curlyc = '}'"
 "    null = 'null'"
 "    <r-arrow> = '>'"
 "    ts-open = '|||' | '|`'"
 "    <l-arrow> = '<'"
 "    vars-lit = 'vars'"
 "    pipe = '|'"
 "    classes-lit = 'classes'"
 "    <amp> = '&'"
 "    bracketc = ')'"
 "    ts-close = '|||' | '`|'"
 "    bracketo = '('"
 "    <period> = '.'"
 "    curlyo = '{'"
 "    <glob> = '*'"
 "    <semi> = ';'"
 "    "]
eddielao commented 2 weeks ago

btw, hope this works :) The parse part of dictim is not for the faint hearted, being based on this EBNF grammar which I had to reverse engineer from the d2 docs. I try to avoid changing it!

["(* high level structure *)"
 "    <D2> = elements"
 "    elements = break* (element break)* element? | break* | element"
 "    <contained> = (element break)* element? | break* | element"
 "    <element> = list | elem"
 ""
 "    <elem> = classes | vars | ctr | attr | comment | conn | conn-ref"
 ""
 "    (* lists and comments *)"
 "    list = (elem <semi>+)+ elem <semi>*"
 "    comment = <s> <hash> lbl"
 ""
 "    (* containers - including shapes *)"
 "    ctr = (<s> ctr-key colon-label? (<curlyo> <break?> contained <s> <curlyc>)?) | composite-ctr"
 "    <composite-ctr> = <s> ctr-key <period> composite-ctr-attr"
 "    composite-ctr-attr = std-attr      "
 "    ctr-key = !classes-lit !hash !vars-lit (ctr-key-part <period>)* ctr-key-part"
 "    ctr-key-part =  !d2-keyword #'((?!(?:<-|->|<->|--))[^.;:\\r\\n{}*&<>])+'"
 ""
 "    (* vars, classes and attrs *)"
 "    (* vars *)"
 "    vars = <s> vars-lit <s> <colon> <s> vars-content"
 "    <vars-content> = <curlyo> <break?> the-vars <s> <break>? <s> <curlyc>"
 "    the-vars = (var <break>)* var"
 "    var = <s> ctr-key <colon> <s> (var-val | vars-content)"
 "    var-val = #'[^*{};\\r\\n]+' | inner-list"
 ""
 "    (* classes *)"
 "    classes = <s> classes-lit <s> <colon> <s> <curlyo> <at-sep*>"
 "              (class <at-sep*>)* <curlyc>"
 "    class = <s> ctr-key <s> <colon> <s> attrs"
 ""
 "    (* attrs *)"
 "    <at-sep> = break | semi"
 "    attr = std-attr | commented-attr"
 "    <std-attr> = (<s> attr-key <s> <colon> <s> (attr-val | attr-label? attrs))"
 "    commented-attr = <s> <hash> std-attr"
 "    attrs = <curlyo> <at-sep*> (attr <at-sep+>)* attr <at-sep*> <s> <curlyc>"
 "    attr-key = ((attr-key-part <period>)* (attr-key-last <period>)* attr-key-last) | globs"
 "    attr-key-part = &glob #'((?!(?:vars))[^.;:\\r\\n{}()])+'"
 "    attr-key-last = d2-keyword | amp d2-keyword"
 "    attr-label = label (* i.e. lbl, block or typescript *)"
 "    <item> = #'[^;\\[\\]\\r\\n]+'"
 "    inner-list = <'['> (item <semi> <s>)* item <']'>"
 "    <av> = !null #'[^*{};\\r\\n]+'"
 "    attr-val = av | substitution | <s> null <s> | inner-list"
 ""
 "    (* labels *)"
 "    <colon-label> = (<colon> <s> | <colon> <s> label)"
 "    label = lbl | block | typescript"
 "    <lbl> = (<s> null <s>) | normal-label | substitution"
 "    <normal-label> = !null #'[^;|{}\\r\\n]+'"
 "    <substitution> =  <s> #'^(?!^\\s*$)([^;${\r"
 "]*\\$\\{[^}]+\\})+[^{}\\r\\n;|]*'"
 "    block = <s> pipe #'[^|]+' pipe <s>"
 "    typescript = <s> ts-open #'[\\s\\S]+?(?=\\|\\|\\||`\\|)' ts-close <s>"
 ""
 "    (* connections *)"
 "    conn = <s> (conn-key dir)+ <s> conn-key colon-label? attrs?"
 "    conn-key = (conn-key-part <period>)* conn-key-part"
 "    conn-key-part = !d2-keyword #'((?!(?:<-|->|<->|--))[^.;:\\r\\n{}<>&()])+' (* greedy regex *)"
 "    dir = <contd?> <s> direction"
 "    contd = #'--\\\\"
 "' | #'--\\\\\r\\n'"
 "    <direction> = '--' | '->' | '<-' | '<->'"
 ""
 "    (* conn-refs - a special type of connection *)"
 "    conn-ref = <s> conn-ref-key conn-ref-val"
 "    conn-ref-key = <'('> <s> crk <s> dir <s> crk <s> <')'> <'['> array-val <']'>"
 "    conn-ref-val = (conn-ref-attr-keys <s> <colon> <s> (attr-val | attrs) | <s> <colon> <s> null)"
 "    crk = #'((?!(?:<-|->|<->|--))[^.;:\\r\\n{}<>&()()])+'"
 "    conn-ref-attr-keys = (<period> d2-keyword)+"
 "    array-val = #'\\d' | globs"
 ""
 "    (* building blocks *)"
 "    <any> = #'.'"
 "    <d2-keyword> =('font-color'|'width'|'height'|'shape'|'class'|'grid-rows'|'tooltip'|'label'|'text-transform'|'stroke-width'|'vertical-gap'|'stroke-dash'|'shadow'|'3d'|'stroke'|'style'|'grid-columns'|'grid-gap'|'direction'|'fill'|'horizontal-gap'|'filled'|'font'|'animated'|'link'|'double-border'|'font-size'|'italic'|'fill-pattern'|'border-radius'|'source-arrowhead'|'target-arrowhead'|'comment'|'bold'|'underline'|'opacity'|'near'|'constraint'|'multiple'|'icon')"
 "    s = #' *'"
 "    globs = glob+"
 "    break = lr+"
 "    lr = #'[^\\S\\r\\n]*\\r?\\n'"
 ""
 "    (* literals *)"
 "    <single-quote> = '\\''   (* can't insert due to clojure/ instaparse escaping diffs *)"
 "    <double-quote> = '\\\"'  (* ditto *)"
 "    <hyphen> = '-'"
 "    colon = ':'"
 "    <hash> = '#'"
 "    curlyc = '}'"
 "    null = 'null'"
 "    <r-arrow> = '>'"
 "    ts-open = '|||' | '|`'"
 "    <l-arrow> = '<'"
 "    vars-lit = 'vars'"
 "    pipe = '|'"
 "    classes-lit = 'classes'"
 "    <amp> = '&'"
 "    bracketc = ')'"
 "    ts-close = '|||' | '`|'"
 "    bracketo = '('"
 "    <period> = '.'"
 "    curlyo = '{'"
 "    <glob> = '*'"
 "    <semi> = ';'"
 "    "]

I will test it after I get home from office this evening. I was thinking to test it with ? after r like line 218, to make \r optional.

lr = #'[^\\S\\r\\n]*\\r?\\n'

So the same line of code can work for both Windows and Mac/Linux. Thank you @judepayne !

eddielao commented 2 weeks ago

Hi @judepayne ,

I made the following changes.

https://github.com/judepayne/dictim/pull/11

judepayne commented 2 weeks ago

Hi @eddielao I saw your PR, but can you please verify whether my fix above also fixes the issue with Windows line endings, then I can think about which is better

eddielao commented 2 weeks ago

Hi @eddielao I saw your PR, but can you please verify whether my fix above also fixes the issue with Windows line endings, then I can think about which is better

Hi @judepayne , I had to include your changes (my branch is with the latest main branch changes) because they fixed most of the failing tests except the one below.

Eddie@LZXTH1 MINGW64 ~/OneDrive/Dev/git/dictim (main) $ git log -2 commit f8b2ab18fccd5df06f1800ed4171f00b278a5ce8 (HEAD -> main, origin/main, origin/HEAD) Author: Jude Payne judepayne@yahoo.co.uk Date: Fri Aug 30 20:07:35 2024 +0100

untested fix for Windows line return

commit 93835345dc26ca9e25c5ee933fb80f4ad642ee65 Author: Jude Payne judepayne@yahoo.co.uk Date: Tue Aug 27 21:22:53 2024 +0100

tidy up of commented out attrs

Eddie@LZXTH1 MINGW64 ~/OneDrive/Dev/git/dictim (main) $ clj -X:test

Running tests in #{"test"}

Testing dictim.d2.compile-test

Testing dictim.d2.parse-test

FAIL in (textandcode) (parse_test.clj:254) Text and Code expected: (= dict (quote (["explanation" "|md\n # I can do headers\n - lists\n - lists\n\n And other normal markdown stuff\n|"] ["my_code" "|||ts\n declare function getSmallPet(): Fish | Bird;\n const works = (a > 1) || (b < 2)\n|||"] ["mycode" "|ts\n declare function getSmallPet(): Fish | Bird;\n const works = (a > 1) || (b < 2)\n|"] ["amscd plugin" ["ex" "|tex\n\\begin{CD} B @>{\\text{very long label}}>> C S^{{\\mathcal{W}}\\Lambda}\\otimes T @>j>> T\\\\ @VVV V \\end{CD}\n|"]] ["multilines" ["ex" "|tex\n\\displaylines{x = a + b \\\\ y = b + c}\n\\sum{k=1}^{n} h{k} \\int{0}^{1} \\bigl(\\partial{k} f(x{k-1}+t h{k} e{k}) -\\partial{k} f(a)\\bigr) \\,dt\n|"]] ["title" "A winning strategy" {"shape" "text", "near" "top-center", "style" {"font-size" 55, "italic" true}}] ["poll the people" "->" "results"] ["results" "->" "unfavorable" "->" "poll the people"] ["results" "->" "favorable" "->" "will of the people"]))) actual: (not (= (["explanation" "|md\r\n # I can do headers\r\n - lists\r\n - lists\r\n\r\n And other normal markdown stuff\r\n|"] ["my_code" "|||ts\r\n declare function getSmallPet(): Fish | Bird;\r\n const works = (a > 1) || (b < 2)\r\n|||"] ["mycode" "|ts\r\n declare function getSmallPet(): Fish | Bird;\r\n const works = (a > 1) || (b < 2)\r\n|"] ["amscd plugin" ["ex" "|tex\r\n\\begin{CD} B @>{\\text{very long label}}>> C S^{{\\mathcal{W}}\\Lambda}\\otimes T @>j>> T\\\\ @VVV V \\end{CD}\r\n|"]] ["multilines" ["ex" "|tex\r\n\\displaylines{x = a + b \\\\ y = b + c}\r\n\\sum{k=1}^{n} h{k} \\int{0}^{1} \\bigl(\\partial{k} f(x{k-1}+t h{k} e{k}) -\\partial{k} f(a)\\bigr) \\,dt\r\n|"]] ["title" "A winning strategy" {"shape" "text", "near" "top-center", "style" {"italic" true, "font-size" 55}}] ["poll the people" "->" "results"] ["results" "->" "unfavorable" "->" "poll the people"] ["results" "->" "favorable" "->" "will of the people"]) (["explanation" "|md\n # I can do headers\n - lists\n - lists\n\n And other normal markdown stuff\n|"] ["my_code" "|||ts\n declare function getSmallPet(): Fish | Bird;\n const works = (a > 1) || (b < 2)\n|||"] ["mycode" "|ts\n declare function getSmallPet(): Fish | Bird;\n const works = (a > 1) || (b < 2)\n|"] ["amscd plugin" ["ex" "|tex\n\\begin{CD} B @>{\\text{very long label}}>> C S^{{\\mathcal{W}}\\Lambda}\\otimes T @>j>> T\\\\ @VVV V \\end{CD}\n|"]] ["multilines" ["ex" "|tex\n\\displaylines{x = a + b \\\\ y = b + c}\n\\sum{k=1}^{n} h{k} \\int{0}^{1} \\bigl(\\partial{k} f(x{k-1}+t h{k} e{k}) -\\partial{k} f(a)\\bigr) \\,dt\n|"]] ["title" "A winning strategy" {"shape" "text", "near" "top-center", "style" {"font-size" 55, "italic" true}}] ["poll the people" "->" "results"] ["results" "->" "unfavorable" "->" "poll the people"] ["results" "->" "favorable" "->" "will of the people"])))

Testing dictim.flat-test

Testing dictim.graphspec-test

Testing dictim.json-test

Testing dictim.template-test

Ran 83 tests containing 361 assertions. 1 failures, 0 errors. Execution error (ExceptionInfo) at cognitect.test-runner.api/test (api.clj:30). Test failures or errors occurred.

Full report at: C:\Users\Eddie\AppData\Local\Temp\clojure-8551768610252755755.edn

Eddie@LZXTH1 MINGW64 ~/OneDrive/Dev/git/dictim (main)

judepayne commented 2 weeks ago

Thanks will look into.

judepayne commented 2 weeks ago

@eddielao Please re-clone and run the tests again + let me know.

In d2, label can be in markdown form when surrounded by |md and |. Inside those blocks, it's standard markdown, including \n or \r\n on Windows. dictim does not attempt to parse these blocks, taking them verbatim. That's why the above test was failing. That test loads the d2 in test/dictim/d2/samples/textandcode.d2 (and parses it). On windows that file must have the \n line endings replaced with \r\n.

Correct fix

During development of the dictim parser (say we need to add new features because d2 adds new features) we're working at the repl and usually calling the parse-d2 function in the dictim.d2.parse ns rather than the dictim function. That's why I was keen to fix the \r issues as much as possible in the parser (previous push) .. to support development on both Windows as well as other platforms. However we do need your normalize-line-endings fn where your PR put it - in the dictim function as well to capture this one case of markdown labels.

I will accept your PR. thank you.

eddielao commented 2 weeks ago

Thank you, @judepayne ! I still have a lot to learn and really appreciate your detailed explanation!!