jgm / pandoc

Universal markup converter
https://pandoc.org

Typst: different typst output when writing to `.typ` file or to pdf with the typst engine #10320

Closed by ZoomRmc 1 month ago

ZoomRmc commented 1 month ago

This one is kind of weird. I'm getting different typst output depending on whether I write to a *.typ file or to PDF with --pdf-engine typst; the latter is broken.

After examining the temporary file produced, it seems that something goes wrong around the stage where the output is combined with the template file.

Here are the steps to reproduce:

  1. RST input converted to a native representation, correctly (not counting the unsupported RST Option List).

    **A**: aaa
    
      - as
    
    **B**: bbb
      - bs
    
    **C**: ccc
      - cs
    
    D D
        ddd
    
    \-E E
      eee
    
    ..
      comment
    
    -F  F
        fff
  2. Native output converted to a typst file (pandoc test.native --from=native -o test.typ), correctly.

    #strong[A];: aaa
    
    #quote(block: true)[
    - as
    ]
    
    / #strong[B];\: bbb: #block[
    - bs
    ]
    
    / #strong[C];\: ccc: #block[
    - cs
    ]
    
    / D D: #block[
    ddd
    ]
    
    / -E E: #block[
    eee
    ]
    
    / -F F: #block[
    fff
    ]
  3. The typst file gets compiled to PDF twice:
    • with typst compile test.typ test-typst.pdf, producing expected results
    • with pandoc test.typ --pdf-engine=typst -o test-pandoc.pdf, producing broken output
  4. The main body of the document, as written to the generated temporary typst file (examined with the tool), contains mangled typst output, which explains the incorrect rendering. The main issue is that the term marker `/` somehow gets escaped and becomes plain text.

    #strong[A];: aaa
    
    #quote(block: true)[
    
    - as
    
    ℄~
    ]
    
    \/ #strong[B];: bbb:
    
    - bs
    
    \/ #strong[C];: ccc:
    
    - cs
    
    \/ D D: ddd
    
    \/ -E E: eee
    
    \/ -F F: fff

It was pretty surprising to see the typst code in the document body differ depending on whether it was written directly or combined with the default template and used as a temporary file.

No external templates or filters were used.

Encountered with pandoc 3.2, reproduced with 3.5 & 3.5-nightly-2024-10-21

jgm commented 1 month ago

Are you on Windows? I wonder if there could be an encoding issue somewhere.

Have you tried --verbose? This will print out the intermediate typst file.

jgm commented 1 month ago

For what it's worth, I can't reproduce the issue on my system (macos). This suggests it's Windows related.

jgm commented 1 month ago

Note that when the typst document is written to the temp directory, it will be (a) UTF-8 encoded and (b) have LF rather than CRLF line endings. (b) accounts for the line ending issues you're seeing (since you're viewing this in a Windows editor) and (a) may account for the garbled text. However, I would have expected that typst would behave fine with this sort of input.
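To rule out points (a) and (b) when comparing the temporary file against a file written on Windows, both can be decoded as UTF-8 and have their line endings normalized before comparison. The following is a minimal sketch; the helper name `normalize` and the sample strings are made up for illustration and are not part of pandoc.

```python
def normalize(raw: bytes) -> str:
    """Decode bytes as UTF-8 and convert CRLF line endings to LF."""
    return raw.decode("utf-8").replace("\r\n", "\n")


# Hypothetical sample data: the same typst text with CRLF and with LF endings.
windows_copy = "#strong[A];: aaa\r\n\r\n- as\r\n".encode("utf-8")
temp_copy = "#strong[A];: aaa\n\n- as\n".encode("utf-8")

# The raw bytes differ, but the normalized text does not.
print(windows_copy == temp_copy)
print(normalize(windows_copy) == normalize(temp_copy))
```

If two files still differ after this kind of normalization, the difference is in the content itself rather than in encoding or line endings.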

ZoomRmc commented 1 month ago

Reproducible on both Windows 10 and Linux machines. Does not depend on the LF/CRLF of the input .typ file.

EDIT: there's some difference:

jgm commented 1 month ago

OK, somehow I'd missed the fact that you're using pandoc to convert typst to typst (probably because this is an odd thing to do). You may not realize that you're doing this, but when you do

pandoc test.typ --pdf-engine=typst -o test-pandoc.pdf

pandoc will

(a) parse test.typ to a pandoc AST,
(b) then convert that AST back to a typst document,
(c) and finally compile it using typst.
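Steps (a) and (b) can be illustrated with a toy model. This is only a sketch and not pandoc's code: `toy_reader`, `toy_writer` and `convert` are hypothetical stand-ins, and the reader is deliberately written so that it never recognizes a term list. Because the writer only sees the AST, anything the reader misparses comes out changed, even though the input was already valid typst.

```python
# Toy model of a reader -> AST -> writer pipeline (hypothetical, not pandoc).

def toy_reader(source: str) -> list[tuple[str, str]]:
    """Parse blank-line-separated blocks into (kind, text) nodes.

    Deliberately limited: it never recognizes "/ term: description" as a
    term list, so such blocks end up as plain paragraphs.
    """
    blocks = [b.strip() for b in source.split("\n\n") if b.strip()]
    return [("Para", " ".join(block.split())) for block in blocks]


def toy_writer(ast: list[tuple[str, str]]) -> str:
    """Render nodes back to text, escaping a leading "/" in a paragraph
    so it cannot be mistaken for a term-list marker."""
    rendered = []
    for kind, text in ast:
        if kind == "Para" and text.startswith("/"):
            text = "\\" + text
        rendered.append(text)
    return "\n\n".join(rendered)


def convert(source: str) -> str:
    # The writer never sees the original source, only the AST.
    return toy_writer(toy_reader(source))


print(convert("/ D D: ddd\n\n/ -E E: eee"))
```

The printed result has each `/` escaped as `\/`, which mirrors the shape of the broken output in the report.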

So this really doesn't have anything to do with PDF. You can reproduce the issue using

 % pandoc -f typst -t typst
#strong[A];: aaa

 #quote(block: true)[
 - as
 ]

 / #strong[B];\: bbb: #block[
 - bs
 ]

 / #strong[C];\: ccc: #block[
 - cs
 ]

 / D D: #block[
 ddd
 ]

 / -E E: #block[
 eee
 ]

 / -F F: #block[
 fff
 ]
^D
#strong[A];: aaa

#quote(block: true)[

- as

℄~
]

\/ #strong[B];: bbb:

- bs

\/ #strong[C];: ccc:

- cs

\/ D D: ddd

\/ -E E: eee

\/ -F F: fff

jgm commented 1 month ago

Apparently it's an issue with the typst reader, which parses the typst document to

[ Para [ Strong [ Str "A" ] , Str ":" , Space , Str "aaa" ]
, BlockQuote
    [ Para []
    , BulletList [ [ Para [ Str "as" ] ] ]
    , Para [ Str "\8452\160" ]
    ]
, Para
    [ Str "/"
    , Space
    , Strong [ Str "B" ]
    , Str ":"
    , Space
    , Str "bbb:"
    ]
, Para []
, BulletList [ [ Para [ Str "bs" ] ] ]
, Para
    [ Str "/"
    , Space
    , Strong [ Str "C" ]
    , Str ":"
    , Space
    , Str "ccc:"
    ]
, Para []
, BulletList [ [ Para [ Str "cs" ] ] ]
, Para
    [ Str "/"
    , Space
    , Str "D"
    , Space
    , Str "D:"
    , SoftBreak
    , Str "ddd"
    ]
, Para
    [ Str "/"
    , Space
    , Str "-E"
    , Space
    , Str "E:"
    , SoftBreak
    , Str "eee"
    ]
, Para
    [ Str "/"
    , Space
    , Str "-F"
    , Space
    , Str "F:"
    , SoftBreak
    , Str "fff"
    ]
]
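The `Str "\8452\160"` node uses Haskell's decimal character escapes. A small Python sketch (the helper name `decode_decimal_escapes` is made up for illustration) shows which characters these are; they are ordinary Unicode code points rather than mis-decoded bytes, which is consistent with the `℄~` line in the output if the non-breaking space is written as `~`.

```python
import unicodedata


def decode_decimal_escapes(code_points: list[int]) -> list[tuple[str, str]]:
    """Map decimal code points to (U+XXXX label, Unicode name) pairs."""
    return [(f"U+{cp:04X}", unicodedata.name(chr(cp))) for cp in code_points]


# The two escapes from Str "\8452\160" in the AST above.
for label, name in decode_decimal_escapes([8452, 160]):
    print(label, name)
```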

jgm commented 1 month ago

What's weird is that typst-hs, which does the main work in the typst reader, seems to parse this correctly: for

#quote(block: true)[
- as
]

it produces

--- repr ---
document(body: { quote(block: true, 
                       body: { text(body: [
]), 
                               list(children: ({ text(body: [as]), 
                                                 parbreak() })) }), 
                 parbreak() })

jgm commented 1 month ago

I fixed a couple small issues. There remains the issue of the failed term list parsing. Simple example:

/ B: #block[
- bs
]

This is an issue in jgm/typst-hs, which isn't recognizing this as a term list.

jgm commented 1 month ago

OK, I see what is going on. typst-hs takes the typst documentation literally: "When the descriptions span over multiple lines, they use hanging indent to communicate the visual hierarchy." It requires a hanging indent, which isn't present in this case.
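The effect of a strict hanging-indent rule can be sketched as follows. This is a simplified illustration in Python, not the actual typst-hs parser: the pattern `TERM_LINE` and the function `is_term_item_strict` are hypothetical, and the check assumes the block contains no blank lines.

```python
import re

# Hypothetical first-line pattern for a term item: "/ term: description".
TERM_LINE = re.compile(r"^/ (.+?):\s*(.*)$")


def is_term_item_strict(block: str) -> bool:
    """Accept a block as a term item only if the first line looks like
    "/ term: description" and every later line starts with whitespace
    (a hanging indent)."""
    lines = block.split("\n")
    if TERM_LINE.match(lines[0]) is None:
        return False
    return all(line[:1] in (" ", "\t") for line in lines[1:])


unindented = "/ B: #block[\n- bs\n]"      # the shape of the failing example
indented = "/ B: #block[\n  - bs\n  ]"    # same content with a hanging indent

print(is_term_item_strict(unindented))
print(is_term_item_strict(indented))
```

Under this strict rule the unindented form is rejected and the indented one is accepted, so the former would fall through to ordinary paragraph parsing.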

jgm commented 1 month ago

OK, that was three bugs for the price of one, but everything should work now.

ZoomRmc commented 1 month ago

Thanks a lot and sorry for the messy report.

Yeah, I understand how it all works more or less. There was confusion when I initially stumbled upon this bug, so I just tried to determine the point at which the bug occurs (unsuccessfully though). What threw me off is that the bug showed itself even when the reader wasn't supposed to be used at all (based on my understanding at the moment) and the program clearly could generate valid typst.

native -> pdf bug
native -> typ ok
typ -> pdf bug