jgm / pandoc

Universal markup converter
https://pandoc.org

typst reader #8740

Closed: jgm closed this 10 months ago

jgm commented 1 year ago

There are two possible routes to converting typst documents using pandoc:

  1. Modify typst itself so that it can produce some output format that pandoc can already parse. HTML has been discussed, and there's also https://github.com/typst/typst/pull/461, which produces a pandoc AST. (Even if that were merged, it wouldn't translate equations into the LaTeX format expected by pandoc.)

  2. Add a reader to pandoc. This is a fairly complex project, because typst implements a complete programming language, which pandoc would have to interpret. On the other hand, writing interpreters in Haskell is fairly pleasant and I don't see any difficulties in principle. Steps:

    a. Add a typst reader to jgm/texmath (for math conversion): jgm/texmath#211
    b. Create a self-standing Haskell library to parse and interpret typst, producing a typst-specific AST.
    c. Create a pandoc reader that uses this library and then converts the typst AST to a pandoc AST.

Of course any reader would be lossy, in the sense that detailed layout information (spacing, placing, font size, etc.) would be lost, but that's okay.

phiresky commented 1 year ago

Just to give my opinion: from what I see, typst is a pretty quickly moving target right now (e.g. the release from two days ago made breaking changes to the syntax), so implementing another parser and interpreter for it might be a bit of a pain.

The authors of typst have given a general thumbs up for HTML output (no comment on my PR yet), but I think it's not clear yet how much of a first-class citizen layoutless output will be in general.

My PR could also be implemented as a standalone tool based on the typst+typst_library crates. I'm happy to accept contributions, since it's not fully complete, especially the equations. It's still all Rust, though. It would also be easy to add a flag to typst to output its AST as JSON, but that would again be a very unstable target. And I guess you'd rather have something directly integrated in Haskell without external dependencies?

By the way, would it be possible to make the pandoc AST support a math format other than LaTeX strings, e.g. MathML?

jgm commented 1 year ago

In principle we could support a different math format, but that would be a huge breaking change. I'd like to avoid that.

Looks like the math is the main stumbling block for your tool. I could implement a typst reader in jgm/texmath, but that wouldn't interoperate with your rust code.

bdarcus commented 1 year ago

I added a request there for djot a few days ago.

https://github.com/typst/typst/issues/288

jgm commented 1 year ago

Here's my current thinking on this issue:

phiresky commented 1 year ago

Eventually we'd need some code that takes the math bits and converts them to the type used in texmath, so we can convert to other formats, but an integrated parser makes most sense.

Regarding that, it would be useful if pandoc could parse MathML from HTML and/or support having whatever texmath uses internally inside the pandoc AST instead of LaTeX (I think that doesn't have to be breaking; it could be purely additional?)

I assume, with MathML for the math?

From this comment it seems like right now they'd rather try to build their own custom math renderer for html unless MathML could actually support everything they support. Do you know if all of the feature set of typst math (and of latex math) can be represented in MathML or if there are limitations?

jgm commented 1 year ago

it would be useful if pandoc could parse MathML from html

Yes, it already can handle this (e.g. it can take in HTML with MathML math and output TeX, docx, or roff ms with formulas in the right format).

From https://github.com/typst/typst/pull/461#issuecomment-1491978864 it seems like right now they'd rather try to build their own custom math renderer for html unless MathML could actually support everything they support.

Not sure what the idea is here. MathML is the work of many people over many years; it ought to be able to represent what you need in equations. How it is rendered in browsers is a matter of browser support, but this has been getting better.

What's the alternative they have in mind? Generating SVG for the math and embedding it in the HTML?

jgm commented 1 year ago

I see now that #repr can be used in the online typst app to see how typst is representing things. E.g.

#repr([
#let f(x) = { x + x }

#f([hi])

- one
- two
  + sub

$e=m c^2 + alpha$

ok then *hi*
])

renders as

sequence(
  children: (
    [ ],
    sequence(children: ()),
    parbreak(),
    sequence(children: ([hi], [hi])),
    parbreak(),
    listitem(body: [one]),
    [ ],
    listitem(
      body: sequence(children: ([two], [ ], enumitem(body: [sub]))),
    ),
    parbreak(),
    equation(
      body: sequence(
        children: (
          [e],
          [=],
          [m],
          [ ],
          attach(base: [c], top: [2]),
          [ ],
          [+],
          [ ],
          [α],
        ),
      ),
      block: false,
    ),
    parbreak(),
    [ok then],
    [ ],
    strong(body: [hi]),
    [ ],
  ),
)

phiresky commented 1 year ago

Ah, nice. I just tried it and it didn't work, but that was just my command line being wrong.

can be used in the online typst app to see how typst is representing things

Yes, that looks very much like the input structure I used in my PR (the result after the "evaluation" stage). The only thing I found that was weird about it is that collecting individual list items into lists happens after that stage.
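The grouping step mentioned here (individual `listitem` values, as in the `#repr` output above, only become lists later) can be sketched in Haskell. This is a minimal illustration, not typst's or typst-hs's code: `Elt`, `Block`, `spanItems`, and `groupItems` are hypothetical names, and item bodies are plain strings rather than nested content, so an `enumitem` nested inside a list item's body is not handled.

```haskell
import Data.Maybe (isJust)

-- Hypothetical, simplified elements as they come out of evaluation.
-- Item bodies are plain strings here; real bodies are nested content.
data Elt
  = ListItem String   -- listitem(body: ...)
  | EnumItem String   -- enumitem(body: ...)
  | Space             -- the [ ] between items
  | ParBreak
  | Text String
  deriving (Show, Eq)

data Block
  = BulletList [String]
  | OrderedList [String]
  | Inlines [Elt]
  deriving (Show, Eq)

bulletBody, enumBody :: Elt -> Maybe String
bulletBody (ListItem b) = Just b
bulletBody _ = Nothing
enumBody (EnumItem b) = Just b
enumBody _ = Nothing

isItem :: Elt -> Bool
isItem e = isJust (bulletBody e) || isJust (enumBody e)

-- Take a run of items of one kind, allowing a single Space between them.
spanItems :: (Elt -> Maybe String) -> [Elt] -> ([String], [Elt])
spanItems body = go
  where
    go (e : rest)
      | Just b <- body e = let (bs, rest') = go rest in (b : bs, rest')
    go (Space : e : rest)
      | Just b <- body e = let (bs, rest') = go rest in (b : bs, rest')
    go rest = ([], rest)

-- Collect consecutive items of the same kind into one list block;
-- everything between runs of items is passed through as inlines.
groupItems :: [Elt] -> [Block]
groupItems [] = []
groupItems xs@(ListItem _ : _) =
  let (items, rest) = spanItems bulletBody xs
  in BulletList items : groupItems rest
groupItems xs@(EnumItem _ : _) =
  let (items, rest) = spanItems enumBody xs
  in OrderedList items : groupItems rest
groupItems xs =
  let (other, rest) = break isItem xs
  in Inlines other : groupItems rest
```

For a flattened version of the `#repr` example, `[one]` and `[two]` end up in one bullet list, and a switch from bullet to numbered items starts a new list.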

What's the alternative they have in mind? Generating SVG for the math and embedding it in the HTML?

Not sure, I don't think they've (publicly) said. Probably something like that.

If I have time, I'll try to modify my PR to typst to output HTML instead of a pandoc AST; that shouldn't be hard. I guess they'd be more open to that, and conversion from HTML back to pandoc's internal representation is probably lossless anyway. MathML input would then work as well (which it doesn't with the pandoc AST). Not sure how much effort converting their internal representation to MathML will be.

laurmaedje commented 1 year ago

Hey, I'm from the Typst team. Let me pitch in.

The authors of typst have given a general thumbs up for HTML output (no comment on my PR yet), but I think it's not clear yet how much of a first-class citizen layoutless output will be in general.

We want to make it as good and semantic as possible.

Typst maintainers seem committed to providing HTML output at some point ... presumably be consumable by pandoc without further additions to pandoc.

This seems to me the best way for pandoc to deal with Typst input. HTML output will take a while (because we want to do it the right way), but it is definitely coming.

From this comment it seems like right now they'd rather try to build their own custom math renderer

The person in the PR was not from the Typst team (currently, only me and @reknih are). We will definitely use MathML in the HTML output.

The only thing I found that was weird about it is that collecting individual list items into lists happens after that stage.

That's actually not the only thing. All styling with show rules also happens after that stage, so that wouldn't integrate with your current PR. For HTML output, sometimes you want a show rule to be applied and sometimes you don't, so that's a further complication. We're considering adding a fifth phase between evaluation and layout that does all the styling. From that phase's output, things like HTML export could be built. But that's not yet decided.

Your PR shows that it is relatively simple to add basic HTML output. However, it will fail in a large number of cases. And if I merge this now, people think that Typst has HTML export and they will start complaining how it fails in so many cases. That's why I'm hesitant at the moment.

jgm commented 1 year ago

This is great to hear. Do you have a timeline in mind for the HTML + MathML output?

Meanwhile I've started on a self-standing parser + evaluator for typst in Haskell. Maybe a foolish project, but so far going okay. (The grammar in the appendix of the typst paper is a great help!)

laurmaedje commented 1 year ago

This is great to hear. Do you have a timeline in mind for the HTML + MathML output?

Probably a few months out. Hard to say.

The grammar in the appendix of the typst paper is a great help!

Take care because it is a bit outdated and ignores some of the stuff that was hard to express (e.g. indent)!

jgm commented 1 year ago

Progress report: I now have a parser that can parse all of the samples in the typst documentation.

Next step: interpreter.

laurmaedje commented 1 year ago

I now have a parser that can parse all of the samples in the typst documentation.

You are mad and it's great! Is it open source? I am curious!

jgm commented 1 year ago

It's not openly available at this point, because it's still quite a mess, but it will be open source when it gets to a better state.

user202729 commented 1 year ago

Sounds good, although parsing Typst formulas is a bit tricky. E.g. there are some things going on to make


Meanwhile, I'm trying to use Typst itself to convert Typst to LaTeX: external link

Mostly works, although this will probably be very unstable just like the JSON-parsing idea.

laurmaedje commented 1 year ago

@jgm By the way, I would also be very interested in your opinions on Typst's markup syntax. As designer of Djot, I'm sure you have some insights and might find some things quirky or wrong. Maybe there are things we can still improve before things settle down!

jgm commented 1 year ago

@laurmaedje I will try to do that when I have some time.

jgm commented 1 year ago

@laurmaedje I've gotten pretty far with my typst interpreter, but I have a question. It seems that in math mode, acute, grave, etc. are ambiguous. In

$acute(x)$

acute denotes a function which produces the element accent(x, acute). But in e.g.

$accent(x, acute)$

acute denotes a symbol, the acute accent. This ambiguity is messing with my model of how typst code is to be interpreted, and I thought maybe you could clarify it. In my model,

$acute(x)$

is parsed as

[ Equation
    False
    [ Code
        "input"
        ( line 1 , column 2 )
        (FuncCall (Ident (Identifier "acute")) [ BlockArg [ Text "x" ] ])
    ]
, ParBreak
]

The interpreter evaluates things bottom up, so it interprets Ident (Identifier "acute") by looking it up in the symbol table governing the equation scope. This symbol table will either contain a Symbol or a Function. If it's a Symbol, then we get an error (FuncCall without a function value), just as we'd get in typst app when doing

#sym.acute(1)

If it's a Function, then this works, but accent(x, acute) won't. I have no difficulty assigning acute a function value, and no difficulty assigning it a symbol value, but I can't see how to handle the ambiguity. How is this done in typst?

EDIT: I think I've been able to figure this out by experimenting with the app, so no need to reply.
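The bottom-up model described above can be sketched as follows. This is a hypothetical illustration, not typst-hs or typst code: `Value`, `Expr`, `eval`, `accentFn`, and the `VSymFunc` constructor are assumptions. `VSymFunc` shows one way an interpreter could let a single name act as a symbol when passed as an argument and as a function when called; it is not necessarily how typst itself resolves `acute`.

```haskell
-- Hypothetical values for a tiny math-mode evaluator.
data Value
  = VContent String
  | VSymbol Char
  | VFunc ([Value] -> Either String Value)
  | VSymFunc Char ([Value] -> Either String Value) -- symbol that is also callable

data Expr = Ident String | Lit String | FuncCall Expr [Expr]

type Scope = [(String, Value)]

-- Bottom-up evaluation: look identifiers up in the scope, evaluate the
-- callee and the arguments, then apply.
eval :: Scope -> Expr -> Either String Value
eval _ (Lit s) = Right (VContent s)
eval scope (Ident n) =
  maybe (Left ("unknown variable: " ++ n)) Right (lookup n scope)
eval scope (FuncCall f args) = do
  fv <- eval scope f
  avs <- mapM (eval scope) args
  case fv of
    VFunc fn -> fn avs
    VSymFunc _ fn -> fn avs      -- called: behave as a function
    _ -> Left "expected function" -- e.g. a plain symbol was called

-- accent(base, mark): the mark may be a plain symbol or a dual value.
accentFn :: [Value] -> Either String Value
accentFn [VContent base, VSymbol c] = Right (VContent (base ++ [c]))
accentFn [VContent base, VSymFunc c _] = Right (VContent (base ++ [c]))
accentFn _ = Left "accent: expected (content, symbol)"

acuteMark :: Char
acuteMark = '\x301' -- U+0301 COMBINING ACUTE ACCENT

-- acute(x) is treated as accent(x, acute).
mathScope :: Scope
mathScope =
  [ ("accent", VFunc accentFn)
  , ("acute", VSymFunc acuteMark (\args -> accentFn (args ++ [VSymbol acuteMark])))
  ]

showValue :: Value -> String
showValue (VContent s) = s
showValue (VSymbol c) = [c]
showValue (VFunc _) = "<function>"
showValue (VSymFunc c _) = [c]
```

With this scope, both `acute(x)` and `accent(x, acute)` evaluate to `x` followed by U+0301, whereas binding `acute` to a plain `VSymbol` reproduces the "FuncCall without a function value" error.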

jgm commented 1 year ago

$f(x)/g(x)$ is indeed one that my parser currently gets wrong. What's the trick to this?

The interpreter is much more work than I'd anticipated, but I'm still working on it and getting much closer.

laurmaedje commented 1 year ago

$f(x)/g(x)$ is indeed one that my parser currently gets wrong. What's the trick to this?

Math operations parsing uses precedence rules like binary expressions in code. Fractions have lowest precedence, attachments (sub/super) the highest one, and function calls in between. Although we might change to function calls having the highest one (https://github.com/typst/typst/pull/985).
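A toy Haskell parser following the precedence order described here (attachments bind tightest, then function calls, then fractions) parses `f(x)/g(x)` as a fraction of two calls. It is a sketch under simplifying assumptions, not typst's parser: the `Token`/`Math` types and function names are hypothetical, spaces are simply dropped (so, unlike typst, it would treat `a (b)` as a call), and only `/`, `_`, `^`, and parentheses are handled. The rule that fractions and scripts drop one level of parentheses is inferred from the `(Q_i (a_i - epsilon)) / 2` example.

```haskell
import Data.Char (isAlphaNum)

data Token = Atom String | LParen | RParen | Slash | Under | Caret
  deriving (Show, Eq)

data Math
  = MAtom String
  | MGroup Bool [Math]                     -- True: parentheses are shown
  | MCall Math Math                        -- callee applied to a group
  | MAttach Math (Maybe Math) (Maybe Math) -- base, bottom, top
  | MFrac Math Math
  deriving (Show, Eq)

-- Toy tokenizer: alphanumeric runs become atoms; spaces are dropped.
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
  | c == '(' = LParen : tokenize cs
  | c == ')' = RParen : tokenize cs
  | c == '/' = Slash : tokenize cs
  | c == '_' = Under : tokenize cs
  | c == '^' = Caret : tokenize cs
  | c == ' ' = tokenize cs
  | isAlphaNum c = let (w, rest) = span isAlphaNum (c : cs) in Atom w : tokenize rest
  | otherwise = Atom [c] : tokenize cs

-- Fractions and scripts drop one level of parentheses: (a+b)/2 -> frac(a+b, 2).
unparen :: Math -> Math
unparen (MGroup True ms) = MGroup False ms
unparen m = m

-- A sequence of terms, up to end of input or a closing parenthesis.
pSeq :: [Token] -> ([Math], [Token])
pSeq [] = ([], [])
pSeq ts@(RParen : _) = ([], ts)
pSeq ts = case pFrac ts of
  Just (m, rest) -> let (ms, rest') = pSeq rest in (m : ms, rest')
  Nothing -> ([], ts)

-- Lowest precedence: left-associative fractions between call-level terms.
pFrac :: [Token] -> Maybe (Math, [Token])
pFrac ts = pCall ts >>= uncurry go
  where
    go num (Slash : rest) = do
      (den, rest') <- pCall rest
      go (MFrac (unparen num) (unparen den)) rest'
    go num rest = Just (num, rest)

-- Middle precedence: a non-group term directly followed by a group is a call.
pCall :: [Token] -> Maybe (Math, [Token])
pCall ts = do
  (base, rest) <- pAttach ts
  case (base, rest) of
    (MGroup _ _, _) -> Just (base, rest)
    (_, LParen : _) -> do
      (arg, rest') <- pGroup rest
      Just (MCall base arg, rest')
    _ -> Just (base, rest)

-- Highest precedence: optional _bottom and ^top scripts on a primary.
pAttach :: [Token] -> Maybe (Math, [Token])
pAttach ts = do
  (base, rest) <- pPrimary ts
  (bot, rest1) <- script Under rest
  (top, rest2) <- script Caret rest1
  let m = case (bot, top) of
        (Nothing, Nothing) -> base
        _ -> MAttach base bot top
  Just (m, rest2)
  where
    script tok (t : rest) | t == tok = do
      (s, rest') <- pPrimary rest
      Just (Just (unparen s), rest')
    script _ rest = Just (Nothing, rest)

pPrimary :: [Token] -> Maybe (Math, [Token])
pPrimary (Atom a : rest) = Just (MAtom a, rest)
pPrimary ts@(LParen : _) = pGroup ts
pPrimary _ = Nothing

pGroup :: [Token] -> Maybe (Math, [Token])
pGroup (LParen : ts) = case pSeq ts of
  (ms, RParen : rest) -> Just (MGroup True ms, rest)
  _ -> Nothing
pGroup _ = Nothing

parseMath :: String -> Maybe [Math]
parseMath s = case pSeq (tokenize s) of
  (ms, []) -> Just ms
  _ -> Nothing

-- LaTeX-like rendering, just to make the tree shape visible.
render :: Math -> String
render (MAtom a) = a
render (MGroup True ms) = "(" ++ unwords (map render ms) ++ ")"
render (MGroup False ms) = unwords (map render ms)
render (MCall f arg) = render f ++ render arg
render (MAttach b bot top) = render b ++ sc "_" bot ++ sc "^" top
  where sc op = maybe "" (\m -> op ++ "{" ++ render m ++ "}")
render (MFrac n d) = "\\frac{" ++ render n ++ "}{" ++ render d ++ "}"
```

With this sketch, `f(x)/g(x)` renders as `\frac{f(x)}{g(x)}`, and `a/b_2` as `\frac{a}{b_{2}}` because the attachment is parsed before the fraction. If calls were moved to the highest precedence (as the linked PR proposes), only the order of `pCall` and `pAttach` would change.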

jgm commented 1 year ago

If I'd really thought about the fact that typst is actually a complex multi-paradigm programming language, I wouldn't have started down this route. (I just realized that arrays and dictionaries are mutable, so I have to rethink some things.)

But I've made some good progress. Here's an example of the current state of things.

Input
```
Total displaced soil by glacial flow:

$ 7.32 beta +
  sum_(i=0)^nabla
    (Q_i (a_i - epsilon)) / 2 $

$ v := vec(x_1, x_2, x_3) $

$ a arrow.r.squiggly b $

$ A = pi r^2 $

$ "area" = pi dot "radius"^2 $

$ cal(A) :=
    { x in RR | x "is natural" } $

#let x = 5

$ #x < 17 $

$ x < y => x gt.eq.not y $

// $ sum_(k=0)^n k
// &= 1 + ... + n \
// &= (n(n+1)) / 2 $

$ frac(a^2, 2) $

$ vec(1, 2, delim: "[") $

$ mat(1, 2; 3, 4) $

$ lim_x =
    op("lim", limits: #true)_x $
```
Parser output
```hs
[ Text "Total" , Space , Text "displaced" , Space , Text "soil" , Space , Text "by" , Space , Text "glacial" , Space , Text "flow" , Text ":"
, ParBreak
, Equation True [ Text "7.32" , Code "stdin" ( line 3 , column 8 ) (Ident (Identifier "beta")) , Text "+" , MAttach (Just (MGroup False [ Text "i" , Text "=" , Text "0" ])) (Just (Code "stdin" ( line 4 , column 13 ) (Ident (Identifier "nabla")))) (Code "stdin" ( line 4 , column 3 ) (Ident (Identifier "sum"))) , MFrac (MGroup True [ MAttach (Just (Text "i")) Nothing (Text "Q") , MGroup True [ MAttach (Just (Text "i")) Nothing (Text "a") , Text "-" , Code "stdin" ( line 5 , column 17 ) (Ident (Identifier "epsilon")) ] ]) (Text "2") ]
, ParBreak
, Equation True [ Text "v" , Text ":" , Text "=" , Code "stdin" ( line 7 , column 8 ) (FuncCall (Ident (Identifier "vec")) [ BlockArg [ MAttach (Just (Text "1")) Nothing (Text "x") ] , BlockArg [ MAttach (Just (Text "2")) Nothing (Text "x") ] , BlockArg [ MAttach (Just (Text "3")) Nothing (Text "x") ] ]) ]
, ParBreak
, Equation True [ Text "a" , Code "stdin" ( line 9 , column 5 ) (FieldAccess (Ident (Identifier "squiggly")) (FieldAccess (Ident (Identifier "r")) (Ident (Identifier "arrow")))) , Text "b" ]
, ParBreak
, Equation True [ Text "A" , Text "=" , Code "stdin" ( line 11 , column 7 ) (Ident (Identifier "pi")) , MAttach Nothing (Just (Text "2")) (Text "r") ]
, ParBreak
, Equation True [ Text " area" , Text "=" , Code "stdin" ( line 13 , column 12 ) (Ident (Identifier "pi")) , Code "stdin" ( line 13 , column 15 ) (Ident (Identifier "dot")) , MAttach Nothing (Just (Text "2")) (Text " radius") ]
, ParBreak
, Equation True [ Code "stdin" ( line 15 , column 3 ) (FuncCall (Ident (Identifier "cal")) [ BlockArg [ Text "A" ] ]) , Text ":" , Text "=" , Text "{" , Text "x" , Code "stdin" ( line 16 , column 9 ) (Ident (Identifier "in")) , Code "stdin" ( line 16 , column 12 ) (Ident (Identifier "RR")) , Text "|" , Text "x" , Text " is natural" , Text "}" ]
, ParBreak
, Code "stdin" ( line 18 , column 2 ) (Let (BasicBind (Just (Identifier "x"))) (Literal (Int 5)))
, ParBreak
, Equation True [ Code "stdin" ( line 20 , column 4 ) (Ident (Identifier "x")) , Text "<" , Text "17" ]
, ParBreak
, Equation True [ Text "x" , Text "<" , Text "y" , Text "=" , Text ">" , Text "x" , Code "stdin" ( line 22 , column 14 ) (FieldAccess (Ident (Identifier "not")) (FieldAccess (Ident (Identifier "eq")) (Ident (Identifier "gt")))) , Text "y" ]
, ParBreak
, Comment
, Comment
, Comment
, SoftBreak
, Equation True [ Code "stdin" ( line 28 , column 3 ) (FuncCall (Ident (Identifier "frac")) [ BlockArg [ MAttach Nothing (Just (Text "2")) (Text "a") ] , NormalArg (Literal (Int 2)) ]) ]
, ParBreak
, Equation True [ Code "stdin" ( line 30 , column 3 ) (FuncCall (Ident (Identifier "vec")) [ NormalArg (Literal (Int 1)) , NormalArg (Literal (Int 2)) , KeyValArg (Identifier "delim") (Literal (String "[")) ]) ]
, ParBreak
, Equation True [ Code "stdin" ( line 32 , column 3 ) (FuncCall (Ident (Identifier "mat")) [ ArrayArg [ [ Text "1" , Text "2" ] , [ Text "3" , Text "4" ] ] ]) ]
, ParBreak
, Equation True [ MAttach (Just (Text "x")) Nothing (Code "stdin" ( line 34 , column 3 ) (Ident (Identifier "lim"))) , Text "=" , MAttach (Just (Text "x")) Nothing (Code "stdin" ( line 35 , column 5 ) (FuncCall (Ident (Identifier "op")) [ NormalArg (Literal (String "lim")) , KeyValArg (Identifier "limits") (Literal (Boolean True)) ])) ]
, ParBreak
]
```
Evaluator output
```hs
fromList [ Txt "Total" , Txt " " , Txt "displaced" , Txt " " , Txt "soil" , Txt " " , Txt "by" , Txt " " , Txt "glacial" , Txt " " , Txt "flow" , Txt ":"
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Txt "7.32" , Txt "\946" , Txt "+" , Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VContent (fromList [ Txt "i" , Txt "=" , Txt "0" ]) ) , ( Identifier "base" , VContent (fromList [ Txt "\8721" ]) ) , ( Identifier "t" , VContent (fromList [ Txt "\8711" ]) ) ] } , Elt { eltName = Identifier "frac" , eltFields = fromList [ ( Identifier "denom" , VContent (fromList [ Txt "2" ]) ) , ( Identifier "num" , VContent (fromList [ Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VContent (fromList [ Txt "i" ]) ) , ( Identifier "base" , VContent (fromList [ Txt "Q" ]) ) , ( Identifier "t" , VNone ) ] } , Txt "(" , Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VContent (fromList [ Txt "i" ]) ) , ( Identifier "base" , VContent (fromList [ Txt "a" ]) ) , ( Identifier "t" , VNone ) ] } , Txt "-" , Txt "\949" , Txt ")" ]) ) ] } ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Txt "v" , Txt ":" , Txt "=" , Elt { eltName = Identifier "vec" , eltFields = fromList [ ( Identifier "children" , VArray [ VContent (fromList [ Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VContent (fromList [ Txt "1" ]) ) , ( Identifier "base" , VContent (fromList [ Txt "x" ]) ) , ( Identifier "t" , VNone ) ] } ]) , VContent (fromList [ Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VContent (fromList [ Txt "2" ]) ) , ( Identifier "base" , VContent (fromList [ Txt "x" ]) ) , ( Identifier "t" , VNone ) ] } ]) , VContent (fromList [ Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VContent (fromList [ Txt "3" ]) ) , ( Identifier "base" , VContent (fromList [ Txt "x" ]) ) , ( Identifier "t" , VNone ) ] } ]) ] ) ] } ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Txt "a" , Txt "\8669" , Txt "b" ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Txt "A" , Txt "=" , Txt "\960" , Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VNone ) , ( Identifier "base" , VContent (fromList [ Txt "r" ]) ) , ( Identifier "t" , VContent (fromList [ Txt "2" ]) ) ] } ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Txt " area" , Txt "=" , Txt "\960" , Txt "\8901" , Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VNone ) , ( Identifier "base" , VContent (fromList [ Txt " radius" ]) ) , ( Identifier "t" , VContent (fromList [ Txt "2" ]) ) ] } ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Elt { eltName = Identifier "cal" , eltFields = fromList [ ( Identifier "body" , VContent (fromList [ Txt "A" ]) ) ] } , Txt ":" , Txt "=" , Txt "{" , Txt "x" , Txt "\8712" , Txt "\8477" , Txt "|" , Txt "x" , Txt " is natural" , Txt "}" ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Txt "5" , Txt "<" , Txt "17" ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Txt "x" , Txt "<" , Txt "y" , Txt "=" , Txt ">" , Txt "x" , Txt "\8817" , Txt "y" ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Txt "\n"
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Elt { eltName = Identifier "frac" , eltFields = fromList [ ( Identifier "denom" , VContent (fromList [ Txt "2" ]) ) , ( Identifier "num" , VContent (fromList [ Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VNone ) , ( Identifier "base" , VContent (fromList [ Txt "a" ]) ) , ( Identifier "t" , VContent (fromList [ Txt "2" ]) ) ] } ]) ) ] } ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Elt { eltName = Identifier "vec" , eltFields = fromList [ ( Identifier "children" , VArray [ VContent (fromList [ Txt "1" ]) , VContent (fromList [ Txt "2" ]) ] ) , ( Identifier "delim" , VString "[" ) ] } ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Elt { eltName = Identifier "mat" , eltFields = fromList [ ( Identifier "rows" , VArray [ VArray [ VContent (fromList [ Txt "1" ]) , VContent (fromList [ Txt "2" ]) ] , VArray [ VContent (fromList [ Txt "3" ]) , VContent (fromList [ Txt "4" ]) ] ] ) ] } ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
, Elt { eltName = Identifier "equation" , eltFields = fromList [ ( Identifier "block" , VBoolean True ) , ( Identifier "body" , VContent (fromList [ Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VContent (fromList [ Txt "x" ]) ) , ( Identifier "base" , VContent (fromList [ Elt { eltName = Identifier "op" , eltFields = fromList [ ( Identifier "limits" , VBoolean True ) , ( Identifier "text" , VString "lim" ) ] } ]) ) , ( Identifier "t" , VNone ) ] } , Txt "=" , Elt { eltName = Identifier "attach" , eltFields = fromList [ ( Identifier "b" , VContent (fromList [ Txt "x" ]) ) , ( Identifier "base" , VContent (fromList [ Elt { eltName = Identifier "op" , eltFields = fromList [ ( Identifier "limits" , VBoolean True ) , ( Identifier "text" , VString "lim" ) ] } ]) ) , ( Identifier "t" , VNone ) ] } ]) ) , ( Identifier "numbering" , VNone ) ] }
, Elt { eltName = Identifier "parbreak" , eltFields = fromList [] }
]
```
Pandoc AST
```hs
Pandoc
  Meta { unMeta = fromList [] }
  [ Para [ Str "Total" , Space , Str "displaced" , Space , Str "soil" , Space , Str "by" , Space , Str "glacial" , Space , Str "flow:" ]
  , Para [ Math DisplayMath "7.32\\beta + \\sum_{i = 0}^{\\nabla}\\frac{Q_{i}(a_{i} - \\varepsilon)}{2}" ]
  , Para [ Math DisplayMath "v: = \\begin{pmatrix}\nx_{1} \\\\\nx_{2} \\\\\nx_{3} \\\\\n\\end{pmatrix}" ]
  , Para [ Math DisplayMath "a \\rightsquigarrow b" ]
  , Para [ Math DisplayMath "A = \\pi r^{2}" ]
  , Para [ Math DisplayMath "\\text{ area} = \\pi \\cdot \\text{ radius}^{2}" ]
  , Para [ Math DisplayMath "\\mathcal{A}: = \\{ x \\in {\\mathbb{R}}|x\\text{ is natural}\\}" ]
  , Para [ Math DisplayMath "5 < 17" ]
  , Para [ Math DisplayMath "x < y = > x \\ngeq y" ]
  , Para [ Math DisplayMath "\\frac{a^{2}}{2}" ]
  , Para [ Math DisplayMath "\\begin{bmatrix}\n1 \\\\\n2 \\\\\n\\end{bmatrix}" ]
  , Para [ Math DisplayMath "\\begin{pmatrix}\n1 & 2 \\\\\n3 & 4 \\\\\n\\end{pmatrix}" ]
  , Para [ Math DisplayMath "\\lim_{x} = \\lim_{x}" ]
  ]
```
LaTeX
```tex
Total displaced soil by glacial flow:

\[7.32\beta + \sum_{i = 0}^{\nabla}\frac{Q_{i}(a_{i} - \varepsilon)}{2}\]

\[v: = \begin{pmatrix}
x_{1} \\
x_{2} \\
x_{3} \\
\end{pmatrix}\]

\[a \rightsquigarrow b\]

\[A = \pi r^{2}\]

\[\text{ area} = \pi \cdot \text{ radius}^{2}\]

\[\mathcal{A}: = \{ x \in {\mathbb{R}}|x\text{ is natural}\}\]

\[5 < 17\]

\[x < y = > x \ngeq y\]

\[\frac{a^{2}}{2}\]

\[\begin{bmatrix}
1 \\
2 \\
\end{bmatrix}\]

\[\begin{pmatrix}
1 & 2 \\
3 & 4 \\
\end{pmatrix}\]

\[\lim_{x} = \lim_{x}\]
```
HTML: the same content, with each equation rendered as MathML.

Still to do:

laurmaedje commented 1 year ago

But I've made some good progress. Here's an example of the current state of things.

That's actually pretty amazing! Especially how clean the LaTeX output is.

user202729 commented 1 year ago

Side note: as it turns out, TeX itself is also a complex programming language; it's just that, fortunately, most people don't use the "programming" capabilities much (since the code is so unreadable and unmaintainable).

Typst will change a lot, though; maybe there's still time to switch to the "read Typst JSON output" route.

Side note: arrays and dicts are a bit complicated. They're "mutable" in the sense that .push() modifies the value, but .push() can only be used where it does not break purity. For example

I'm not sure what exactly is going on (Typst documentation doesn't exactly dig into the internals), but I suppose it's reasonable to consider .push() etc. as just "syntactic sugar" to reassigning the array, and array data type is actually immutable.

Finally, I don't think it's possible to simulate things like measure without reimplementing Typst itself. state may be feasible, but it definitely doesn't sound easy. locate and page numbering... I don't know.

A note about the LaTeX output: do you plan to ever use \left...\right parentheses? It would be better not to use them at all and stick with plain (...), but then the sizing might be incorrect. The best way would be to use \big, \Big, etc., because unlike \left...\right they do not break the spacing, and \left...\right sometimes makes the delimiters too large (https://tex.stackexchange.com/questions/173717/is-it-ever-bad-to-use-left-and-right).
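For reference, the two kinds of delimiter sizing being compared look like this in LaTeX source; the `\Bigl`/`\Bigr` size is an arbitrary choice for this example, not a recommendation from the thread.

```latex
% Automatic sizing: the delimiters grow to fit the enclosed material.
\[ \left( \frac{a}{b} \right) \]

% Manual sizing, one of \bigl/\bigr, \Bigl/\Bigr, \biggl/\biggr, \Biggl/\Biggr.
\[ \Bigl( \frac{a}{b} \Bigr) \]
```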

jgm commented 1 year ago

I'm not going to be able to implement measure and locate -- that's not part of the ambition here.

The array behavior does sound complicated. I thought about treating it as syntactic sugar for reassigning, but the problem is that you don't always have an identifier to work with. (E.g. suppose the array is the output of a function.)
(EDIT: Ah, I see, in that case we get an error that we can't mutate a temporary value. So maybe this is syntactic sugar?)

The LaTeX and MathML output are produced by my texmath library. So here we "only" need to convert a typst expression into texmath's intermediate data structure, and then we can get TeX, MathML, Word equation, roff eq, or plain text.
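The design described here (one intermediate representation, several writers) can be illustrated with a small Haskell sketch. The types below are hypothetical stand-ins, loosely modeled on the evaluator output shown earlier and on texmath's `Exp`, but they are not texmath's actual definitions or API; the real code converts to texmath's own types and uses texmath's writers. `toExp`, `toTeX`, and `toPlain` are illustrative names.

```haskell
import Data.Maybe (fromMaybe)

-- Simplified mirror of the evaluator output (hypothetical types).
data Node = Txt String | Elt String [(String, Field)]
data Field = FContent [Node] | FNone

-- Hypothetical intermediate type, loosely modeled on texmath's Exp;
-- not texmath's actual definition.
data Exp
  = EText String
  | EGrouped [Exp]
  | ESubsup Exp (Maybe Exp) (Maybe Exp) -- base, subscript, superscript
  | EFraction Exp Exp
  deriving (Show, Eq)

-- Content of a field, or Nothing if the field is absent or none.
contentField :: String -> [(String, Field)] -> Maybe [Node]
contentField k fs = case lookup k fs of
  Just (FContent ns) -> Just ns
  _ -> Nothing

grp :: [Node] -> Exp
grp = EGrouped . map toExp

-- Map evaluated elements onto the intermediate form.
toExp :: Node -> Exp
toExp (Txt s) = EText s
toExp (Elt "attach" fs) =
  ESubsup (grp (fromMaybe [] (contentField "base" fs)))
          (grp <$> contentField "b" fs)
          (grp <$> contentField "t" fs)
toExp (Elt "frac" fs) =
  EFraction (grp (fromMaybe [] (contentField "num" fs)))
            (grp (fromMaybe [] (contentField "denom" fs)))
toExp (Elt name _) = EText ("[unsupported: " ++ name ++ "]")

-- Two writers over the same intermediate form.
toTeX :: Exp -> String
toTeX (EText s) = s
toTeX (EGrouped es) = concatMap toTeX es
toTeX (ESubsup b sub sup) = toTeX b ++ opt "_" sub ++ opt "^" sup
  where opt op = maybe "" (\e -> op ++ "{" ++ toTeX e ++ "}")
toTeX (EFraction n d) = "\\frac{" ++ toTeX n ++ "}{" ++ toTeX d ++ "}"

toPlain :: Exp -> String
toPlain (EText s) = s
toPlain (EGrouped es) = concatMap toPlain es
toPlain (ESubsup b sub sup) = toPlain b ++ opt "_" sub ++ opt "^" sup
  where opt op = maybe "" (\e -> op ++ "(" ++ toPlain e ++ ")")
toPlain (EFraction n d) = "(" ++ toPlain n ++ ")/(" ++ toPlain d ++ ")"
```

For the `frac(a^2, 2)` element from the evaluator output, `toTeX` yields `\frac{a^{2}}{2}` (the same as the LaTeX shown earlier), while `toPlain` yields `(a^(2))/(2)`. Adding an output format means adding one writer, not touching the typst side.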

laurmaedje commented 1 year ago

Side note: arrays and dicts are a bit complicated. They're "mutable" in the sense that .push() modifies the value, but .push() can only be used where it does not break purity here.

Push doesn't do anything special to not break purity, it's just that variables from outside of a closure can't be modified in general (also not through assignment). It is a bit special because it can modify its receiver while normal functions can't do such a thing.

I suppose it's reasonable to consider .push() etc. as just "syntactic sugar" to reassigning the array, and array data type is actually immutable.

You might be able to implement it in this way, but in the Typst compiler it is mutable. Note that you can also mutate nested arrays:

#{
  let x = ((1, 2), (3, 4))
  x.at(1).at(0) += 2
  x
}
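One way to reconcile this with the "syntactic sugar" idea is to rebuild the value along the whole access path and then rebind the variable. The Haskell below is a hypothetical sketch (`Value`, `modifyAt`, `addAssignAt`, and `pushVar` are illustrative names, not typst-hs functions), and it is an alternative to the genuinely mutable arrays the Typst compiler uses: arrays stay immutable, yet `x.at(1).at(0) += 2` turns `((1, 2), (3, 4))` into `((1, 2), (5, 4))`. Because it needs a variable name to rebind, it cannot mutate a temporary value, which fits the "can't mutate a temporary value" error mentioned earlier in the thread.

```haskell
-- Hypothetical evaluator values: arrays are immutable Haskell lists.
data Value = VInt Int | VArray [Value]
  deriving (Show, Eq)

type Env = [(String, Value)]

-- Rebuild a value with the element at a path of indices replaced.
modifyAt :: [Int] -> (Value -> Either String Value) -> Value -> Either String Value
modifyAt [] f v = f v
modifyAt (i : is) f (VArray xs)
  | i >= 0 && i < length xs = do
      new <- modifyAt is f (xs !! i)
      Right (VArray (take i xs ++ [new] ++ drop (i + 1) xs))
  | otherwise = Left ("index out of bounds: " ++ show i)
modifyAt _ _ _ = Left "cannot index into a non-array"

lookupVar :: String -> Env -> Either String Value
lookupVar name env = maybe (Left ("unknown variable: " ++ name)) Right (lookup name env)

-- Shadow the old binding with the new value.
rebind :: String -> Value -> Env -> Env
rebind name v env = (name, v) : filter ((/= name) . fst) env

-- x.at(i).at(j) += n, desugared to: x = <x with that element replaced>.
addAssignAt :: String -> [Int] -> Int -> Env -> Either String Env
addAssignAt name path n env = do
  old <- lookupVar name env
  new <- modifyAt path plus old
  Right (rebind name new env)
  where
    plus (VInt k) = Right (VInt (k + n))
    plus _ = Left "+= expects an integer"

-- x.push(v), desugared to: x = x + (v,).
pushVar :: String -> Value -> Env -> Either String Env
pushVar name v env = do
  old <- lookupVar name env
  case old of
    VArray xs -> Right (rebind name (VArray (xs ++ [v])) env)
    _ -> Left "push expects an array"
```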

Typst will change a lot, though; maybe there's still time to switch to the "read Typst JSON output" route.

There will indeed probably be breaking changes for a while to come. Compared to a special JSON target, I think it would be fine to simply use the planned HTML target. We intend to export plain, readable HTML + MathML that is true to the semantics.

jgm commented 1 year ago

Okay, thanks for the clarification. I implemented it the other way, but my implementation wouldn't handle the case you give here. Maybe that's okay for now, given that I'm not going to be able to hope for 100% compatibility.

I agree that converting from the HTML target produced by typst will probably be the most reliable path forward. So, not sure what to do with the big library I've been developing. Given how far I've gotten, it probably makes sense to publish it and maybe use it to add a typst reader to pandoc, but I may want to remove it once the HTML target is in place...

MilesCranmer commented 1 year ago

@jgm sorry if I missed this but did you put up your pandoc typst reader somewhere? Even if very lossy I think myself and others would be quite interested in using it (especially to enable typst use as a drafting tool before converting to LaTeX for journal requirements).

jgm commented 1 year ago

I'm still working on it, but development has been messy enough that I haven't wanted to publish it yet. I've been making a lot of progress but now I have 0.4 changes to deal with! Most things are working pretty well now. I'll try to make something available before too long, so people can try it.

jgm commented 1 year ago

I've put the library on jgm/typst-hs. The library is used in the typst-reader branch of jgm/pandoc. Some things still aren't supported -- e.g. "show set" rules, references, and things involving mutable arrays and dictionaries. But it should be good enough now to be useful.

Instructions to compile this pandoc branch from source

First, install ghcup if you don't have ghc and cabal already on your system. Then:

git clone https://github.com/jgm/pandoc -b typst-reader pandoc-typst-reader
cd pandoc-typst-reader
cabal build pandoc pandoc-cli

To find the executable after the build completes:

cabal list-bin pandoc-cli

If you want to save time and energy, you can add --disable-optimization to both the cabal build and cabal list-bin commands.

jgm commented 1 year ago

Don't use the binary I linked to earlier; I've fixed a bunch of bugs. Compile from source using instructions above.

jgm commented 1 year ago

Progress report: here's the result of converting undergradmath.typ:

http://htmlpreview.github.io/?https://gist.githubusercontent.com/jgm/87684478a7e48ee28ffba1f8581b89c3/raw/2cbe53c51d3a30d6b8a7330884bfc93f58365073/undergradmath.html

(I commented out one show rule for raw, because it caused the code to be displayed in the normal font, but otherwise it's unchanged.)

If you know of other real-world typst documents that I could try, that would be helpful.

FeralFlora commented 1 year ago

If you know of other real-world typst documents that I could try, that would be helpful.

There is a showcase channel on the Typst discord. Perhaps that might help?

jgm commented 1 year ago

Here's the converted Fibonacci sequence example from the typst README:

% pandoc -f typst -o fibs.html -s --mathml
#set page(width: 10cm, height: auto)
#set heading(numbering: "1.")

= Fibonacci sequence
The Fibonacci sequence is defined through the
recurrence relation $F_n = F_(n-1) + F_(n-2)$.
It can also be expressed in _closed form:_

$ F_n = round(1 / sqrt(5) phi.alt^n), quad
  phi.alt = (1 + sqrt(5)) / 2 $

#let count = 8
#let nums = range(1, count + 1)
#let fib(n) = (
  if n <= 2 { 1 }
  else { fib(n - 1) + fib(n - 2) }
)

The first #count numbers of the sequence are:

#align(center, table(
  columns: count,
  ..nums.map(n => $F_#n$),
  ..nums.map(n => str(fib(n))),
))
^D
[image: rendered fibs.html output]
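Note that the reader has to actually evaluate the `#let` bindings before it can build the table. Here is a minimal Haskell transcription of the script, with names mirroring the typst code. It is only an illustration of what the interpreter must compute, not how typst-hs implements evaluation:

```haskell
-- Haskell transcription of the typst script above (count, nums, fib).
-- Illustrative only; typst-hs evaluates the typst source itself.

count :: Int
count = 8

-- typst's range(1, count + 1) excludes its upper bound, so this is 1..count
nums :: [Int]
nums = [1 .. count]

-- Naive doubly recursive definition, mirroring the typst version
fib :: Int -> Integer
fib n
  | n <= 2    = 1
  | otherwise = fib (n - 1) + fib (n - 2)

-- Second table row: str(fib(n)) for each n in nums
fibRow :: [String]
fibRow = map (show . fib) nums
```

Evaluating `fibRow` gives `["1","1","2","3","5","8","13","21"]`. So the converted table should have 8 columns, with F_1 through F_8 in the first row and these numbers in the second.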
Thumuss commented 1 year ago

If you know of other real-world typst documents that I could try, that would be helpful.

There is a showcase channel on the Typst discord. Perhaps that might help?

In general, the showcase channel has code that wouldn't work here. People use a lot of subtle rules and commands; even "show set" rules are used very often.

jgm commented 1 year ago

Question for @laurmaedje: what is the best way to handle show rules on, e.g., headings? For example, undergradmath.typ has this rule:

// Run-in sections, like LaTeX \paragraph
#show heading.where(
  level: 1
): it => text(
  size: normalsize,
  weight: "bold",
  fill: headcolor,
  it.body + h(0.67em)
)

As I currently understand the rule, it gives us a text element instead of a heading element. This produces the desired visual output in pandoc's rendering, but semantically we no longer have <h1> elements in the HTML, which seems like a big drawback.

If we were just handcrafting HTML, we could keep the <h1> and modify the CSS to get the desired effect. But as I understand it, a show rule affects things at the level of content (elements) -- and then what can pandoc do but render the resulting elements? Besides, pandoc has to handle many different output formats; this is not just a typst -> HTML converter.

laurmaedje commented 1 year ago

@jgm This isn't fully fleshed out, as we haven't implemented HTML export yet, but here are my thoughts so far:

Even though show rules are visually just replacements, semantically they aren't quite that. The realized output (a planned new intermediate representation for the compiler that has all set & show rules resolved) will keep track of where some content stems from (in this case, from a heading). The HTML exporter can pick that information up. Similarly, the layouter would link that information into the final layout, so that the PDF exporter can use it to write a structure tree, making the file accessible.

Also, not all show rules are the same. Some define how to display one semantic element in terms of others. Others define how to lay out a semantic element visually. In both cases, it is good to retain the ancestry, but for HTML export we don't want to execute the show rule in the second case at all. I want to deal with this by applying some show rules conditionally (and users can do so by checking the export target via scripting).

A user can decide whether to write two completely separate "style sheets" (set & show rules) for HTML and PDF or to write anything in between with local target checks that reconfigure just a few rules. The goal for the HTML output is to have it as semantically rich as possible.
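Here is a minimal Haskell sketch of the provenance idea, under assumed types. `Origin`, `Content`, and `toHtml` are hypothetical names, not typst's or pandoc's actual API. The output of a show rule is wrapped together with the element it came from, so an HTML exporter can still emit `<h1>` even when the rule's visible result is just bold text:

```haskell
-- Hypothetical model of "realized" content that remembers which semantic
-- element a show rule replaced. Not typst's or pandoc's real types.

data Origin
  = HeadingOf Int      -- content produced by a show rule on a heading (level)
  | OtherElement       -- any other source
  deriving (Eq, Show)

data Content
  = Text String
  | Seq [Content]
  | Styled Origin Content   -- show-rule output, tagged with its source
  deriving (Eq, Show)

-- Hypothetical HTML exporter: if styled content stems from a heading,
-- keep the <hN> element; otherwise render only the body.
-- (No HTML escaping, for brevity.)
toHtml :: Content -> String
toHtml (Text s) = s
toHtml (Seq cs) = concatMap toHtml cs
toHtml (Styled (HeadingOf lvl) c) =
  "<h" ++ show lvl ++ ">" ++ toHtml c ++ "</h" ++ show lvl ++ ">"
toHtml (Styled OtherElement c) = toHtml c
```

With this, `toHtml (Styled (HeadingOf 1) (Text "Intro"))` yields `<h1>Intro</h1>`. An exporter with no use for the tag can ignore the wrapper and render the body.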

memeplex commented 8 months ago

I would like to experiment with typst for my thesis but have an escape hatch just in case. Just so I know what to expect from the reader: to what extent have you managed to implement the more dynamic/programmatic parts of typst, including show/set rules? Or perhaps you're consuming an already processed tree, maybe pre-lifting, although that's not the impression I got from reading the discussion above, and, moreover, I don't know whether getting such a thing from the compiler CLI is possible at all. Thanks!

jgm commented 8 months ago

We've implemented a large part of the language, but it's definitely not 100%. I think you just need to experiment a bit to get a sense.

memeplex commented 7 months ago

Is it possible to indicate a table header?

Both tables and grids are treated in a similar way:

#table(
    columns: 3,
    [A], [B], [C],
    [1], [2], [3],
)

#grid(
  columns: 5,
  gutter: 5pt,
  ..range(25).map(str)
)

--->

  --- --- ---
  A   B   C
  1   2   3
  --- --- ---

  ---- ---- ---- ---- ----
  0    1    2    3    4
  5    6    7    8    9
  10   11   12   13   14
  15   16   17   18   19
  20   21   22   23   24
  ---- ---- ---- ---- ----

Unfortunately, AFAICS, at the base level the language has no way of expressing headers, rowspans, colspans, etc. There is the tablex package, which produces nice rendered output, but I guess it's grids all the way down, with little semantic/structural information remaining.
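The output above is consistent with the reader simply chunking the flat cell list by the `columns` count. A minimal Haskell sketch of that step shows why no header can appear: nothing in the input marks a row as a header, so every row lands in the body. The names `toRows`, `SimpleTable`, and `fromTypstCells` are hypothetical, not the reader's actual code:

```haskell
-- Hypothetical sketch: group a flat cell list into rows of n cells,
-- as table(columns: 3, [A], [B], [C], [1], [2], [3]) implies.
toRows :: Int -> [a] -> [[a]]
toRows n cells
  | n <= 0     = error "toRows: column count must be positive"
  | null cells = []
  | otherwise  = row : toRows n rest
  where
    (row, rest) = splitAt n cells

-- Simplified stand-in for a table with separate head and body rows
data SimpleTable a = SimpleTable
  { headRows :: [[a]]   -- stays empty: the input carries no header marker
  , bodyRows :: [[a]]
  } deriving (Eq, Show)

fromTypstCells :: Int -> [a] -> SimpleTable a
fromTypstCells n cells = SimpleTable { headRows = [], bodyRows = toRows n cells }
```

For the grid example, `fromTypstCells 5 (map show [0 .. 24])` gives five body rows, the last being `["20","21","22","23","24"]`, and no head rows. This matches the two tables shown above.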

memeplex commented 7 months ago

It seems that the author of tablex may merge some changes into typst that add align, inset, fill and stroke.

This may help a bit (e.g. alignment) but still not much regarding structure.

@laurmaedje, given that this also affects HTML export, what are your thoughts about it? Maybe it's intentional, but it's not actually covered in https://github.com/typst/typst/issues/721, besides:

Indeed, more than a mapping, what's missing here is a more expressive primitive for tables at the structural level.

EDIT: I now realize those are two steps of a very ambitious roadmap: https://github.com/typst/typst/issues/3001. If this is the plan, it certainly looks amazing.