typst / typst

A new markup-based typesetting system that is powerful and easy to learn.
https://typst.app
Apache License 2.0
32.28k stars 864 forks

Typed output in cli query (Serialization) #3585

Open LDemetrios opened 5 months ago

LDemetrios commented 5 months ago

Description

I propose including type descriptors when serializing values of certain types, the same way it is already done for the different content types. For example:

{
    "type": "datetime",
    "year": 2024,
    "month": 3,
    "day": 8
}

instead of

"datetime(year: 2024, month: 3, day: 8)"

The same applies to the duration, regex, label, and version types, and possibly to fraction, length, ratio, relative, and alignment:

{
    "type": "relative",
    "percent": 50,
    "pt": -2,
    "em": 1
}

instead of

"50% + -2pt + 1em"

Things get even worse when parsing colors. I played with it a bit, and the number of possible variants scares me: there are at least 14. So, again,

{
    "type": "rgb",
    "hex": "#0b1621"
}

instead of

"rgb(\"#0b1621\")"

Use Case

I assume one decides to use JSON because it is one of the most widely used formats and JSON libraries exist for every language. Thus, there is no need to write parsers; one can just work with the data. But when there is a JSON string like "color.linear-rgb(100%, 49.8%, 49.8%, 50%)", additional parsing is required.
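To illustrate the extra work, here is a minimal sketch of a hand-written parser for just that one string form. The `LinearRgb` class and `parseLinearRgb` function are hypothetical names, not part of any Typst binding, and the sketch assumes all four components are written as percentages, which the real output may not guarantee.

```kotlin
// Hypothetical holder for the four components; not part of any Typst binding.
data class LinearRgb(val red: Double, val green: Double, val blue: Double, val alpha: Double)

// Hand-written parser for ONE of the many color string forms.
// Assumption: every component is a percentage such as "49.8%".
fun parseLinearRgb(repr: String): LinearRgb? {
    val match = Regex("""color\.linear-rgb\((.*)\)""").matchEntire(repr.trim()) ?: return null
    val parts = match.groupValues[1].split(",").map { it.trim() }
    if (parts.size != 4 || parts.any { !it.endsWith("%") }) return null
    // Strip the percent sign and bail out if anything is not a number.
    val numbers = parts.map { it.removeSuffix("%").toDoubleOrNull() ?: return null }
    return LinearRgb(numbers[0], numbers[1], numbers[2], numbers[3])
}

fun main() {
    println(parseLinearRgb("color.linear-rgb(100%, 49.8%, 49.8%, 50%)"))
    println(parseLinearRgb("rgb(\"#0b1621\")")) // a different form needs yet another parser
}
```

Each of the other color forms would need a similar function, which is exactly the work a typed representation would avoid.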

Moreover, many libraries can convert JSON not just into a generic structure (list, map, etc.) but directly into an object model. Depending on the language, they may use reflection (JVM-based languages), macros (Rust), or something else. But unless the language is Prolog, it is hard to provide arbitrary templates for values. It is common practice to have so-called type descriptors: the parser reads the descriptor first and then decides exactly which object to construct. This is already present in the content representation, where it is named func:

{
    "func": "sequence",
    "children": [
        {
            "func": "emph",
            "body": {
                "func": "text",
                "text": "aaa"
            }
        },
        {
            "func": "space"
        },
        {
            "func": "text",
            "text": "bbb"
        }
    ]
}

Thus I can create my own object model for Typst content:

interface Content
data object Space : Content
data class Text(val text: String) : Content
data class Emph(val body: Content) : Content

Plus a few annotations that handle inheritance, differences in naming case, and default values. And then a very simple call:

val content = json.deserialize<Content>("...")
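What such a library does with the func descriptor can be sketched by hand. The following is only an illustration, assuming the JSON has already been parsed into plain Kotlin maps and lists by some JSON library. The `decodeContent` function and the `SequenceContent` class are hypothetical names added for this sketch.

```kotlin
// Object model from above, plus a hypothetical SequenceContent for "sequence".
interface Content
data object Space : Content
data class Text(val text: String) : Content
data class Emph(val body: Content) : Content
data class SequenceContent(val children: List<Content>) : Content

// Reads the "func" descriptor first, then decides which object to construct.
// Assumption: the node is a JSON object already parsed into a Map.
fun decodeContent(node: Map<*, *>): Content = when (val func = node["func"]) {
    "space" -> Space
    "text" -> Text(node["text"] as String)
    "emph" -> Emph(decodeContent(node["body"] as Map<*, *>))
    "sequence" -> SequenceContent(
        (node["children"] as List<*>).map { decodeContent(it as Map<*, *>) }
    )
    else -> throw IllegalArgumentException("Unknown content func: $func")
}

fun main() {
    // The same tree as the JSON example, written as already-parsed maps and lists.
    val parsed = mapOf(
        "func" to "sequence",
        "children" to listOf(
            mapOf("func" to "emph", "body" to mapOf("func" to "text", "text" to "aaa")),
            mapOf("func" to "space"),
            mapOf("func" to "text", "text" to "bbb"),
        ),
    )
    println(decodeContent(parsed))
}
```

No string parsing is involved at any point: one lookup of the descriptor is enough to pick the class.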

It's important to note that this kind of thing is possible in many languages, so this will make life easier for many developers.

P.S. I understand that these changes may break existing bindings, so maybe it would be better to introduce another format, for example dom-json, and deprecate the previous one. But such improvements may greatly simplify writing bindings for other languages.

P.P.S. I couldn't find any place with a full list of types or element functions and had to search for them across the whole documentation. I think it would be nice to collect all the information about the Typst object model in one place. Is there such a place already?

LDemetrios commented 5 months ago

(More P).S. I suppose there are equally powerful libraries for parsing YAML and other formats, so maybe the same should be applied to the YAML output.

LDemetrios commented 5 months ago

Even more trouble originates from the fact that the string "50% + -2pt + 1em" becomes indistinguishable from the value it represents, which may lead to parsing errors. In my proposal, though, a typed value becomes indistinguishable from a dictionary, so a dictionary must be represented with a more complicated structure:

{
    "type": "dictionary",
    "entries" : {
        "a" : 1,
        "b" : 2
    }
}

instead of just

{
    "a" : 1,
    "b" : 2
}

For me, that is not a big problem.
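A rough sketch of why the wrapper removes the ambiguity, again assuming the JSON is already parsed into Kotlin maps, strings, and numbers. The `Value` hierarchy and `decodeValue` are hypothetical names, and only the datetime and dictionary descriptors from this proposal are handled.

```kotlin
// Hypothetical value model for the proposed "type"-tagged output.
sealed interface Value
data class Str(val value: String) : Value
data class Num(val value: Double) : Value
data class Datetime(val year: Int, val month: Int, val day: Int) : Value
data class Dict(val entries: Map<String, Value>) : Value

fun decodeValue(node: Any?): Value = when (node) {
    is String -> Str(node) // a JSON string is always a plain string, never an encoded value
    is Number -> Num(node.toDouble())
    // Every JSON object carries a descriptor, so user keys cannot be mistaken for one.
    is Map<*, *> -> when (val type = node["type"]) {
        "datetime" -> Datetime(
            (node["year"] as Number).toInt(),
            (node["month"] as Number).toInt(),
            (node["day"] as Number).toInt(),
        )
        "dictionary" -> Dict(
            (node["entries"] as Map<*, *>).entries.associate { (k, v) -> (k as String) to decodeValue(v) }
        )
        else -> throw IllegalArgumentException("Unknown type descriptor: $type")
    }
    else -> throw IllegalArgumentException("Unsupported JSON node: $node")
}

fun main() {
    // A user dictionary that itself has a key named "type" stays unambiguous.
    val node = mapOf(
        "type" to "dictionary",
        "entries" to mapOf(
            "type" to "datetime",
            "born" to mapOf("type" to "datetime", "year" to 2024, "month" to 3, "day" to 8),
        ),
    )
    println(decodeValue(node))
}
```

Because user data only ever appears under entries, a dictionary key that happens to be called type is decoded as ordinary data.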