jgm / pandoc

Universal markup converter
https://pandoc.org

Typst Writer: Standalone figures overflow page margins #9236

Closed iandol closed 7 months ago

iandol commented 11 months ago

With this Markdown (the image is xkcd.png):

# Title

Text.

![Caption](xkcd.png)

The Typst output yields a figure with a width attribute that greatly overflows the page:

= Title
<title>
Text.

#figure([#box(width: 1480.0pt, image("xkcd.png"))],
  caption: [
    Caption
  ]
)

After removing the width attribute or setting it to 100%, the figure renders as expected.

From @jgm on bug #9104:

We are trying here to make the typst writer behave like other pandoc writers behave. Other writers don't automatically resize images to page width. Instead, they leave the image at its native size, unless a size attribute is specified.

Typst seems by default to ensure that an image cannot exceed the page margins unless a width is specified, in which case it respects that width. Naively, this seems like the most expected behaviour. Because the computed width is not in the AST, we can't remove it with a filter (though perhaps a Lua filter that injects 100% for every figure without a width attribute can work around this). @jgm suggests that the user can just add widths to standalone figures, but the source document may not have come from the user directly; it may have been produced by another app, for example.

I think @cscheid also does some manual tweaking of the Pandoc output in Quarto to make this work?

iandol commented 11 months ago

Quarto does override Pandoc's behaviour AFAICT:

https://github.com/quarto-dev/quarto-cli/blob/main/src/resources/filters/layout/typst.lua#L61

iandol commented 11 months ago

In the meantime, we can use a Lua filter on image elements to force a width=100% (or a value from an environment variable if set) unless a width has been explicitly specified:

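-- Give every image without an explicit width a default width.
function Image(im)
    local env = pandoc.system.environment()
    local var = env["FIGWIDTH"] -- optional override from the environment
    local width = "100%"        -- local, so it does not leak into the global scope
    if var and var ~= "" then
        width = var
    end
    if not im.attributes.width then
        im.attributes.width = width
        return im
    end
    -- Images with an explicit width are left unchanged (nil is returned).
end

jgm commented 10 months ago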

I suppose one pragmatic idea (when no explicit width is specified) would be to compute the width of the image, and if it's greater than, say, 5 inches, set to 100%?

This would probably give satisfactory results in most cases.
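A minimal sketch of this heuristic, written in Python purely for illustration; it is not pandoc's implementation. The function name `default_width`, the 72 dpi fallback, and the `threshold_in` parameter are assumptions, and the 5-inch threshold is simply the figure suggested above.

```python
from typing import Optional


def default_width(pixel_width: int, dpi: float = 72.0,
                  threshold_in: float = 5.0) -> Optional[str]:
    """Return "100%" for images wider than the threshold, else None.

    None stands for "emit no width and keep the image at its native size".
    The dpi default of 72 is an assumption for images without DPI metadata.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    natural_width_in = pixel_width / dpi  # native width in inches
    return "100%" if natural_width_in > threshold_in else None
```

Under these assumptions, an image 1480 pixels wide at 72 dpi is about 20.6 inches wide and would get `"100%"`, while a 300-pixel image (about 4.2 inches) would be left at its native size.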

iandol commented 10 months ago

Yes, there are two options:

1) Your suggested heuristic: for us-letter (8.5" page width) with 1" margins the max width would be 6.5", and for a4 (8.27" page width) it would be 6.27", so with some wiggle room 5" seems like a sane default. Users with exotic paper size requirements would need to use an explicit attribute or a Lua filter.

2) I did double-check: without a width attribute, Typst automatically resizes both small and large figures to the margin widths. This is its default behaviour. So the question is whether expectations from other writers, which may render content differently, should affect the "default" Typst behaviour. Option 2 would be to simply not add a width.

I checked the same document with LaTeX, and no width is inserted there:

\begin{figure}
\centering
\includegraphics{xkcd.png}
\caption{Caption}
\end{figure}

#figure([#box(width: 1480.0pt, image("xkcd.png"));],
  caption: [
    Caption
  ]
)

So the idea of inserting a width if none is given doesn't apply to latex, at least not for this 1480px wide 72dpi image...

I'm sure there is something I am missing regarding option 2. Either solution would, I think, be better than the current behaviour...

iandol commented 10 months ago

This is the documentation for image layout in Typst:

https://typst.app/docs/reference/visualize/image/

width - auto or [relative](https://typst.app/docs/reference/layout/relative/)

The width of the image.

Default: auto

There is a fit attribute, though I don't quite understand the difference between its options ("cover" is the default).

Finally, at least according to the image docs, "1480pt" is not a valid value, as width accepts relative widths, which comprise a length (like "1480pt") and a ratio, but I suspect that is a typo in their docs...

laurmaedje commented 10 months ago

Wherever a relative length is accepted, an absolute one is fine, too. The ratio is just 0% then.

The fit attribute defines what happens if the aspect ratio of the width and height arguments doesn't match the aspect ratio of the image itself.

jgm commented 10 months ago

So the idea of inserting a width if none is given doesn't apply to latex, at least not for this 1480px wide 72dpi image...

In LaTeX we don't need to insert a width, because LaTeX won't automatically resize the figure. We do have code in the custom template to shrink the figure if it is greater than the textwidth (but leave it alone otherwise). If something like this were possible for typst, that would be the best solution.

laurmaedje commented 10 months ago

We do have code in the custom template to shrink the figure if it is greater than the textwidth (but leave it alone otherwise). If something like this were possible for typst, that would be the best solution.

That is possible, but only with the measure function.

#figure([#box(width: 1480.0pt, image("xkcd.png"));], ...)

If specifying a fixed width is desired, there is a width parameter directly in the image function, so the box can be skipped.

My personal opinion: To have a good layout, some touching up or switching to a template will be necessary anyway. In that sense, I'd think it's more desirable to generate output that is as clean and straightforward as possible. Isn't the goal of pandoc to transfer the semantics rather than the appearance?

cscheid commented 10 months ago

If specifying a fixed width is desired, there is a width parameter directly in the image function, so the box can be skipped.

I think the box is there because images in Pandoc are inline elements, and images in typst are block elements. That is, more or less, the root of these issues: the desire to use inline Images in a way that is more-or-less compatible with other output formats like HTML, PDF, or docx.

Isn't the goal of pandoc to transfer the semantics rather than the appearance?

I can't speak for how Pandoc's goals are implemented (obviously), but I think that for typst and LaTeX both, "appearance is the semantics". People use it at least in part for the quality of the typesetting, and then appearance is part of the behavior you want. In that case, I think that the value of Pandoc includes "predictable semantics" given the same input, and the typst behavior of automatically resizing the image is at odds with the standard Pandoc output of docx, PDF, and HTML. In that case, I would defend the choice of emitting output that is not as clean, but more consistent across Pandoc outputs.

I think this "opinionated view" is already the case for Pandoc in other settings: the existence of default templates themselves, LaTeX tables being written as longtable environments (instead of tabular), etc.

laurmaedje commented 10 months ago

I think the box is there because images in Pandoc are inline elements, and images in typst are block elements.

Makes sense for free-standing images. For images in figures, it could be omitted though.

I think that for typst and LaTeX both, "appearance is the semantics".

I'd like to think that Typst's input format is quite semantic rather than just being the input to a layout engine, but I guess it is a fair view from a pandoc or quarto point of view.

jgm commented 10 months ago

It's hard to know what's best here. It's good to have less cluttered output, but on the other hand it's also good if your document renders similarly with typst backend and latex backend. People with smaller images in figures might not necessarily want them blown up to page width. Yes, they could specify an explicit size, but they don't need to do that for other output formats, so it's an extra thing to remember when using typst.

Another thing to consider: when non-vector format images are not already as large as page width or larger, resizing them may well look bad, as they weren't designed for that resolution.

laurmaedje commented 10 months ago

I will discuss with @reknih whether Typst's current default behaviour is desirable or should be changed.

laurmaedje commented 10 months ago

We've discussed this and will change Typst's behaviour. We'll keep the behaviour that images are scaled down to not overflow the page, but they won't be upscaled anymore if they're smaller than the page width.
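The difference between the old and new defaults, as described in this thread, can be sketched in a few lines of Python. This is an illustration of the stated rule only, not Typst's code; the function names and the 468pt text width (6.5in at 72pt per inch) are assumptions.

```python
def auto_width_old(natural_pt: float, available_pt: float) -> float:
    """Old default as described above: always fill the available width."""
    return available_pt


def auto_width_new(natural_pt: float, available_pt: float) -> float:
    """New default: shrink to fit the available width, never enlarge."""
    return min(natural_pt, available_pt)


TEXT_WIDTH_PT = 468.0  # assumed text width, for illustration only

# A 1480pt-wide image is shrunk under both rules, while a 200pt-wide
# image is enlarged only under the old rule.
wide_old = auto_width_old(1480.0, TEXT_WIDTH_PT)
wide_new = auto_width_new(1480.0, TEXT_WIDTH_PT)
small_old = auto_width_old(200.0, TEXT_WIDTH_PT)
small_new = auto_width_new(200.0, TEXT_WIDTH_PT)
```

The new rule is the same "shrink if wider than the text width, otherwise leave alone" behaviour that was described earlier for the LaTeX template.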

jgm commented 10 months ago

Excellent. I will change pandoc's behavior accordingly.

cscheid commented 10 months ago

(We'll change Quarto as well.)

jgm commented 10 months ago

So, what I'm planning to implement is this:

image with no size given: #box(image("name.jpg"))
image with size given: #box(image("name.jpg", width: 3in))

I retain the #box because a pandoc Image element is an inline element, and it seems that image in typst always gets treated as block-level in the absence of #box. In principle we could detect the case where the figure contains nothing but an image, and omit the #box in that case, but I don't know if it's worth it.
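As a rough illustration of the two planned output forms, here is a small Python sketch that builds the Typst markup as a string. The helper name `typst_image` is hypothetical and this is not the pandoc writer's code (which is written in Haskell); it also does not escape quotes or backslashes in the path.

```python
from typing import Optional


def typst_image(path: str, width: Optional[str] = None) -> str:
    """Build '#box(image(...))' markup, adding a width only if one is given."""
    args = [f'"{path}"']  # note: no escaping of special characters in path
    if width is not None:
        args.append(f"width: {width}")
    return f"#box(image({', '.join(args)}))"


no_size = typst_image("name.jpg")
with_size = typst_image("name.jpg", "3in")
```

Here `no_size` is `#box(image("name.jpg"))` and `with_size` is `#box(image("name.jpg", width: 3in))`, matching the two cases listed above.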

I'm not sure whether the width and height should go on the box or on the image, or whether it matters?

laurmaedje commented 10 months ago

I think omitting the box in a figure would be nice, but it's up to you.

Where the width goes doesn't really matter. Putting it on the image is maybe a bit simpler.

jgm commented 10 months ago

What's the time frame for this change to typst?

laurmaedje commented 10 months ago

We'll do it before the next release (0.11). That release is still some time away, though; it is planned for sometime in the first half of February.

jgm commented 10 months ago

OK, I'll leave this open then and we can make this change when we support typst 0.11.

laurmaedje commented 7 months ago

I've opened https://github.com/typst/typst/pull/3571 to address this. I went down a bit of a rabbit hole extracting DPI metadata so that the image is scaled to its natural size at its desired DPI rather than at 1px = 1pt.
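To illustrate the difference between "1px = 1pt" and DPI-aware sizing, here is a minimal Python sketch of the unit conversion. It is not the code from the pull request; the function name and the fallback of 72 dpi for images without metadata are assumptions (at 72 dpi the result coincides with 1px = 1pt).

```python
POINTS_PER_INCH = 72.0


def natural_width_pt(pixel_width: int, dpi: float = 72.0) -> float:
    """Convert a pixel width to points at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixel_width * POINTS_PER_INCH / dpi


at_72_dpi = natural_width_pt(1480)        # same as treating 1px as 1pt
at_300_dpi = natural_width_pt(1480, 300)  # much narrower natural size
```

With these assumptions, a 1480-pixel-wide image is 1480pt wide at 72 dpi but only about 355pt wide at 300 dpi, so a high-resolution image can fit within the text width without any scaling.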

iandol commented 7 months ago

0.11 appears to have been formally released (amazing work, congrats to @laurmaedje and the hard work of all the other devs, table changes are great!):

https://typst.app/docs/changelog/

jgm commented 7 months ago

That's quite an extensive changelog. I will update typst-hs and pandoc to handle some of these changes, but really it's too much for me to do on my own, given other commitments. It would really be nice if someone else could help out. I'm guessing that there is a nonempty intersection between Typst users and Haskell programmers?

gordonwoodhull commented 7 months ago

Hi @jgm, I'm improving Typst functionality in Quarto and eager to contribute to Pandoc. I know we are interested in the new Typst table / grid functionality #9588 and probably many other things.

I'm new to the Pandoc codebase, but I've done similar work in OCaml and always wanted to try Haskell.

I was hoping to see some breakage in order to have an easy first contribution, but we only needed a minor fix in order to pass our test suite using Typst 0.11. Other than that, I have only seen the expected image size changes, a nice improvement.

Anyway, I'm still finding my way around but hope to contribute in coming days.

jgm commented 7 months ago

@gordonwoodhull - glad to hear it. I will start on improving tables in the pandoc typst writer, and maybe when I have something rough I can make a draft PR and you can help test and refine it.