quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org
Other
3.98k stars 328 forks source link

Quarto improperly merges metadata for Lua filter use #11139

Open cscheid opened 1 month ago

cscheid commented 1 month ago

See https://github.com/quarto-dev/quarto-cli/discussions/11138

cscheid commented 1 month ago

Repro

_quarto.yml

... # default project metadata from `quarto create project website`

mymeta:
  key1: projectwise_value
  key2: projectwise_value_2

index.qmd

---
title: "This is the page title"
filters:
  - test.lua
mymeta:
  key1: projectwise_value_override
---

Hello.

Discussion

The problem is here:

https://github.com/quarto-dev/quarto-cli/blob/740829733499855335a7861e3c4f1333a09db45d/src/command/render/pandoc.ts#L994-L1015

In the repro above, we have the following values for pandocMetadata and engineMetadata:

pandocMetadata:
  ...
  title: "This is the page title"
  mymeta:
    key1: "projectwise_value_override"
    key2: "projectwise_value_2"

engineMetadata:
  title: "This is the page title"
  mymeta:
    key1: "projectwise_value_override"

The code exists so that if an execution engine emits metadata, we can merge it appropriately against the metadata that we send to Pandoc. (Because of reasons described elsewhere, we can't just send the YAML metadata to Pandoc.)

The problem, then, is how to merge the data in engineMetadata into pandocMetadata. If we simply replace values (the current behavior), then we run into this bug. If we instead use mergeConfigs (which is more or less the intended behavior), then merged string values become arrays. That's clearly wrong as well. EDIT: See below.

This is quite annoying, and I don't think we can fix it well at all in the present state of the codebase.

cscheid commented 1 month ago

then merged string values become arrays. That's clearly wrong as well.

That is wrong. I was confused by this (terrible?) behavior from ld.mergeWith. First, mergeWith is not consistent:

$ deno
> import ld from "npm:lodash"
> JSON.stringify(ld.mergeWith("src", "dest")) // left wins
'"src"'
> JSON.stringify(ld.mergeWith({foo: "bar"}, {foo: "baz"})) // right wins
'{"foo":"baz"}'

But this is worse:

$ deno
> import ld from "npm:lodash"
> ld.mergeWith("src", "dest")
[String: "src"] { "3": "t" } /// ...huh?
> let result = ld.mergeWith("src", "dest")
> [0,1,2,3].map(x => result[x])
[ "s", "r", "c", "t" ] // !!!!!
> String(result)
"src" // I give up
cscheid commented 1 month ago

More fun:

> ld.mergeWith("foo", ["foo"], mergeArrayCustomizer)
[String: "foo"] // this is a String object, not a singleton array!
> ld.mergeWith(["foo"], "foo", mergeArrayCustomizer)
[ "f", "o", "o" ]
> ld.mergeWith("foo", ["foo"], () => new Error(""))
[String: "foo"]

I think we need to throw away mergeWith, it doesn't look like it's particularly salvageable for our use case.

cscheid commented 1 month ago

@xtimbeau I think I can fix this, but it involves rewriting (literally) the most used function in our code base. So I'm going to delay this to 1.7 when we can properly test it.

cscheid commented 1 month ago

More:

> ld.mergeWith([1], [1, 1], mergeArrayCustomizer)
[ 1, 1 ]
> ld.mergeWith({foo: [1]}, {foo: [1, 1]}, mergeArrayCustomizer)
{ foo: [ 1 ] }

I think we need a function that we can show respects a few properties that we care about. For appropriate definitions of == and null, we want:

In other words, mergeConfigs needs to behave like a monoid over x. (sorry, my academia is showing)

We need == to be structural

Plan

The first fix is to make sure mergeConfigs always passes a dummy object to mergeWith. That probably solves a good number of issues. (EDIT: it also creates new problems..)

We should also consider a schema-directed merge: something like schemaMerge(schema, ...objs), where we can use the schema information to guide the merging behavior.

cderv commented 1 month ago

The code exists so that if an execution engine emits metadata, we can merge it appropriately against the metadata that we send to Pandoc. (Because of reasons described elsewhere, we can't just send the YAML metadata to Pandoc.)

Just as reference as I needed a remonder. This engineMetadata was introduced so that using inline R expression in YAML header would still work under Quarto. This is what is mentioned in the comment from 341cc3b8eb79c6cc53e172dec3b532148db107c9 https://github.com/quarto-dev/quarto-cli/blob/0183439d37e10b713270ae6e901a0ba5d8ddb0ad/src/command/render/pandoc.ts#L983-L988

Example of this

---
title: Test
r-value: '`r 1 + 1`'
format: html
keep-md: true
engine: knitr
---

This value is a metadata computed through knitr inline: {{< meta r-value >}}

Image

This works for any engine with inline expression, but only for metadata that are not processed by Quarto, because they will only be available after computation. This is what isQuartoMetadata() is supposed to be about.

Quite confusing feature at the end, because you need to know / guess on which metadata this will work, and which it won't. And this is what #9548 was really about.