quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Including qmd that uses jupyter, yaml top matter gets dumped into render doc sometimes #7157

Open machow opened 1 year ago

machow commented 1 year ago

Bug description

When including a qmd in another document (outer document). If...

Then the yaml for the included qmd gets dumped into the rendered outer document.

Steps to reproduce

Run quarto render outer.qmd --to gfm with the files below.

outer.qmd

---
---

```{=markdown}
[![CI](https://github.com/machow/quartodoc/actions/workflows/ci.yml/badge.svg)](https://github.com/machow/quartodoc/actions/workflows/ci.yml)
```

{{< include inner.qmd >}}

inner.qmd

---
title: The inner title
some_yaml_option: 1
jupyter:
  kernelspec: python3
---

```{python}
1 + 1
```

output

kernelspec: python3

[![CI](https://github.com/machow/quartodoc/actions/workflows/ci.yml/badge.svg)](https://github.com/machow/quartodoc/actions/workflows/ci.yml)

title: The inner title
some_yaml_option: 1
jupyter:

1 + 1
2

Expected behavior

The YAML top matter is not included in the output.

Actual behavior

The YAML top matter appears in the output (see the example output above).

Your environment

macOS

Quarto check output

Quarto 1.4.398
[✓] Checking versions of quarto binary dependencies...
  Pandoc version 3.1.8: OK
  Dart Sass version 1.55.0: OK
  Deno version 1.33.4: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
  Version: 1.4.398
  Path: /Applications/quarto/bin

[✓] Checking tools....................OK
  TinyTeX: (not installed)
  Chromium: (not installed)

[✓] Checking LaTeX....................OK
  Tex: (not detected)

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
  Version: 3.9.5
  Path: /Users/machow/.pyenv/versions/3.9.5/bin/python3
  Jupyter: 5.3.1
  Kernels: ..SNIPPED..

[✓] Checking Jupyter engine render....OK

(|) Checking R installation...........R scripting front-end version 4.1.2 (2021-11-01)
[✓] Checking R installation...........(None)

  Unable to locate an installed version of R.
  Install R from https://cloud.r-project.org/

mcanouil commented 1 year ago

I don't believe this is a bug; it is even the intended behaviour.

As stated in the documentation "Include shortcodes are equivalent to copying and pasting the text from the included file into the main file.", see https://quarto.org/docs/authoring/includes.html.

cscheid commented 1 year ago

> As stated in the documentation "Include shortcodes are equivalent to copying and pasting the text from the included file into the main file.", see https://quarto.org/docs/authoring/includes.html.

Yes, but the resulting markdown is garbled.

mcanouil commented 1 year ago

Indeed, I tested and realised that some of the YAML remained in the output (thus the reopening ^^).

machow commented 1 year ago

Thanks for looking at this! I think another surprising piece in the original example is that the behavior seems to vary, depending on whether you set title on the outer doc.

Double checking the include behavior

> As stated in the documentation "Include shortcodes are equivalent to copying and pasting the text from the included file into the main file.", see https://quarto.org/docs/authoring/includes.html.

Just to be sure I understand how this behavior plays out when rendering, is there a good way to understand when the inner document's top-matter gets used by quarto when rendering? For example...

outer2.qmd

---
title: The outer title
---

{{< include inner2.qmd >}}

inner2.qmd

---
title: The inner title
some_yaml_option: 1
jupyter:
  kernelspec: python3
---

Is the idea that when I render outer2.qmd, quarto merges the two top-matters it sees in this all-the-includes-pasted-in final document? E.g.

---
title: The outer title
---

---
title: The inner title
some_yaml_option: 1
jupyter:
  kernelspec: python3
---

So in this case, the final title after merging the top-matter things is "The inner title"?
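To make the "copy and paste" reading of includes concrete, here is a minimal Python sketch of a purely textual include expansion. It is an illustration only, not Quarto's actual implementation: `expand_includes`, `INCLUDE_RE`, and the in-memory `files` dictionary are hypothetical names, and the sketch does not handle nested includes.

```python
import re
from typing import Dict

# Matches a Quarto include shortcode such as: {{< include inner2.qmd >}}
INCLUDE_RE = re.compile(r"\{\{<\s*include\s+(\S+)\s*>\}\}")


def expand_includes(text: str, files: Dict[str, str]) -> str:
    """Replace each include shortcode with the raw text of the named file.

    `files` is a hypothetical in-memory stand-in for the file system.
    """
    def paste(match: re.Match) -> str:
        # Paste the included file verbatim, front matter and all.
        return files[match.group(1)].rstrip("\n")

    return INCLUDE_RE.sub(paste, text)


files = {
    "outer2.qmd": "---\ntitle: The outer title\n---\n\n{{< include inner2.qmd >}}\n",
    "inner2.qmd": (
        "---\n"
        "title: The inner title\n"
        "some_yaml_option: 1\n"
        "jupyter:\n"
        "  kernelspec: python3\n"
        "---\n"
    ),
}

expanded = expand_includes(files["outer2.qmd"], files)
print(expanded)
```

Under this reading, the expanded text contains two separate YAML blocks, each with its own title, which is the situation the question above is about.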

dragonstyle commented 1 year ago

As a rule, we end up sort of mashing the YAML blocks together (i.e. we just concatenate them), which replicates Pandoc's behavior (it will just use the last key to appear, so if title appears twice, Pandoc uses the later one). This gets messy pretty quickly, however, because we do work outside of the core Pandoc pipeline (for example, processing the date field) that only works against the front matter itself. We really only allow multiple YAML blocks at all as a way to support inlining citation data (you can inline bibliography entries as YAML blocks).

As a result, I think the current state of things is that the behavior with multiple YAML blocks isn't currently well defined. There is work to be done to rationalize that and perhaps error, perhaps just get very consistent. But definitely for now I'd say that trying to use multiple blocks in this way is likely to end badly thanks to our inconsistency in this regard...

I don't think that the includes are really directly related: even simple documents that contain multiple blocks exhibit this sort of confusion:

---
title: Test
author: Charles Teague
date: today
format: gfm
---

---
title: What up
author: Jim
foo: bar
---

## Hello 

There
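For reference, the "last key to appear wins" rule described above can be sketched in a few lines of Python. This is a toy illustration, not Pandoc's or Quarto's code: `split_yaml_blocks` and `merge_last_wins` are hypothetical helpers that only understand flat `key: value` lines between `---` delimiters. It shows what a pure last-wins merge of the document above would give, which is not necessarily what Quarto produces, since some fields are processed outside the Pandoc pipeline.

```python
from typing import Dict, List, Optional


def split_yaml_blocks(text: str) -> List[List[str]]:
    """Collect the lines between each pair of `---` delimiter lines."""
    blocks: List[List[str]] = []
    current: Optional[List[str]] = None  # None means we are outside a block
    for line in text.splitlines():
        if line.strip() == "---":
            if current is None:
                current = []            # opening delimiter
            else:
                blocks.append(current)  # closing delimiter
                current = None
        elif current is not None:
            current.append(line)
    return blocks


def merge_last_wins(blocks: List[List[str]]) -> Dict[str, str]:
    """Merge flat `key: value` lines; a later block overrides an earlier one."""
    merged: Dict[str, str] = {}
    for block in blocks:
        for line in block:
            if line.startswith((" ", "\t")) or ":" not in line:
                continue  # toy parser: skip nested or non key/value lines
            key, _, value = line.partition(":")
            merged[key.strip()] = value.strip()
    return merged


doc = """---
title: Test
author: Charles Teague
date: today
format: gfm
---

---
title: What up
author: Jim
foo: bar
---

## Hello

There
"""

merged = merge_last_wins(split_yaml_blocks(doc))
print(merged)
```

With this sketch, title and author come from the second block, while date and format survive from the first because the second block never sets them.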

Is this something you just ran across or is there something you're trying to accomplish that we should consider targeting? I think the larger rationalizing of all this is likely a bigger thing that I wouldn't prioritize trying to address right now without some pressing need...

machow commented 1 year ago

I think I have enough information now on quarto's behavior to use includes to solve my problems, so feel free to close this issue if it will be superseded by bigger includes design stuff.

Carlos mentioned that the most sane way to use includes in the original example is to create a document without any yaml topmatter, and then include that in both of the other documents (so both inner and outer just include the third, yaml-matterless doc). This seems like a good way to avoid worrying about top-matter merging etc..!

I think a challenging part of this issue right now, is that because the include merging top-matter worked in simple circumstances, I leaned into it when I should have avoided it 😅

cscheid commented 1 year ago

> I think a challenging part of this issue right now, is that because the include merging top-matter worked in simple circumstances, I leaned into it when I should have avoided it 😅

It's definitely confusing. I don't super love Pandoc's default stance of "never emit parse errors or warnings". We have a long-term plan for linters and static analyzers, and "repeated front matter with conflicting keys" would be a prime candidate for a use-case.
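As a rough idea of what such a check could look like, here is a small Python sketch that reports top-level keys set to different values in different YAML blocks. It is only an assumption about the shape of such a linter, not an existing Quarto or Pandoc feature: `find_conflicting_keys` is a hypothetical helper, it only reads flat `key: value` lines, and it would be confused by a `---` horizontal rule in the document body.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_conflicting_keys(text: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return top-level keys that get different values in different blocks.

    Maps each conflicting key to a list of (block_index, value) pairs.
    """
    seen: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    block_index = -1
    inside = False
    for line in text.splitlines():
        if line.strip() == "---":
            inside = not inside  # toggle on every delimiter line
            if inside:
                block_index += 1
            continue
        if not inside or line.startswith((" ", "\t")) or ":" not in line:
            continue  # body text, nested keys, or non key/value lines
        key, _, value = line.partition(":")
        seen[key.strip()].append((block_index, value.strip()))
    return {
        key: entries
        for key, entries in seen.items()
        if len({value for _, value in entries}) > 1
    }


doc = """---
title: The outer title
---

---
title: The inner title
some_yaml_option: 1
jupyter:
  kernelspec: python3
---
"""

conflicts = find_conflicting_keys(doc)
for key, entries in conflicts.items():
    for index, value in entries:
        print(f"warning: '{key}' set to {value!r} in YAML block {index}")
```

On the outer2/inner2 example from earlier in the thread, only title is flagged, because it is the only key that both blocks set.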