quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

scholarly-metadata Lua filter gives invalid YAML error #552

Closed benmarwick closed 2 years ago

benmarwick commented 2 years ago

I'm really enjoying Quarto and am excited to shift from Rmd to Qmd in my writing and teaching. I use these Lua filters frequently: scholarly-metadata.lua and author-info-blocks.lua. When I use them with Qmd I get "Validation of YAML front matter failed".

These work great with Rmd, for example this MWE works as expected:

---
title: "Testing Lua filters with Rmd"
author:
  - Jane Doe:
      institute:
        - fosg
        - fop
  - John Q. Doe:
      institute: fosg
  - Peder Ås:
      institute: fosg
  - Juan Pérez:
      institute:
        - name: Acme Corporation
  - Max Mustermann
institute:
  - fosg:
      name: Formatting Open Science Group
      address: 23 Science Street, Eureka, Mississippi, USA
  - fop: Federation of Planets
output: 
    bookdown::word_document2:
      pandoc_args:
      - --lua-filter=scholarly-metadata.lua
      - --lua-filter=author-info-blocks.lua
      - --lua-filter=pagebreak.lua
---

Example yml is from https://github.com/pandoc/lua-filters/tree/master/scholarly-metadata

Here's the output:

[image: rendered output of the Rmd example]

But when I try the same filters in a Qmd document, following the docs, I get an error about invalid YAML:

---
title: "Testing Lua filters with Qmd"
author:
  - Jane Doe:
      institute:
        - fosg
        - fop
  - John Q. Doe:
      institute: fosg
  - Peder Ås:
      institute: fosg
  - Juan Pérez:
      institute:
        - name: Acme Corporation
  - Max Mustermann
institute:
  - fosg:
      name: Formatting Open Science Group
      address: 23 Science Street, Eureka, Mississippi, USA
  - fop: Federation of Planets
format: 
  docx
filters: 
  - scholarly-metadata.lua
  - author-info-blocks.lua
---

Example yml is from https://github.com/pandoc/lua-filters/tree/master/scholarly-metadata

Here's the output in the Render tab:

ERROR: Validation of YAML front matter failed.
ERROR: In file author-info-blocks.qmd
(line 17, column 5 through line 19, column 58) Array entry 1 with value fosg:
      name: Formatting Open Science Group
      address: 23 Science Street, Eureka, Mississippi, USA failed to be a string.
16: institute:
17:   - fosg:
        ~~~~~
18:       name: Formatting Open Science Group
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
19:       address: 23 Science Street, Eureka, Mississippi, USA
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
20:   - fop: Federation of Planets
✖ The value fosg:
      name: Formatting Open Science Group
      address: 23 Science Street, Eureka, Mississippi, USA is object.
ℹ The error happened in location institute:0.

ERROR: In file author-info-blocks.qmd
(line 20, columns 5--30) Array entry 2 with value fop: Federation of Planets failed to be a string.
19:       address: 23 Science Street, Eureka, Mississippi, USA
20:   - fop: Federation of Planets
        ~~~~~~~~~~~~~~~~~~~~~~~~~
21: format: 
✖ The value fop: Federation of Planets is object.
ℹ The error happened in location institute:1.

ERROR: Render failed due to invalid YAML.

Here's my session info:

Quarto version 0.9.80
Pandoc version 2.17.1.1

RStudio 2022.02.1+461 "Prairie Trillium" Release (8aaa5d470dd82d615130dbf663ace5c7992d48e3, 2022-03-17) for macOS Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.10 Chrome/69.0.3497.128 Safari/537.36

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] knitr_1.37.4      magrittr_2.0.2    usethis_2.1.5     devtools_2.4.3   
 [5] pkgload_1.2.4     here_1.0.1        R6_2.5.1          rlang_1.0.2      
 [9] fastmap_1.1.0     tools_4.1.2       pkgbuild_1.3.1    xfun_0.30        
[13] sessioninfo_1.2.2 cli_3.2.0         git2r_0.29.0      withr_2.5.0      
[17] htmltools_0.5.2   ellipsis_0.3.2    remotes_2.4.2     yaml_2.3.5       
[21] rprojroot_2.0.2   digest_0.6.29     lifecycle_1.0.1   bookdown_0.24    
[25] crayon_1.5.0      brio_1.1.3        processx_3.5.2    purrr_0.3.4      
[29] callr_3.7.0       fs_1.5.2          ps_1.6.0          testthat_3.1.2   
[33] glue_1.6.2        memoise_2.0.1     cachem_1.0.6      evaluate_0.15    
[37] rmarkdown_2.12    compiler_4.1.2    desc_1.4.1        prettyunits_1.1.1

cscheid commented 2 years ago

Hi, and thanks for using Quarto!

This is our fault. The institute field has a schema associated with it because of the requirements of the html and beamer classes, which expect institute to be a string.

As we describe in #369, our validation infrastructure currently doesn't know to ignore errors stemming from a format that's not being used by the current document.

I added a workaround for the validator not to complain in this case in 89292de3b, and @dragonstyle is working on the remainder of the fix right at this moment.

benmarwick commented 2 years ago

Brilliant, thanks so much for your quick reply, that workaround has those filters working as expected, fantastic!

krusse commented 2 years ago

When I try to render @benmarwick's second code block, I get the following error:

Error running filter ../../pandoc/filters/scholarly-metadata.lua:
../../pandoc/filters/scholarly-metadata.lua:95: not a named object: List: 0x7fe1ab857850
stack traceback:
        ../../pandoc/filters/scholarly-metadata.lua:95: in function 'to_named_object'
        [C]: in function 'pandoc.List.map'
        ../../pandoc/filters/scholarly-metadata.lua:144: in upvalue 'canonicalize'
        ../../pandoc/filters/scholarly-metadata.lua:181: in function <../../pandoc/filters/scholarly-metadata.lua:180>

Using pandoc 2.9.2.1, quarto 0.9.415, vscode 1.67.1 with quarto extension v1.20.1.

Works with Rmd.

Any ideas as to why I cannot get this working?

krusse commented 2 years ago

I tried again with an older Rmd that I converted to qmd.

Here is the output:


pandoc 
  to: latex
  output-file: test.tex
  standalone: true
  pdf-engine: xelatex
  variables:
    graphics: true
    tables: true
  default-image-extension: pdf
  filters:
    - crossref
    - ../../pandoc/filters/author-info-blocks.lua
    - ../../pandoc/filters/scholarly-metadata.lua
    - citeproc

metadata
  documentclass: scrartcl
  classoption:
    - DIV=11
    - numbers=noendperiod
  papersize: letter
  header-includes:
    - '\KOMAoption{captions}{tableheading}'
  block-headings: true
  title: test
  author:
    - Jane A. Doe:
      institute:
        - a
        - b
    - Juan Koe:
        institute: c
    - Lisa Soe:
        institute: d
    - Thomas G. Loe:
        institute:
          - a
          - b
  institute:
    - a: This Inc.
    - b: 'Biom, University of Here'
    - c: 'Clin, University of There'
    - d: 'That, LLC'
  fontsize: 11pt
  link-citations: true
  bibliography:
    - ../../../library.bib

Error running filter ../../pandoc/filters/author-info-blocks.lua:
../../pandoc/filters/author-info-blocks.lua:100: bad argument #2 to 'concat' (table expected, got nil)
stack traceback:
        ../../pandoc/filters/author-info-blocks.lua:100: in function <../../pandoc/filters/author-info-blocks.lua:95>
        [C]: in function 'pandoc.List.map'
        ../../pandoc/filters/author-info-blocks.lua:94: in upvalue 'create_affiliations_blocks'
        ../../pandoc/filters/author-info-blocks.lua:160: in function <../../pandoc/filters/author-info-blocks.lua:153>

edit: fixed institutes.

dragonstyle commented 2 years ago

I think the issue is likely that you have authors referencing institute c and d but that those institutes are not defined in the metadata (not 100% sure, but from looking at the line of code and the yaml).

    - Juan Koe:
        institute: c
    - Lisa Soe:
        institute: d

...

  institute:
    - a: This Inc.
    - b: 'Biom, University of Here'
    - clin: 'Clin, University of There'
    - lik: 'That, LLC'
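
A minimal sketch of how that hypothesis could be checked mechanically. It is plain Lua over ordinary tables that mirror the YAML shape above, not code from either filter; the function name undefined_institutes and the table layout are assumptions made for illustration.

```lua
-- Hypothetical helper: list institute ids that authors reference but that
-- the `institute` list does not define. Works on plain Lua tables shaped
-- like the YAML above (a list of single-key maps).
function undefined_institutes(authors, institutes)
  -- Collect every id defined in the institute list.
  local defined = {}
  for _, inst in ipairs(institutes) do
    for id in pairs(inst) do
      defined[id] = true
    end
  end

  local missing, seen = {}, {}
  for _, author in ipairs(authors) do
    -- Authors given as a bare string carry no institute references.
    if type(author) == "table" then
      for _, info in pairs(author) do
        local refs = type(info) == "table" and info.institute or nil
        -- A single reference may be written as a scalar instead of a list.
        if type(refs) == "string" then
          refs = { refs }
        end
        for _, id in ipairs(refs or {}) do
          -- Inline institute definitions (tables) are not references.
          if type(id) == "string" and not defined[id] and not seen[id] then
            seen[id] = true
            missing[#missing + 1] = id
          end
        end
      end
    end
  end
  table.sort(missing)
  return missing
end
```

Applied to the metadata quoted above, where authors point at c and d while the list defines clin and lik, the helper would report c and d as missing.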

krusse commented 2 years ago

I'm sorry. That was me incompletely trying to anonymize the info. In reality, all institutes are represented correctly. This file works fine when knitting the Rmd.

dragonstyle commented 2 years ago

When I render this document:


---
title: "Testing Lua filters with Qmd"
author:
  - Jane A. Doe:
      institute:
        - a
        - b
  - Juan Koe:
      institute: c
  - Lisa Soe:
      institute: d
  - Thomas G. Loe:
      institute:
        - a
        - b
institute:
  - a: This Inc.
  - b: 'Biom, University of Here'
  - c: 'Clin, University of There'
  - d: 'That, LLC'
format: pdf
filters: 
  - scholarly-metadata.lua
  - author-info-blocks.lua
---

Using the latest version of the filter at https://github.com/pandoc/lua-filters/blob/master/scholarly-metadata/scholarly-metadata.lua I get a document rendered without issue. Sample document w/filters attached here:

sample.zip

krusse commented 2 years ago

Thank you very much. That works for me as well.

I was using the latest release of lua filters from Nov 5, 2021, not the latest versions available.

lakonis commented 1 year ago

Hello @cscheid, I refer to your previous comment:

As we describe in https://github.com/quarto-dev/quarto-cli/issues/369, our validation infrastructure currently doesn't know to ignore errors stemming from a format that's not being used by the current document.

I added a workaround for the validator not to complain in this case in https://github.com/quarto-dev/quarto-cli/commit/89292de3b59e15e7c6c779ca1d577a969e29bb80, and @dragonstyle is working on the remainder of the fix right at this moment.

Would it be possible to generalize this workaround to other metadata keys? At present my abstract key is not a string but a more complex object:

abstract:
  - lang: fr
    text_f: >-
      Lorem ipsum in french.
  - lang: en
    text_f: >-
      Lorem ipsum in english.

I get the same error:

Field "abstract" has value

- lang: fr
  text_f: >-
...

The value must instead be a string.

For your information, I have created an adapted journal template for this YAML schema, and I am now trying to make it work. Thanks for your help!

cscheid commented 1 year ago

Would it be possible to generalize this workaround to other metadata keys?

Unfortunately that's a very hard thing to do in general. With that said, if you must disable the validation entirely, add

validate-yaml: false

to your metadata. Do note that this completely stops validation, which means errors won't be flagged at YAML loading time and might cause downstream problems. We really recommend that you use a different key instead.

dragonstyle commented 1 year ago

One suggestion I have: could you avoid the keyword abstract and instead use some other key name (e.g. abstract-localized or something), and then normalize the correct value into the abstract metadata in a Lua filter based upon the active language when rendering?

lakonis commented 1 year ago

Thank you for the quick answers! Yes indeed, a Lua filter might work, however:

  1. I was hoping to deal with my own schema through pandoc templates, as I usually do
  2. I need to get into Lua filters... :/ Maybe you have a Lua code sample/bootstrap I could use to duplicate/replace/rename a key?
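
A minimal sketch of such a filter, following the suggestion above to keep the structured value under a separate key. Several things here are assumptions rather than anything confirmed in this thread: the key name abstract-localized, the file name localized-abstract.lua, the helper pick_localized, and exact matching of the document's lang value against each entry's lang. It relies only on the Meta filter function and pandoc.utils.stringify from pandoc's Lua filter API, and it has not been checked against Quarto's validator.

```lua
-- localized-abstract.lua (hypothetical file name)
-- Copies the entry of `abstract-localized` that matches the document
-- language into the standard `abstract` key.

-- Return the `text_f` field of the first entry whose `lang` equals `lang`.
-- Falls back to the first entry when no language matches, and returns nil
-- when `entries` is missing or empty. `to_string` turns a metadata value
-- into a plain string.
function pick_localized(entries, lang, to_string)
  if type(entries) ~= "table" or #entries == 0 then
    return nil
  end
  for _, entry in ipairs(entries) do
    if entry.lang ~= nil and to_string(entry.lang) == lang then
      return entry.text_f
    end
  end
  return entries[1].text_f
end

-- pandoc calls the top-level function named `Meta` with the document
-- metadata; returning the table replaces the metadata.
function Meta(meta)
  local entries = meta["abstract-localized"]
  if entries == nil then
    return nil -- nothing to do, leave the metadata untouched
  end
  -- Assumption: default to "en" when the document sets no `lang`.
  local lang = meta.lang and pandoc.utils.stringify(meta.lang) or "en"
  local chosen = pick_localized(entries, lang, pandoc.utils.stringify)
  if chosen ~= nil then
    meta.abstract = chosen
  end
  return meta
end
```

To try it, rename abstract to abstract-localized in the front matter, set lang (for example lang: fr), and list the file under filters: in the same way as the filters earlier in this thread. The abstract-localized key is left in place so that a custom template can still read every language.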