quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Quarto does not allow adding filters after citeproc processing, leading to incorrect results #9726

Closed TomBener closed 4 months ago

TomBener commented 5 months ago

Bug description

Hello Quarto team,

I am experiencing an issue where a specific Lua filter is not being applied when rendering a document in Quarto.

When I run the command quarto render --to html, the Lua filter specified in my _quarto.yml file is not applied. In the HTML output, the reference is rendered as:

<div id="ref-han2020" class="csl-entry" role="listitem">
韩旭东, 李德阳, 王若男, et al., 2020. 盈余分配制度对合作社经营绩效影响的实证分析:基于新制度经济学视角[J]. 中国农村经济(4): 56–77.
</div>

However, when I specify the Lua filter manually on the command line with quarto render --to html -L _extensions/filters/localize-cnbib/localize-cnbib.lua, it works as expected (et al. is replaced with 等):

<div id="ref-han2020" class="csl-entry" role="listitem">
韩旭东, 李德阳, 王若男, 等, 2020. 盈余分配制度对合作社经营绩效影响的实证分析:基于新制度经济学视角[J]. 中国农村经济(4): 56–77.
</div>

Reproduction Steps

To demonstrate the problem, I have created a GitHub repo to reproduce the issue.

Environment

Here is the output of quarto check:

$ quarto check

Quarto 1.5.37
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.2.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.5.37
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: v2024.05
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /Users/username/Library/TinyTeX/bin/universal-darwin
      Version: 2024

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.10
      Path: /Users/username/.pyenv/versions/3.10.10/bin/python3
      Jupyter: 5.7.2
      Kernels: python3

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........OK
      Version: 4.4.0
      Path: /opt/homebrew/Cellar/r/4.4.0_1/lib/R
      LibPaths:
        - /Users/username/.R/packages
        - /opt/homebrew/lib/R/4.4/site-library
        - /opt/homebrew/Cellar/r/4.4.0_1/lib/R/library
      knitr: 1.46
      rmarkdown: 2.26

[✓] Checking Knitr engine render......OK

cscheid commented 5 months ago

I don't believe this is a bug - filters have to be specified per document, as in the example here: https://quarto.org/docs/extensions/filters.html#filter-extensions

TomBener commented 5 months ago

filters have to be specified per document

Do you mean that the filter should be specified in index.qmd rather than _quarto.yml? Other Lua filters specified in _quarto.yml work as expected; only the Lua filter above does not.

cscheid commented 5 months ago

Can you provide an example of one that works as you expected?

TomBener commented 5 months ago

I have added the Lua filter abstract-section to the repo, and it works as expected.

cscheid commented 5 months ago

OK. First, a correction on my side: the filter is getting called in the pipeline; I didn't realize that specifying filters in _quarto.yml actually worked. Generally speaking, though, this is a bad idea. Multiple filters usually require coordination, and if you specify them in different places (say, _quarto.yml instead of the documents), you won't be able to control the filter execution order if you ever need to add a filter to a specific document.

In any case, try this replacement in your own code:

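function Pandoc(doc)
    -- dump the whole document to stdout in pandoc's native format
    print(pandoc.write(doc, "native"))
    -- no return value, so the document is left unchanged
end

-- process_cite and Div are the functions already defined in your filter
return {
    {
        Pandoc = Pandoc,
        Cite = process_cite,
        Link = process_cite,
        Div = Div
    }
}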

and you'll see the printout:

quarto render
pandoc
  to: html
  output-file: index.html
  standalone: true
  section-divs: true
  html-math-method: mathjax
  wrap: none
  default-image-extension: png

metadata
  document-css: false
  link-citations: true
  date-format: long
  lang: en
  title: Lua Filter Test
  bibliography:
    - bib.bib
  csl: gb-author-date.csl

[ Header 1 ( "abstract" , [] , [] ) [ Str "Abstract" ]
, Para
    [ Str "Place"
    , Space
    , Str "abstract"
    , Space
    , Str "here."
    ]
, Para
    [ Str "Multiple"
    , Space
    , Str "paragraphs"
    , Space
    , Str "are"
    , Space
    , Str "possible."
    ]
, Header 1 ( "test" , [] , [] ) [ Str "Test" ]
, Para
    [ Str "Quarto"
    , Space
    , Str "enables"
    , Space
    , Str "you"
    , Space
    , Str "to"
    , Space
    , Str "weave"
    , Space
    , Str "together"
    , Space
    , Str "content"
    , Space
    , Str "and"
    , Space
    , Str "executable"
    , Space
    , Str "code"
    , Space
    , Str "into"
    , Space
    , Str "a"
    , Space
    , Str "finished"
    , Space
    , Str "document."
    , Space
    , Str "To"
    , Space
    , Str "learn"
    , Space
    , Str "more"
    , Space
    , Str "about"
    , Space
    , Str "Quarto"
    , Space
    , Str "see"
    , Space
    , Link
        ( "" , [ "uri" ] , [] )
        [ Str "https://quarto.org" ]
        ( "https://quarto.org" , "" )
    , Str "."
    ]
, Para
    [ Str "Testing"
    , Space
    , Str "citations"
    , Space
    , Cite
        [ Citation
            { citationId = "knuth84"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = AuthorInText
            , citationNoteNum = 1
            , citationHash = 0
            }
        ]
        [ Str "@knuth84" ]
    , Space
    , Str "and"
    , Space
    , Cite
        [ Citation
            { citationId = "han2020"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = AuthorInText
            , citationNoteNum = 2
            , citationHash = 0
            }
        ]
        [ Str "@han2020" ]
    , Space
    , Str "for"
    , Space
    , Str "using"
    , Space
    , Str "the"
    , Space
    , Str "Lua"
    , Space
    , Str "filter."
    ]
, Div
    ( "3ade8a4a-fb1d-4a6c-8409-ac45482d5fc9"
    , [ "hidden" ]
    , []
    )
    []
]
pandoc
  to: docx
  output-file: index.docx
  default-image-extension: png

metadata
  title: Lua Filter Test
  bibliography:
    - bib.bib
  csl: gb-author-date.csl

[ Header 1 ( "abstract" , [] , [] ) [ Str "Abstract" ]
, Para
    [ Str "Place"
    , Space
    , Str "abstract"
    , Space
    , Str "here."
    ]
, Para
    [ Str "Multiple"
    , Space
    , Str "paragraphs"
    , Space
    , Str "are"
    , Space
    , Str "possible."
    ]
, Header 1 ( "test" , [] , [] ) [ Str "Test" ]
, Para
    [ Str "Quarto"
    , Space
    , Str "enables"
    , Space
    , Str "you"
    , Space
    , Str "to"
    , Space
    , Str "weave"
    , Space
    , Str "together"
    , Space
    , Str "content"
    , Space
    , Str "and"
    , Space
    , Str "executable"
    , Space
    , Str "code"
    , Space
    , Str "into"
    , Space
    , Str "a"
    , Space
    , Str "finished"
    , Space
    , Str "document."
    , Space
    , Str "To"
    , Space
    , Str "learn"
    , Space
    , Str "more"
    , Space
    , Str "about"
    , Space
    , Str "Quarto"
    , Space
    , Str "see"
    , Space
    , Link
        ( "" , [ "uri" ] , [] )
        [ Str "https://quarto.org" ]
        ( "https://quarto.org" , "" )
    , Str "."
    ]
, Para
    [ Str "Testing"
    , Space
    , Str "citations"
    , Space
    , Cite
        [ Citation
            { citationId = "knuth84"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = AuthorInText
            , citationNoteNum = 1
            , citationHash = 0
            }
        ]
        [ Str "@knuth84" ]
    , Space
    , Str "and"
    , Space
    , Cite
        [ Citation
            { citationId = "han2020"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = AuthorInText
            , citationNoteNum = 2
            , citationHash = 0
            }
        ]
        [ Str "@han2020" ]
    , Space
    , Str "for"
    , Space
    , Str "using"
    , Space
    , Str "the"
    , Space
    , Str "Lua"
    , Space
    , Str "filter."
    ]
, Div
    ( "3ade8a4a-fb1d-4a6c-8409-ac45482d5fc9"
    , [ "hidden" ]
    , []
    )
    []
]
Output created: index.html

That means your filter is executing. It's just something in the document that is not working as your code expects.

TomBener commented 5 months ago

Thanks for your diagnosis. But I can't tell from the native output whether the Lua filter was executed. Furthermore, the HTML output was not modified by the Lua filter as expected. Could you please help me diagnose why the Lua filter works when passed on the command line but not when added to _quarto.yml?

TomBener commented 5 months ago

This is the diff between the outputs of quarto render index.qmd -t native -o test.txt and quarto render index.qmd -t native -o test.txt -L _extensions/filters/localize-cnbib/localize-cnbib.lua:

Pandoc
  Meta
    { unMeta =
        fromList
          [ ( "bibliography"
            , MetaList [ MetaInlines [ Str "bib.bib" ] ]
            )
          , ( "csl" , MetaInlines [ Str "gb-author-date.csl" ] )
          , ( "title"
            , MetaInlines
                [ Str "Lua"
                , Space
                , Str "Filter"
                , Space
                , Str "Test"
                ]
            )
          ]
    }
  [ Para
      [ Str "Testing"
      , Space
      , Str "citations"
      , Space
      , Cite
          [ Citation
              { citationId = "knuth84"
              , citationPrefix = []
              , citationSuffix = []
              , citationMode = AuthorInText
              , citationNoteNum = 1
              , citationHash = 0
              }
          ]
          [ Str "Knuth" , Space , Str "(1984)" ]
      , Space
      , Str "and"
      , Space
      , Cite
          [ Citation
              { citationId = "han2020"
              , citationPrefix = []
              , citationSuffix = []
              , citationMode = AuthorInText
              , citationNoteNum = 2
              , citationHash = 0
              }
          ]
-         [ Str "\38889\26093\19996\160et"
-         , Space
-         , Str "al."
+         [ Str "\38889\26093\19996\160\31561"
          , Space
          , Str "(2020)"
          ]
      , Space
      , Str "for"
      , Space
      , Str "using"
      , Space
      , Str "the"
      , Space
      , Str "Lua"
      , Space
      , Str "filter."
      ]
  , Div
      ( "refs"
      , [ "references" , "csl-bib-body" , "hanging-indent" ]
      , [ ( "entry-spacing" , "0" ) ]
      )
      [ Div
          ( "ref-knuth84" , [ "csl-entry" ] , [] )
          [ Para
              [ Str "Knuth"
              , Space
              , Str "D"
              , Space
              , Str "E,"
              , Space
              , Str "1984."
              , Space
              , Str "Literate"
              , Space
              , Str "Programming[J/OL]."
              , Space
              , Str "Comput."
              , Space
              , Str "J.,"
              , Space
              , Str "27(2):"
              , Space
              , Str "97\8211\&111."
              , Space
              , Link
                  ( "" , [] , [] )
                  [ Str "https://doi.org/10.1093/comjnl/27.2.97" ]
                  ( "https://doi.org/10.1093/comjnl/27.2.97" , "" )
              , Str "."
              , Space
              , Str "DOI:"
              , Space
              , Link
                  ( "" , [] , [] )
                  [ Str "10.1093/comjnl/27.2.97" ]
                  ( "https://doi.org/10.1093/comjnl/27.2.97" , "" )
              , Str "."
              ]
          ]
      , Div
          ( "ref-han2020" , [ "csl-entry" ] , [] )
          [ Para
              [ Str "\38889\26093\19996,"
              , Space
              , Str "\26446\24503\38451,"
              , Space
              , Str "\29579\33509\30007,"
              , Space
-             , Str "et"
-             , Space
-             , Str "al.,"
+             , Str "\31561,"
              , Space
              , Str "2020."
              , Space
              , Str
                  "\30408\20313\20998\37197\21046\24230\23545\21512\20316\31038\32463\33829\32489\25928\24433\21709\30340\23454\35777\20998\26512\65306\22522\20110\26032\21046\24230\32463\27982\23398\35270\35282[J]."
              , Space
              , Str "\20013\22269\20892\26449\32463\27982(4):"
              , Space
              , Str "56\8211\&77."
              ]
          ]
      ]
  ]

From the native output, I don't think the Lua filter was applied, or applied correctly at least.
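For reference, the transformation visible in the diff above can be sketched as a small Lua filter. This is not the actual localize-cnbib.lua (its code is not shown in this thread); it is a minimal, hypothetical version that merges the sequence Str "...et", Space, Str "al.<rest>" into a single Str ending in 等<rest>. It assumes every "et al." in the document should be localized, whereas a real filter would need to restrict this to Chinese entries. Like the original, it can only have an effect when it runs after citeproc, e.g. when passed with -L.

```lua
-- localize-etal.lua (hypothetical): replace the "et al." produced by
-- citeproc with 等, matching the two patterns seen in the native diff:
--   Str "王若男,", Space, Str "et", Space, Str "al.,"  -> ..., Str "等,"
--   Str "韩旭东\160et", Space, Str "al."               -> Str "韩旭东\160等"

local NBSP_ET = "\u{A0}et" -- no-break space followed by "et"

-- True if inline `a` is a Str that ends the "et" part of "et al."
local function is_et(a)
  return a.t == "Str"
    and (a.text == "et" or a.text:sub(-#NBSP_ET) == NBSP_ET)
end

-- Returns a new list of inlines in which every
-- "...et" Space "al.<rest>" sequence is merged into one Str "...等<rest>".
function localize_etal(inlines)
  local out = {}
  local i = 1
  while i <= #inlines do
    local a, b, c = inlines[i], inlines[i + 1], inlines[i + 2]
    if is_et(a) and b and b.t == "Space"
        and c and c.t == "Str" and c.text:sub(1, 3) == "al." then
      -- drop "et" (2 bytes) from a and "al." (3 bytes) from c
      a.text = a.text:sub(1, -3) .. "等" .. c.text:sub(4)
      table.insert(out, a)
      i = i + 3
    else
      table.insert(out, a)
      i = i + 1
    end
  end
  return out
end

-- Apply to every list of inlines (Cite contents, bibliography paragraphs, ...)
function Inlines(inlines)
  return localize_etal(inlines)
end
```

Because the Cite elements still contain only "@han2020" before citeproc runs, this filter finds nothing to replace when it runs inside Quarto's filter chain.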

cscheid commented 5 months ago

From the native output, I don't think the Lua filter was applied, or applied correctly at least.

That isn't consistent with the testing I've done. If you add Pandoc = function(doc) print("here") end and you see the printout (as I did), then the filter is getting called, and the problem is that the structure of the document is not what you're expecting it to be. In that case, you need to fix your filter.
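A slightly more targeted version of that check is sketched below. It assumes, based on the native dumps in this thread, that citeproc's output can be recognized by a non-empty Div with identifier refs: it is absent from the printout taken inside Quarto's filter chain and present in the -t native output. The file name debug-citeproc.lua and the helper name are hypothetical.

```lua
-- debug-citeproc.lua (hypothetical name): report whether citeproc has
-- already run by the time this filter sees the document.

-- Returns true if the document contains a non-empty Div with identifier
-- "refs", which citeproc fills with the bibliography entries. An empty
-- #refs placeholder written by hand in the source does not count.
function has_citeproc_output(doc)
  local found = false
  doc:walk({
    Div = function(div)
      if div.identifier == "refs" and #div.content > 0 then
        found = true
      end
      -- returning nothing leaves the Div unchanged
    end
  })
  return found
end

function Pandoc(doc)
  io.stderr:write("citeproc output present: "
    .. tostring(has_citeproc_output(doc)) .. "\n")
  -- returning nil keeps the document as-is
end
```

Going by the two dumps above, one would expect this to print false when the filter is listed in _quarto.yml and true when it is passed with -L.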

TomBener commented 5 months ago

the problem is that the structure of the document is not what you're expecting it to be. In that case, you need to fix your filter.

Could you please help me see what the problem with the Lua filter is, and how I can fix it? Thanks very much!

cderv commented 5 months ago

@TomBener here are some resources to help you debug this on your end.

Hope this helps

TomBener commented 5 months ago

@cderv Many thanks for your guidance. But I'm confused by the section "More precise targeting of AST processing phases" in the documentation. I cannot fully understand the implications of the three phases: ast, quarto, and render. Could you please add specific examples to the documentation to make it easier to understand and use? For example, one use case I have is editing the LaTeX generated from Markdown before it is compiled to PDF. Is it possible to use a Lua filter at one of the three stages for this?

cderv commented 5 months ago

ast, quarto, and render. Could you please list specific examples in the document to make it easier to understand and use?

We don't have more documentation on this yet. These are simply the possible points at which your filter can be applied. By default, IIRC, the filter is applied at the end.
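For reference, the phase is chosen per filter in the filters list. The sketch below assumes the entry-point syntax from the Quarto filter documentation (Quarto 1.4 and later), where at takes one of pre-ast, post-ast, pre-quarto, post-quarto, pre-render or post-render; the path is the one used in this thread. As explained further down in this thread, even the last phase runs before citeproc, so this alone does not help a filter that needs citeproc's output.

```yaml
filters:
  # run this filter at the end of Quarto's filter chain
  # (still before citeproc)
  - path: _extensions/filters/localize-cnbib/localize-cnbib.lua
    at: post-render
```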

Editing LaTeX from Markdown before compiling to PDF, is this possible to use Lua filter in one of the three stages?

Read the docs about Extensions and about how Lua filters work. A filter can act on anything between the point where Pandoc parses the Markdown and the point where it writes the output format. So you can write a Lua filter that catches some element and outputs raw LaTeX, but Lua filters cannot be used to post-process the LaTeX file generated by the Pandoc conversion: Quarto passes that file directly to LaTeX.
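To illustrate the "catch some element and output raw LaTeX" approach, here is a minimal sketch. The class name center is a hypothetical example. The filter only emits LaTeX when FORMAT (the output format name pandoc exposes to Lua filters) contains latex, and leaves the Div unchanged otherwise. The LaTeX is inserted into the AST as RawBlocks before any .tex file is written; it does not edit the generated file.

```lua
-- Sketch: turn Divs with the (hypothetical) class "center" into a LaTeX
-- center environment when rendering to LaTeX/PDF.
function Div(div)
  -- FORMAT is set by pandoc, e.g. "latex" when targeting PDF
  if not FORMAT:match("latex") then
    return nil -- other formats: keep the Div as-is
  end
  if not div.classes:includes("center") then
    return nil
  end
  -- replace the Div by: \begin{center}, its content, \end{center}
  local blocks = { pandoc.RawBlock("latex", "\\begin{center}") }
  for _, block in ipairs(div.content) do
    table.insert(blocks, block)
  end
  table.insert(blocks, pandoc.RawBlock("latex", "\\end{center}"))
  return blocks
end
```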

Hope this helps you understand. I didn't know how advanced you were, so I mentioned the three phases. You shouldn't worry about them for now; just try to debug your filter by adding logging at different points in your processing.

TomBener commented 5 months ago

@cderv Thanks!

cscheid commented 4 months ago

I'm going to go ahead and close this one, since I don't think there's anything outstanding.

TomBener commented 4 months ago

Sorry, I don't think so. Some Lua filters not being applied is still a problem I haven't resolved. I have also tested the Citation Backlinks Filter by @tarleb, but it didn't work in my Quarto example.

cderv commented 4 months ago

@TomBener as discussed, this is about the specific order in which the Lua filters are applied. I gave some hints for looking into this and debugging it, so that you could come back with more details on what is not working.

I have updated to test Citation Backlinks Filter by @tarleb, but it didn't work in my Quarto example.

Let's deal with that first: the README of this filter clearly states:

The filter doesn't work yet as a Quarto extension.

So you can't expect it to work! There is even an issue about this in that repository: https://github.com/tarleb/citation-backlinks/issues/2

The reason this filter does not work is probably the same as for yours, if localize-cnbib.lua needs to run after citeproc has happened.

More generally, Lua filters that need to run after citeproc do not yet work as Quarto extensions.

This is discussed at

with another filter not working

and we are tracking the improvement at

Currently, citeproc is invoked through the Pandoc defaults file after all the other filters, and there is no way to run a filter after it. Quarto handles filters specially: it builds a filter chain that combines internal and user filters in the right order and context. citeproc is applied after this filter chain.

Using quarto render -L, as you did, works because in this case the filter is added independently to the list of filters to run, outside Quarto's filter chain. This can work for some filters, but not for others that are more tightly tied to Quarto's Lua API, for example. Those need to be in the filter chain.

I hope this helps you understand. I'll rename this issue to make clear what it is about. Please follow #7888 for the resolution of this limitation.

cderv commented 4 months ago

Duplicate of #7888

TomBener commented 4 months ago

@cderv Thanks very much! I think your comment is correct and it helps me understand the cause for the issue. All my Lua filters that cannot be used as Quarto extensions are indeed related to citeproc.