pandoc / lua-filters

A collection of lua filters for pandoc
MIT License

multiple-bibliographies.lua: PandocFilterError "pandoc" "Filter returned error status 65" #254

Closed ickc closed 1 year ago

ickc commented 1 year ago

MWE:

In bug.yml:

references:
- id: miyakawa_no_2020
  abstract: |-
    Abstract
                A reproducibility crisis is a situation where many scientific studies cannot be reproduced. Inappropriate practices of science, such as HARKing, p-hacking, and selective reporting of positive results, have been suggested as causes of irreproducibility. In this editorial, I propose that a lack of raw data or data fabrication is another possible cause of irreproducibility.

                  As an Editor-in-Chief of
                  Molecular Brain
                  , I have handled 180 manuscripts since early 2017 and have made 41 editorial decisions categorized as “Revise before review,” requesting that the authors provide raw data. Surprisingly, among those 41 manuscripts, 21 were withdrawn without providing raw data, indicating that requiring raw data drove away more than half of the manuscripts. I rejected 19 out of the remaining 20 manuscripts because of insufficient raw data. Thus, more than 97% of the 41 manuscripts did not present the raw data supporting their results when requested by an editor, suggesting a possibility that the raw data did not exist from the beginning, at least in some portions of these cases.

                Considering that any scientific study should be based on raw data, and that data storage space should no longer be a challenge, journals, in principle, should try to have their authors publicize raw data in a public database or journal site upon the publication of the paper to increase reproducibility of the published results and to increase public trust in science.
  accessed:
    - year: 2022
      month: 11
      day: 26
  author:
    - family: Miyakawa
      given: Tsuyoshi
  citation-key: miyakawa_no_2020
  container-title: Molecular Brain
  container-title-short: Mol Brain
  DOI: 10.1186/s13041-020-0552-2
  ISSN: 1756-6606
  issue: '1'
  issued:
    - year: 2020
      month: 12
  language: en
  page: 24, s13041-020-0552-2
  source: DOI.org (Crossref)
  title: >-
    No raw data, no science: another possible source of the reproducibility
    crisis
  title-short: No raw data, no science
  type: article-journal
  URL: https://molecularbrain.biomedcentral.com/articles/10.1186/s13041-020-0552-2
  volume: '13'

In bug.md:

---
bibliography_main: bug.yml
nocite: |
  @*
...

# References

::: {#refs_main}
:::

Running pandoc with the filter resulted in:

❯ pandoc --lua-filter=multiple-bibliographies.lua bug.md
Error at "bug.yml_chunk" (line 8, column 5):
unexpected end of input
Error running filter ~/.local/share/pandoc/filters/multiple-bibliographies.lua:
PandocFilterError "pandoc" "Filter returned error status 65"
stack traceback:
        .../.local/share/pandoc/filters/multiple-bibliographies.lua:50: in upvalue 'run_citeproc'
        .../.local/share/pandoc/filters/multiple-bibliographies.lua:82: in function <.../.local/share/pandoc/filters/multiple-bibliographies.lua:68>

Edit: note that the error comes from the abstract; if the abstract is removed, there is no error.

tarleb commented 1 year ago

It seems that this can be reproduced without this filter by setting

---
bibliography: bug.yml
nocite: |
  @*
...

or by running

pandoc --metadata-file=bug.yml <<< 'test'

The problem seems to be the hyphen-minus after the block marker: replacing |- with | resolves this. See https://github.com/jgm/pandoc/issues/8449 for a related issue and some more details.
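Until this is fixed, the substitution could be scripted over an exported bibliography. The following is only a minimal sketch: the helper name clip_block_scalars is hypothetical, and it assumes that |- appears only as a block scalar header at the end of a "key:" line, never inside scalar content.

```python
import re

# Matches a mapping line whose value is a literal block scalar with strip
# chomping, e.g. "  abstract: |-". Group 1 captures everything up to the "|".
# Assumption: no line inside a scalar body happens to look like this.
_STRIP_HEADER = re.compile(r"^([ \t]*[^\s:#][^:\n]*:[ \t]*)\|-[ \t]*$", re.MULTILINE)


def clip_block_scalars(yaml_text: str) -> str:
    """Rewrite every `key: |-` header to `key: |`; bodies are left untouched."""
    return _STRIP_HEADER.sub(r"\1|", yaml_text)
```

Applying this to the contents of bug.yml and writing the result back turns "abstract: |-" into "abstract: |". Folded scalars such as "title: >-" are not touched. Note that | keeps one trailing newline in the value whereas |- strips it, so the rewritten file is not byte-for-byte equivalent in meaning.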

I'm trying to figure out if this is a pandoc or YAML parser bug; valid YAML should never produce an error.

jgm commented 1 year ago

Here's an even smaller test:

pandoc -s -t native
---
abstract: |-
   a

       b
...
^D
Pandoc
  Meta { unMeta = fromList [] }
  [ HorizontalRule
  , Para
      [ Str "abstract:" , Space , Str "|-" , SoftBreak , Str "a" ]
  , CodeBlock ( "" , [] , [] ) "   b"
  , Para [ Str "\8230" ]
  ]

Note that this is not recognized as metadata. Changing |- to | fixes it. Removing the indentation from b fixes it. My guess is that pandoc can't convert the blocks (with the code block) into inline content as required by |-. Still, it can probably be made to do something more graceful here.
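To make the guess concrete, it helps to look at the string that a YAML parser hands over for this scalar. The sketch below is a simplified model of literal block scalars, not pandoc's or any YAML library's actual code; the function name literal_block_value is hypothetical, and it assumes spaces-only indentation and no explicit indentation indicator.

```python
def literal_block_value(body_lines, chomp="strip"):
    """Simplified model of a YAML literal block scalar.

    chomp="strip" models `|-`; chomp="clip" models plain `|`.
    """
    # The block indentation is taken from the first non-empty line.
    indent = next(
        len(line) - len(line.lstrip(" ")) for line in body_lines if line.strip()
    )
    # Remove that indentation; more deeply indented lines keep the extra spaces.
    text = "\n".join(line[indent:] if line.strip() else "" for line in body_lines)
    text = text.rstrip("\n")
    # Strip chomping drops the final line break, clip chomping keeps exactly one.
    return text if chomp == "strip" else text + "\n"


value = literal_block_value(["   a", "", "       b"])
```

Here value is "a", a blank line, and then "b" preceded by four spaces. Read as Markdown, a line indented by four spaces after a blank line is an indented code block, so the scalar contains block-level content in both the | and the |- variants; only the trailing newline differs.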

And when we have a valid YAML block that can't be converted by pandoc, we should fail with an error (or at least emit a warning) rather than treating it as something else.

jgm commented 1 year ago

I'm moving this to pandoc, because it's really a pandoc issue, it seems. (I guess I can't transfer it across orgs.)

jgm commented 1 year ago

I submitted this issue to pandoc: https://github.com/jgm/pandoc/issues/8465