manubot / rootstock

Clone me to create your Manubot manuscript
https://manubot.github.io/rootstock/

Exporting manubot manuscript to Rmd #381

Open taylorreiter opened 3 years ago

taylorreiter commented 3 years ago

Hello! I recently used Manubot to draft a collaborative document, and it was a wonderful experience. Thank you for building such a great tool! I now need to export the manuscript to R Markdown. Using the output manuscript.md, I find that with very few changes everything knits appropriately and generates a rendered PDF of the document. However, I could not get citations to render properly. When I knit using bibliography: references.json, I get output like:

pandoc-citeproc: reference doi:10.1038/s41587-020-0439-x not found
pandoc-citeproc: reference doi:10.1371/journal.pcbi.1005755 not found

It seems like Manubot outputs enough information across manuscript.md, references.json, and citations.tsv that citations and references in R Markdown should work relatively easily, but I couldn't figure out how to make this work. My current plan is to replace all ~125 citations by hand with BibTeX references and generate a new bibliography, but I would love to avoid this if at all possible!

taylorreiter commented 3 years ago

Adding more information to include examples!

The RMarkdown might look something like this:

---
bibliography: references.json
title: My document
output: pdf_document
---

## Introduction

Historically, this has led to a thing that we all observed[@doi:10.1038/s41587-020-0439-x].

The references.json looks like this:

  {
    "type": "article-journal",
    "id": "wq4G2CfQ",
    "author": [
      {
        "family": "Ewels",
        "given": "Philip A."
      },
      {
        "family": "Peltzer",
        "given": "Alexander"
      },
      {
        "family": "Fillinger",
        "given": "Sven"
      },
      {
        "family": "Patel",
        "given": "Harshil"
      },
      {
        "family": "Alneberg",
        "given": "Johannes"
      },
      {
        "family": "Wilm",
        "given": "Andreas"
      },
      {
        "family": "Garcia",
        "given": "Maxime Ulysse"
      },
      {
        "family": "Di Tommaso",
        "given": "Paolo"
      },
      {
        "family": "Nahnsen",
        "given": "Sven"
      }
    ],
    "issued": {
      "date-parts": [
        [
          2020,
          2,
          13
        ]
      ]
    },
    "container-title": "Nature Biotechnology",
    "DOI": "10.1038/s41587-020-0439-x",
    "volume": "38",
    "issue": "3",
    "page": "276-278",
    "publisher": "Springer Science and Business Media LLC",
    "title": "The nf-core framework for community-curated bioinformatics pipelines",
    "URL": "https://doi.org/ggk3qh",
    "PMID": "32055031",
    "note": "This CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: doi:10.1038/s41587-020-0439-x"
  }
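One detail worth noting in this entry: its "id" is wq4G2CfQ, while the document cites doi:10.1038/s41587-020-0439-x, and the DOI only appears in the "DOI" and "note" fields. That mismatch is consistent with the "not found" messages from pandoc-citeproc. The following is a minimal Python sketch, not part of Manubot, that lists cited keys with no matching "id" in a CSL JSON file. The function name and the regular expression are illustrative assumptions; the regex only approximates pandoc's citation-key syntax and would also pick up things like email addresses.

```python
import json
import re

# Approximation of pandoc's citation-key syntax (an assumption, not pandoc's own
# parser): a key starts with a letter, digit, or underscore, and punctuation such
# as ":" "." "/" "-" counts only when followed by another such character.
CITATION_KEY = re.compile(
    r"@([A-Za-z0-9_](?:[A-Za-z0-9_]|[:.#$%&\-+?<>~/](?=[A-Za-z0-9_]))*)"
)


def missing_citation_keys(markdown, csl_json_text):
    """Return the cited keys, sorted, that match no "id" in the CSL JSON."""
    items = json.loads(csl_json_text)
    if isinstance(items, dict):  # tolerate a single item, as in the excerpt above
        items = [items]
    known = {item["id"] for item in items}
    cited = set(CITATION_KEY.findall(markdown))
    return sorted(cited - known)


# Trimmed-down stand-ins for references.json and the document body.
references = '[{"id": "wq4G2CfQ", "DOI": "10.1038/s41587-020-0439-x"}]'
body = "observed[@doi:10.1038/s41587-020-0439-x] and again [@wq4G2CfQ]."
print(missing_citation_keys(body, references))
# prints ['doi:10.1038/s41587-020-0439-x']
```

An empty list would suggest that every citation key in the document can be resolved against the bibliography file.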

And the citations.tsv looks like this:

input_id dealiased_id standard_id short_id
doi:10.1038/s41587-020-0439-x doi:10.1038/s41587-020-0439-x doi:10.1038/s41587-020-0439-x wq4G2CfQ

taylorreiter commented 3 years ago

Andddd in posting this update, I realized that replacing

---
bibliography: references.json
title: My document
output: pdf_document
---

## Introduction

Historically, this has led to a thing that we all observed[@doi:10.1038/s41587-020-0439-x].

with

---
bibliography: references.json
title: My document
output: pdf_document
---

## Introduction

Historically, this has led to a thing that we all observed[@wq4G2CfQ].

allows Rmd/pandoc-citeproc to resolve the citation, so I just need to programmatically replace each input_id with its short_id throughout the file!
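The replacement described here could be scripted roughly as follows. This is a minimal Python sketch, not a Manubot feature: it assumes citations.tsv is tab-separated with the input_id and short_id columns shown earlier, and the function names are made up for illustration. Longer IDs are tried first so that an ID that happens to be a prefix of another one cannot shadow it.

```python
import csv
import io
import re


def load_id_map(tsv_text):
    """Map each input_id to its short_id from the content of citations.tsv."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["input_id"]: row["short_id"] for row in reader}


def replace_citation_ids(markdown, id_map):
    """Rewrite every @input_id found in the markdown as @short_id."""
    if not id_map:
        return markdown
    # Longest ids first: the regex alternation takes the first branch that matches.
    ids = sorted(id_map, key=len, reverse=True)
    pattern = re.compile("@(" + "|".join(re.escape(i) for i in ids) + ")")
    return pattern.sub(lambda m: "@" + id_map[m.group(1)], markdown)


# Example data copied from the citations.tsv row above (tabs written out as \t).
tsv = (
    "input_id\tdealiased_id\tstandard_id\tshort_id\n"
    "doi:10.1038/s41587-020-0439-x\tdoi:10.1038/s41587-020-0439-x\t"
    "doi:10.1038/s41587-020-0439-x\twq4G2CfQ\n"
)
text = "a thing that we all observed[@doi:10.1038/s41587-020-0439-x]."
print(replace_citation_ids(text, load_id_map(tsv)))
# prints: a thing that we all observed[@wq4G2CfQ].
```

For a real manuscript, the two strings would be read from citations.tsv and the .Rmd file, and the result written to a new file rather than over the original.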

dhimmel commented 3 years ago

On the output branch, the manuscript.md document should have citations that use short_id. See if using that markdown as the Rmarkdown source works.

I'm guessing the following markdown is what you want:

Update: this is wrong. Let me look into this.

You should probably delete the HTML meta values. And next to that file is the CSL JSON:

Would be nice to see that Manubot output --> Rmarkdown is possible with few manual steps. Thanks for trying this out and letting other users know about any obstacles!

dhimmel commented 3 years ago

My above comment is wrong and applied to an old version of Manubot.

Now that pandoc-manubot-cite is its own pandoc filter I see two options.

Option 1: Calling the pandoc-manubot-cite filter from Rmarkdown

As per the docs at https://rmarkdown.rstudio.com/docs/articles/lua-filters.html, you might be able to add something like the following to your Rmarkdown document:

---
output:
  html_document:
    pandoc_args:
    - --filter=pandoc-manubot-cite
    - --filter=pandoc-citeproc
---

The pandoc options used by Manubot are specified at https://github.com/manubot/rootstock/blob/8b9b5ced2c7c963bf3ea5afb8f31f9a4a54ab697/build/pandoc/defaults/common.yaml

Option 2: Running pandoc to export to markdown

Here is the command Manubot runs to export to HTML.

I think what you want is to export to markdown, so possibly:

pandoc --verbose \
  --data-dir="$PANDOC_DATA_DIR" \
  --defaults=common.yaml \
  --to=markdown \
  --output=output/manuscript-post-filters.md

Haven't tested this, but the goal is to run the pandoc-manubot-cite filter to process the citations but to write to markdown and not HTML.

I think this option might be better than option 1. I haven't tested either, but I'm happy to help debug any issues.

Option 2 should also run the other pandoc filters to number figures, tables, and equations.

dhimmel commented 3 years ago

One thing we might consider is adding an opt-in BUILD_MD option to rootstock, so you could enable this environment variable and get a more portable markdown output. One question would be which markdown to export to: markdown (pandoc's markdown), commonmark, or commonmark_x. Perhaps this could be configurable.

Update: I opened PR https://github.com/manubot/rootstock/pull/382, which demonstrates running pandoc to export to markdown. I think this should get you what you need (running the filters for citations and figure/table/equation numbering).
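To make the BUILD_MD idea concrete, here is a hypothetical Python sketch of how an opt-in variable could select the output format. It is not the code from the PR, and rootstock's actual build script is a shell script. The accepted values for BUILD_MD and the fallback data directory build/pandoc (inferred from the common.yaml path linked earlier) are assumptions; the pandoc arguments mirror the untested command above.

```python
import os
import shlex

# Formats named in the comment above; each is passed to pandoc's --to option.
MARKDOWN_FORMATS = ("markdown", "commonmark", "commonmark_x")


def markdown_export_command(environ):
    """Return the pandoc argument list for a markdown export, or None if disabled.

    Hypothetical convention: BUILD_MD unset, empty, or "false" disables the
    export, "true" selects pandoc's markdown, and any other value names a format.
    """
    setting = environ.get("BUILD_MD", "").strip().lower()
    if setting in ("", "false"):
        return None
    fmt = "markdown" if setting == "true" else setting
    if fmt not in MARKDOWN_FORMATS:
        raise ValueError(f"unsupported BUILD_MD value: {setting!r}")
    # Assumed fallback; the command above takes this from $PANDOC_DATA_DIR.
    data_dir = environ.get("PANDOC_DATA_DIR", "build/pandoc")
    return [
        "pandoc",
        "--verbose",
        f"--data-dir={data_dir}",
        "--defaults=common.yaml",
        f"--to={fmt}",
        "--output=output/manuscript-post-filters.md",
    ]


command = markdown_export_command({"BUILD_MD": "commonmark_x"})
print(shlex.join(command))
# With the real environment: markdown_export_command(os.environ), then pass the
# returned list to subprocess.run(..., check=True) if it is not None.
```

Keeping the export opt-in means existing builds would be unaffected unless the variable is set.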

dhimmel commented 3 years ago

Okay, I think the following code in https://github.com/manubot/rootstock/pull/382 will create a markdown file you can use with RMarkdown:

https://github.com/manubot/rootstock/blob/6645e8b722c16e9c1d78f4f35ee609114eba9228/build/build.sh#L38-L43

Let us know how that works.