manubot / rootstock

Clone me to create your Manubot manuscript
https://manubot.github.io/rootstock/

Check out Pandoc Scholar #32

Open dhimmel opened 7 years ago

dhimmel commented 7 years ago

Described in "Formatting Open Science: agilely creating multiple document formats for academic manuscripts with Pandoc Scholar":

In this article we demonstrate the feasibility of writing scientific manuscripts in plain markdown (MD) text files, which can be easily converted into common publication formats, such as PDF, HTML or EPUB, using Pandoc. The simple syntax of Markdown assures the long-term readability of raw files and the development of software and workflows. We show the implementation of typical elements of scientific manuscripts—formulas, tables, code blocks and citations—and present tools for editing, collaborative writing and version control. We give an example on how to prepare a manuscript with distinct output formats, a DOCX file for submission to a journal, and a LaTeX/PDF version for deposition as a PeerJ preprint. Further, we implemented new features for supporting ‘semantic web’ applications, such as the ‘journal article tag suite’—JATS, and the ‘citation typing ontology’—CiTO standard.

The GitHub repo for this project is pandoc-scholar/pandoc-scholar. Created by @tarleb.

Let's see if there's anything from Pandoc Scholar we should incorporate here or learn from.

dhimmel commented 7 years ago

Also worth checking out gh-publisher -- use case at jakevdp/multiband_LS.

tarleb commented 7 years ago

Hi @dhimmel, thank you for checking out pandoc-scholar! Your project looks interesting. Is it going to be a long-term effort? If so, I'd advise not to build it on the current pandoc-scholar, but to use the upcoming pandoc version 2 (unofficial nightly builds). The reason for this is that we are going to integrate Lua deeper into pandoc and make more internals accessible to Lua programs. Pandoc-scholar includes some hacks and a complex Makefile-based build system, which mostly won't be necessary with the new pandoc version. As an additional advantage, it will become possible to use pandoc as a Lua interpreter. So instead of requiring users to have bash and Python installed (which is a pain for Windows users), it will be possible to use just pandoc and its integrated Lua interpreter. Please let me know if that's an option for you and I'll happily help with the pandoc side of things.

dhimmel commented 7 years ago

Is it going to be a long-term effort?

Yes.

use the upcoming pandoc version 2

When do you think release will be? Would you recommend using the nightly builds in production?

I'd advise not to build it on the current pandoc-scholar

We're not. Just using pandoc-scholar as a reference. This repository does a few things that are beyond the scope of pandoc-scholar:

  1. Automatic generation of reference metadata as JSON CSL Items
  2. Use of continuous integration to rebuild and deploy the manuscript upon any changes
  3. A templating framework that enables dynamically inserting data (in progress)
  4. Timestamping the manuscript using the Bitcoin blockchain during deployments
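To make item 1 concrete for readers new to the format: a CSL JSON item is a plain JSON object keyed by Citation Style Language variable names. The sketch below builds one from a few bibliographic fields. The helper `make_csl_item` and its parameters are hypothetical (not Manubot's actual implementation), only a small subset of CSL variables is shown, and the values in the demo are placeholders rather than a real reference.

```python
import json


def make_csl_item(citation_id, title, authors, year, doi=None):
    """Build a minimal CSL JSON item (hypothetical helper, for illustration only).

    authors: list of (given, family) name tuples.
    """
    item = {
        "id": citation_id,
        "type": "article-journal",
        "title": title,
        # CSL represents personal names as separate given/family parts
        "author": [{"given": given, "family": family} for given, family in authors],
        # CSL dates are nested lists, [[year, month, day]]; trailing parts are optional
        "issued": {"date-parts": [[year]]},
    }
    if doi is not None:
        item["DOI"] = doi  # CSL uses the uppercase variable name "DOI"
    return item


if __name__ == "__main__":
    # Placeholder values, not a real reference
    item = make_csl_item("example-2017", "An example title", [("Jane", "Doe")], 2017)
    print(json.dumps([item], indent=2))
```

A JSON list of such items is the shape that CSL processors such as pandoc-citeproc accept as a bibliography file.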

instead of requiring users to have bash and Python installed (which is a pain for Windows users), it will be possible to use just pandoc and its integrated Lua interpreter

I'm not sure this is the way we want to go. First, most of the project developers are familiar with Python but not Lua. Also, all of our current infrastructure (see items above) is written in Python. We use conda to manage the environment, so we don't anticipate major OS compatibility issues... but you make a good point that our use of shell scripts will likely cause some issues with Windows.

I'll happily help with the pandoc side of things

We're happy to modernize if it fits within the project goals. Based on the above discussion, what do you recommend? It may also help to see the system in use at greenelab/deep-review or greenelab/scihub-manuscript.

agitter commented 7 years ago

We use conda to manage the environment, so we don't anticipate major OS compatibility issues...

Note that even with conda, the current build process only works in Linux due to wkhtmltopdf (see greenelab/deep-review#545). However, because that is all done with continuous integration I don't think that is a major limitation.

dhimmel commented 7 years ago

Note that even with conda, the current build process only works in Linux due to wkhtmltopdf

@agitter we're getting wkhtmltopdf from the bioconda channel. We could always submit a PR to add Windows and OS X builds, so this is just a temporary limitation.

tarleb commented 7 years ago

The way you describe it, I agree with you and think that you're making the right technological choices. I guess I misunderstood some details. I was basing pandoc-scholar on Python at first, but had to switch due to our portability requirements. Since that's a non-issue for you, Python is an excellent choice IMHO. Personally, I'd be using the current pandoc 1.19.2 unless I required features not present in that version. The command line interface won't be changing much, and there is no release timeline for pandoc 2 yet.

Off-topic side note: you might be able to skip the sed command that removes the authors and date h2 headings by just specifying author-meta and date-meta in the YAML file.

tarleb commented 7 years ago

You might also be interested in panflute, an excellent library allowing simple modifications of the pandoc document AST.
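For context on what panflute simplifies: pandoc can emit its document AST as JSON (`pandoc -t json`), where each element is an object with a `t` (type) field and, usually, a `c` (contents) field. The stdlib-only sketch below walks that raw structure and collects citation keys from `Cite` nodes. It assumes the JSON layout of pandoc 1.18 or later (top-level `pandoc-api-version`, `meta` and `blocks`); the function `collect_citation_ids` is hypothetical, and panflute would instead expose these nodes as Python objects such as `panflute.Cite`.

```python
import json


def collect_citation_ids(node, found=None):
    """Recursively collect citationId values from a pandoc JSON AST (hypothetical helper)."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            # Cite contents are [list of citation objects, list of rendered inlines]
            citations, _inlines = node["c"]
            found.extend(citation["citationId"] for citation in citations)
        # Keep descending: citations can also occur in metadata, notes, or prefixes
        for value in node.values():
            collect_citation_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_citation_ids(item, found)
    return found


if __name__ == "__main__":
    # A trimmed, hand-written AST fragment; the DOI is a placeholder
    doc = {
        "meta": {},
        "blocks": [
            {"t": "Para", "c": [
                {"t": "Str", "c": "See"},
                {"t": "Space"},
                {"t": "Cite", "c": [
                    [{"citationId": "doi:10.1234/example", "citationPrefix": [],
                      "citationSuffix": [], "citationMode": {"t": "NormalCitation"},
                      "citationNoteNum": 0, "citationHash": 0}],
                    [{"t": "Str", "c": "[@doi:10.1234/example]"}],
                ]},
            ]},
        ],
    }
    print(collect_citation_ids(doc))  # → ['doi:10.1234/example']
```

On real input, the dictionary would come from `json.load` applied to the output of `pandoc -t json`.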

agitter commented 7 years ago

Texture may also be relevant. The repository is https://github.com/substance/texture.

dhimmel commented 7 years ago

Texture may also be relevant

I played around with the demo editor, which was slick, although some features haven't been fully implemented yet. One note from this document:

At this initial stage, Texture is being developed to be used by a production team seeking to take the author’s final version of a manuscript and produce production quality JATS for publishing purposes.

Therefore, one route where Manubot could work with Texture is if we exported JATS XML. Then a journal may be able to use Texture to refine the Manubot-produced manuscript. Or we could potentially use the article viewer (Lens Viewer) to display our manuscripts.

In the meantime, I don't think there is a ton of overlap between our project and Texture.

agitter commented 5 years ago

Therefore, one route where manubot could work with Texture is if we exported JATS XML.

A recent eLife Labs post provides some updates. One relevant part:

We will endeavour to accept submissions of reproducible manuscripts in the form of DAR files by the end of 2019.

DAR files are apparently based on JATS. This is not immediately applicable to Manubot but is worth monitoring. eLife will be a leading journal when it comes to accepting submissions in newer formats.

dhimmel commented 5 years ago

My understanding is that DAR stores the manuscript as JATS, while allowing for the inclusion of other assets like figures, data, and code. For Manubot manuscripts, creating a DAR archive with a JATS manuscript and figures would probably be sufficient. Something to keep in mind when we resume work on https://github.com/manubot/rootstock/pull/82.

I am less convinced that all data and code should be bundled with manuscripts. I think this breaks down with complex studies whose code and data span many repositories. Therefore, I think it makes sense to initially focus on creating bare-bones DARs that would allow lossless submission of manuscripts to eLife (i.e., no manual formatting or styling steps required).

agitter commented 5 years ago

I am less convinced that all data and code should be bundled with manuscripts. I think this breaks down with complex studies whose code and data spans many repositories. Therefore, I think it makes sense to initially focus on creating bare-bone DARs that would allow lossless submission of manuscripts to eLife

This was my thinking as well.

jcolomb commented 4 years ago

Let's see if there's anything from Pandoc Scholar we should incorporate here or learn from.

Maybe the integration with jatsxml they are working on ;)

dhimmel commented 4 years ago

Maybe the integration with jatsxml they are working on ;)

I updated https://github.com/manubot/rootstock/pull/82 and will see if Pandoc produces reasonable JATS from our markdown. Are you specifically referring to the jats-cite.lua and jats-fixes.lua filters as well as the pandoc-scholar.jats template? These could be useful (especially the filters). Possibly they will make it into core pandoc... and they're using pandoc-scholar to prototype.

jcolomb commented 4 years ago

I was referring to a Twitter discussion with @tarleb; it seems they are working on that ;)

tarleb commented 4 years ago

Yes, we might merge some of this back into pandoc/pandoc-citeproc, but it might take a while. The filter is a workaround for some shortcomings of the current implementation, but a proper fix would require bigger changes in pandoc-citeproc.

I'll happily keep you updated on our progress.