Open dhimmel opened 7 years ago
Also worth checking out gh-publisher -- use case at jakevdp/multiband_LS.
Hi @dhimmel, thank you for checking out pandoc-scholar! Your project looks interesting. Is it going to be a long-term effort? If so, I'd advise not to build it on the current pandoc-scholar, but to use the upcoming pandoc version 2 (unofficial nightly builds). The reason for this is that we are going to integrate lua deeper into pandoc and make more internals accessible to lua programs. Pandoc-scholar includes some hacks and a complex Makefile based build system, which mostly won't be necessary with the new pandoc version. As an additional advantage, it will become possible to use pandoc as a lua interpreter. So instead of requiring users to have bash and python installed (which is a pain for windows users), it will be possible to use just pandoc and its integrated lua interpreter. Please let me know if that's an option for you and I'll happily help with the pandoc side of things.
Is it going to be a long-term effort?
Yes.
use the upcoming pandoc version 2
When do you think the release will be? Would you recommend using the nightly builds in production?
I'd advise not to build it on the current pandoc-scholar
We're not. Just using pandoc-scholar as a reference. This repository does a few things that are beyond the scope of pandoc-scholar:
instead of requiring users to have bash and python installed (which is a pain for windows users), it will be possible to use just pandoc and its integrated lua interpreter
I'm not sure this is the way we want to go. First, most of the project developers are familiar with Python but not Lua. Also all of our current infrastructure (see items above) is written in Python. We use conda to manage the environment, so we don't anticipate major OS compatibility issues... but you make a good point that our use of shell scripts will likely cause some issues with windows.
I'll happily help with the pandoc side of things
We're happy to modernize if it fits within the project goals. Based on the above discussion, what do you recommend? It may also help to see the system in use at greenelab/deep-review or greenelab/scihub-manuscript.
We use conda to manage the environment, so we don't anticipate major OS compatibility issues...
Note that even with conda, the current build process only works in Linux due to wkhtmltopdf (see greenelab/deep-review#545). However, because that is all done with continuous integration I don't think that is a major limitation.
Note that even with conda, the current build process only works in Linux due to wkhtmltopdf
@agitter we're getting wkhtmltopdf from the bioconda channel. We could always submit a PR to add windows and OS X builds, so this is just a temporary limitation.
The way you describe it, I agree with you and think that you're making the right technological choices. I guess I misunderstood some details. I was basing pandoc-scholar on python at first, but had to switch due to our portability requirements. Since that's a non-issue for you, python is an excellent choice IMHO. Personally, I'd be using the current pandoc 1.19.2 unless I required features not present in that version. The command line interface won't be changing much, and there is no release timeline for pandoc 2 yet.
Off-topic side note: you might be able to skip the sed command removing the authors and date h2 by just specifying author-meta and date-meta in the yaml file.
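To make the author-meta / date-meta suggestion concrete, here is a minimal sketch in Python that generates such a YAML metadata block. The helper name `metadata_block` and the example values are made up for illustration. Whether the visible author and date header actually disappears depends on the template in use, so treat this as something to try rather than a confirmed fix.

```python
import json


def metadata_block(title, authors, date):
    """Build a pandoc YAML metadata block (hypothetical helper).

    Only author-meta and date-meta are set; the plain author and date
    fields are deliberately left out, following the suggestion above.
    """
    # json.dumps yields double-quoted strings, which are valid YAML scalars.
    lines = ["---", f"title: {json.dumps(title)}", "author-meta:"]
    lines += [f"  - {json.dumps(name)}" for name in authors]
    lines.append(f"date-meta: {json.dumps(date)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Example values are placeholders, not taken from any real manuscript.
block = metadata_block("Example manuscript", ["A. Author", "B. Author"], "2017-06-01")
print(block)
```

If this works with the template, the sed post-processing step could be dropped.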
You might also be interested in panflute, an excellent library allowing simple modifications of the pandoc document AST.
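For anyone unfamiliar with what "modifying the pandoc document AST" looks like, below is a rough stdlib-only sketch. It does not use panflute and is not its API. It assumes the JSON layout that, as far as I know, recent pandoc versions emit (a top-level object with `meta` and `blocks`, and elements tagged with `t` and `c`). The `walk` and `upper_str` names and the sample document are made up.

```python
def walk(node, action):
    """Return a copy of the AST with `action` applied to every tagged element.

    `action` returns a replacement element, or None to leave the element
    alone and keep descending into its children.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replaced = action(node)
            if replaced is not None:
                return replaced
        return {key: walk(value, action) for key, value in node.items()}
    return node


def upper_str(elem):
    """Example action: upper-case the text of every Str element."""
    if elem["t"] == "Str":
        return {"t": "Str", "c": elem["c"].upper()}
    return None


# Hand-written sample document; the API version is a placeholder value.
doc = {
    "pandoc-api-version": [1, 17, 0, 4],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [{"t": "Str", "c": "hello"},
                            {"t": "Space"},
                            {"t": "Str", "c": "world"}]}
    ],
}
result = walk(doc, upper_str)
```

A real filter would read the JSON from stdin and write the result to stdout. A library like panflute takes care of that plumbing and gives typed element classes instead of raw dictionaries.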
Texture may also be relevant. The repository is https://github.com/substance/texture.
Texture may also be relevant
I played around with the demo editor, which was slick although some features haven't been fully implemented yet. One note from this document:
At this initial stage, Texture is being developed to be used by a production team seeking to take the author’s final version of a manuscript and produce production quality JATS for publishing purposes.
Therefore, one route where manubot could work with Texture is if we exported JATS XML. Then a journal may be able to use Texture to refine the manubot-produced manuscript. Or we could potentially use the article viewer (Lens Viewer) to display our manuscripts.
In the meantime, I don't think there is a ton of overlap between our project and Texture.
Therefore, one route where manubot could work with Texture is if we exported JATS XML.
A recent eLife Labs post provides some updates. One relevant part:
We will endeavour to accept submissions of reproducible manuscripts in the form of DAR files by the end of 2019.
DAR files are apparently based on JATS. This is not immediately applicable to Manubot but is worth monitoring. eLife will be a leading journal when it comes to accepting submissions in newer formats.
My understanding is that DAR stores the manuscript as JATS, while allowing for the inclusion of other assets like figures, data, and code. For Manubot manuscripts, creating a DAR archive with a JATS manuscript and figures would probably be sufficient. Something to keep in mind when we resume work on https://github.com/manubot/rootstock/pull/82.
I am less convinced that all data and code should be bundled with manuscripts. I think this breaks down with complex studies whose code and data span many repositories. Therefore, I think it makes sense to initially focus on creating bare-bones DARs that would allow lossless submission of manuscripts to eLife (i.e. no manual formatting or styling steps required).
I am less convinced that all data and code should be bundled with manuscripts. I think this breaks down with complex studies whose code and data span many repositories. Therefore, I think it makes sense to initially focus on creating bare-bones DARs that would allow lossless submission of manuscripts to eLife
This was my thinking as well
Let's see if there's anything from Pandoc Scholar we should incorporate here or learn from.
Maybe the integration with jatsxml they are working on ;)
Maybe the integration with jatsxml they are working on ;)
I updated https://github.com/manubot/rootstock/pull/82 and will see if Pandoc produces reasonable JATS from our markdown. Are you specifically referring to the jats-cite.lua and jats-fixes.lua filters as well as the pandoc-scholar.jats template? These could be useful (especially the filters). Possibly they will make it into core pandoc... and they're using pandoc-scholar to prototype.
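As a rough sketch of what trying this out could look like, the Python helper below assembles a pandoc command line for JATS output with optional Lua filters and a custom template. The function name `jats_command` and the file paths are assumptions for illustration. It also assumes a pandoc release that ships the JATS writer and the `--lua-filter` option. This is not the actual rootstock build script.

```python
def jats_command(source, output, lua_filters=(), template=None):
    """Assemble (but do not run) a pandoc invocation producing JATS XML.

    All paths are supplied by the caller; nothing here checks that the
    filters or the template exist on disk.
    """
    cmd = [
        "pandoc",
        "--from", "markdown",
        "--to", "jats",
        "--standalone",
        "--output", output,
    ]
    # Filters are applied in the order they are listed.
    for path in lua_filters:
        cmd += ["--lua-filter", path]
    if template is not None:
        cmd += ["--template", template]
    cmd.append(source)
    return cmd


# Hypothetical paths; the filter and template names are the ones mentioned above.
cmd = jats_command(
    "manuscript.md",
    "manuscript.xml",
    lua_filters=["jats-cite.lua", "jats-fixes.lua"],
    template="pandoc-scholar.jats",
)
print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it. Keeping the filters optional makes it easy to compare pandoc's plain JATS output with the filtered version.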
I was referring to a Twitter discussion with @tarleb; it seems they are working on that ;)
Yes, we might merge some of this back into pandoc/pandoc-citeproc, but it might take a while. The filter is a workaround for some shortcomings of the current implementation, but a proper fix would require bigger changes in pandoc-citeproc.
I'll happily keep you updated on our progress.
Described in Formatting Open Science: agilely creating multiple document formats for academic manuscripts with Pandoc Scholar:
The GitHub repo for this project is pandoc-scholar/pandoc-scholar. Created by @tarleb. Let's see if there's anything from Pandoc Scholar we should incorporate here or learn from.