Open slochower opened 5 years ago
Is it useful (and feasible) to describe the common customizations one might want for writing a typical journal or grant?
Regarding complying with specific submission formats, I think we should consider focusing on the DOCX export rather than the HTML styling. My reasoning is:
Based on these reasons, I suggest we explore DOCX output with Pandoc's --reference-doc option:
Use the specified file as a style reference in producing a docx or ODT file. For best results, the reference docx should be a modified version of a docx file produced using pandoc. The contents of the reference docx are ignored, but its stylesheets and document properties (including margins, page size, header, and footer) are used in the new docx.
Thus --reference-doc should be able to deal with font, sizing, and other formatting stipulations. Perhaps we can find reference-docs for existing journals somewhere, or potentially create a catalog of community contributions.
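As a rough sketch of that workflow (not something taken from the Manubot build script), the two Pandoc invocations could be assembled as below. The file names custom-reference.docx, manuscript.md, and manuscript.docx are placeholders chosen for illustration; --print-default-data-file and --reference-doc are the Pandoc options involved. The idea is to export Pandoc's default reference document, edit its styles in Word, and then pass the edited file back on every build.

```python
def export_reference_command(reference_doc="custom-reference.docx"):
    """Command that writes Pandoc's default reference.docx to a file for editing."""
    return [
        "pandoc",
        "-o",
        reference_doc,
        "--print-default-data-file",
        "reference.docx",
    ]


def docx_command(source, output, reference_doc):
    """Command that converts `source` to DOCX using the styles in `reference_doc`."""
    return [
        "pandoc",
        source,
        "--to=docx",
        f"--reference-doc={reference_doc}",
        f"--output={output}",
    ]


if __name__ == "__main__":
    # Placeholder file names; nothing is executed here, the commands are only printed.
    for command in (
        export_reference_command(),
        docx_command("manuscript.md", "manuscript.docx", "custom-reference.docx"),
    ):
        print(" ".join(command))
```

The commands are returned as argument lists so they can be handed to subprocess.run once the placeholder paths are replaced with real ones.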
Now there are also cases where individuals want a different HTML/PDF style. I believe CSS provides some customization, and hopefully users can modify default.html to make minor updates as needed. We can also potentially provide some turnkey configuration for the CSS. However, I am skeptical whether it will ever be easy enough for most users. On the other hand, editing a DOCX file does seem like it will be more accessible.
Finally, our long-term goal would be submission using JATS XML, but this is years off, as no journals I'm aware of actually accept JATS submissions at the moment.
I think we should consider focusing on the DOCX export rather than the HTML styling
Okay, fair points.
Thus --reference-doc should be able to deal with font, sizing, and other formatting stipulations. Perhaps we can find reference-docs for existing journals somewhere, or potentially create a catalog of community contributions.
I've used this a bit, but it still leaves some things to be desired (e.g., keeping captions with figures). It may be the best we can do, but it will be good to put up some guidance.
Do you have anything to add, @agitter?
I don't have much to add other than to say that addressing these practical formatting concerns will be important if we want to scale to be a writing platform that researchers can use for their everyday needs. DOCX export does seem like our best option for now, but it is a time sink if the formatting has to be manually fixed many times when writing.
Some of the problems and solutions may become more apparent as we accrue examples of working with Manubot for different use cases. Your grant writing example was enlightening. So far with deep review and meta review, the journals did not impose strict formatting requirements, so I have not had to deal with them much.
We could reach out to the few other users who have submitted Manubot manuscripts to journals to learn about their strategies.
Good idea! @dhimmel, I suspect you have the most knowledge of who is using Manubot. I wonder whether we should make it easy for users to add a reference DOCX here via PR, or whether we should create a templates repository under the main Manubot organization.
I wonder if we should make it easy for users to add a reference DOCX here via PR or we should create a templates repository under the main Manubot organization repo
Another option besides a new repo to host reference-docs could be a directory in https://github.com/manubot/resources. Let's check whether Pandoc supports --reference-doc=URL like it does for --csl=URL.
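Until URL support is confirmed, one workaround would be to download the reference document before calling Pandoc, so the build only ever passes a local path. The sketch below is an illustration of that idea, not existing Manubot code: resolve_reference_doc and is_url are hypothetical helper names, and the downloader is injectable so the logic can be exercised without network access.

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def is_url(value):
    """Return True when `value` looks like an http(s) URL rather than a local path."""
    return urlparse(value).scheme in ("http", "https")


def resolve_reference_doc(reference, download=urllib.request.urlretrieve):
    """Return a local path for `reference`, downloading it first if it is a URL.

    `download(url, filename)` defaults to urllib's urlretrieve; passing a
    stand-in makes the function usable offline.
    """
    if not is_url(reference):
        # Local paths are passed through untouched.
        return reference
    target = Path(tempfile.mkdtemp()) / "reference.docx"
    download(reference, str(target))
    return str(target)
```

The returned path can then be given to --reference-doc regardless of whether the installed Pandoc version accepts URLs for that option.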
FWIW, I intend to eventually write exhaustive docs (going into far more detail than could reasonably fit in a readme.md). We'll definitely cover what the plugins do, how to customize them, what the themes do, how to customize them, and more.
A note for future reference: we should look at projects like pubcss to see how much styling is possible with CSS.
Conversation began in #235
To better understand where this customization piece hooks into the conversion process I have a few questions:
Is there an existing document on the passes/sweeps that Manubot makes to transform from source to HTML/PDF/DOCX?
Almost all of the conversion process is done by the build.sh script, so that is the best place to look. The description in the Manubot software paper is probably a bit too general to be useful here. Prior to Pandoc, processing is done by the manubot process command:
How much transformation is driven by Manubot and how much is Pandoc?
Most of the transformation is driven by Pandoc. However, Manubot does a lot of the citation/bibliographic processing (the whole cite-by-persistent-identifier stuff). In addition, we use customized Pandoc commands to, for example, insert custom HTML / CSS / JS into the manuscript.html output.
As a general guideline, we'd like to delegate as much as possible to Pandoc. Ideally, we can avoid duplicating features. There is room for improvement here. For example, perhaps citation-by-identifier could be a pandoc filter as opposed to a separate workflow that must search through markdown (see https://github.com/manubot/manubot/pull/99). Or perhaps we can use Pandoc templates for our custom HTML.
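To make the filter idea concrete, here is a minimal sketch of a Pandoc JSON filter that finds cited identifiers by walking the document AST instead of searching the Markdown text. It is an illustration only and not how Manubot currently works: citation_ids and run_filter are hypothetical names, the filter leaves the document unchanged, and it relies on Pandoc's JSON representation, in which a Cite node carries a list of citations that each have a citationId field.

```python
import json
import sys


def citation_ids(node):
    """Recursively collect citation ids from a Pandoc JSON AST fragment."""
    found = []
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            # A Cite node's content is [citations, inlines].
            citations, _inlines = node["c"]
            found.extend(citation["citationId"] for citation in citations)
        for value in node.values():
            found.extend(citation_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(citation_ids(item))
    return found


def run_filter(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Read the AST from stdin, report cited ids on stderr, emit the AST unchanged."""
    document = json.load(stdin)
    for identifier in citation_ids(document):
        print(identifier, file=stderr)
    json.dump(document, stdout)


# Pandoc passes the output format as the first argument when it runs a filter,
# so the filter only starts when it is invoked that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    run_filter()
```

Saved as an executable script, this could be passed to Pandoc with --filter. A real version would look up metadata for each identifier at the point where this sketch only prints it.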
Is there any work done by Pandoc currently that needs to be undone before further processing by Manubot?
Not really.
This is a follow-up to some of the discussion in #169, viz. https://github.com/manubot/rootstock/pull/169#issuecomment-472192982.
I see two possible directions for detailing the customization options.
1. Is it useful (and feasible) to describe the common customizations one might want for writing a typical journal or grant? The motivation is: if I use Manubot to write a grant, I don't want the readers thinking "why does this document look different to the others" or worse "this document doesn't conform to our formatting specifications and we won't consider it". Dealing with formatting is (unfortunately) still a real problem. Personally, I can work in Word and I can usually figure out how to mess with formatting to conform to requirements (and I can leave this to the last minute), but if I write with Manubot, how much time should I allocate for tweaking the formatting? A few hours? A few days so I can post Issues and wait for feedback? I think it's murky. Is our recommendation, on behalf of Manubot, to tell users with formatting requirements to export to DOCX and do the formatting there? Or do we want to detail how to customize the CSS? Say I want to submit an article to Nature, I need to make sure I don't exceed 5 pages, try to use Times New Roman, 12 pt, with Greek letters for math, put tables on a separate page with the description double spaced, figures as small as possible, with amino acid sequences in Courier, etc. Do we want to show how to write a multi-font document, resize figures inline, put in manual page breaks, write distinct divs for different body elements? Do we even support page numbers (I don't think most HTML-to-PDF print formatters do)? I'm not sure what the guidance should be.

2. How easy is it to get rid of some of the plugins if a user wishes? This is comparatively easy. In USAGE.md, we write that each plugin is enabled during build.sh and can be individually turned off.