ropensci / ozunconf19

OzUnconf19
http://ozunconf19.ropensci.org/

🥺 coRporateR: Rmarkdown -> Word with crazy corporate style docs #18

Open dvanic opened 4 years ago

dvanic commented 4 years ago

For better or worse, a lot of corporate reporting happens in Word 😢. Current Rmd -> Word solutions include:

However, none of them allow you to use a "fancy" Word template, and provide a key-value mapping for Rmd style to Word style (in a separate text file), and (ideally) enable you to specify somehow in the Rmd the incorporation of weird pages like title pages and separators.

I'd love a coRporateR package that would enable this!

Maschette commented 4 years ago

I thought one of the R Markdown options lets you give it a Word template file that it bases the formatting on when it knits to Word?

https://bookdown.org/yihui/rmarkdown/word-document.html

dvanic commented 4 years ago

@Maschette Yes, you can - but that template file can still only include a handful of mappings of the markdown tags to Word styles (basically, mappings for H1/H2/H3 and body text and possibly quote), so table and special page formatting do not work.

When I last tried to use the approach you described with our internal Sydney Informatics Hub word "template" doc, I also found that Word would inevitably complain when I opened the generated file about how it was broken (or some iteration thereof, can't quote the error message as it was a few months ago, sorry), and while the formatting was kind of there, it really didn't do well with anything beyond the H1/H2/H3 (so no header images/footer stuff etc).

stephstammel commented 4 years ago

This is an utterly brilliant idea and would make life a lot easier!

njtierney commented 4 years ago

Great project! I'm not sure of the exact relationship between officedown and officer - https://github.com/davidgohel/officer - but it might be worthwhile looking at officer in addition to officedown?

kcf-jackson commented 4 years ago

@dvanic Interesting idea, and it sounds doable within the unconference time frame.

To facilitate the discussion, let's break the problem into two parts:

  1. Rmd -> Generic key-value mappings
  2. Generic key-value mappings -> MS Word

Please let me know if I misunderstood. Am I correct that what you are suggesting is mostly related to (2), which is to programmatically render an MS Word document with a "fancy" layout?

And do you still need the ability to render the Rmd into other formats like HTML and PDF (as they would not have the same layout)?


In any case, I could think of two ways to do (2).

  1. The first way is to make a direct call to the JavaScript library docxtemplater, which takes a key-value JSON file and renders the values at the targeted locations in the MS Word file. (See the first example "Replace a {placeholder} by a value") I have tested the library with this template, and it works well. Using this library can be convenient, as

    • it runs in both node and a browser,
    • the code needed is basically the example given in the documentation, and
    • the free version of the library is MIT licensed.

    However, the main drawback is that this approach requires some JavaScript knowledge.

  2. The second way is to use a string substitution approach, which can be done entirely using R. This is not foolproof, but I'd expect it to work well in most situations.

    • The key is to recognise that docx is just a compressed file, and if you unzip it, then you'd get a folder of (mostly) XML files (which can be handled using the xml2 package or treated as a text file in R).
    • I think the document content is stored entirely in word/document.xml. So if we have left a placeholder text in the document, say {placeholder-1}, then we could look for this text and substitute it with the string that we want.
    • After the change is made, we zip the folder, make sure it has the ".docx" extension, then it is done.

    I tested this manually on a single example, and it works fine. So this should be possible, I am just not sure how general this solution is. This approach is in a nutshell just unzip-read-modify-save-zip!
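The unzip-read-modify-save-zip idea above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not a tested R implementation; the function name `fill_placeholders` and the `{key}` placeholder syntax are assumptions made for the example. One known limitation: Word sometimes splits a piece of visible text across several XML runs, in which case a plain string search will not find the placeholder.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_placeholders(docx_bytes: bytes, values: dict[str, str]) -> bytes:
    """Return a copy of a .docx archive with {key} placeholders replaced.

    Hypothetical helper: only word/document.xml is touched, and every
    other archive member is copied through unchanged.
    """
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so the result stays valid XML.
                    text = text.replace("{" + key + "}", escape(value))
                data = text.encode("utf-8")
            dst.writestr(item, data)
    return out_buf.getvalue()


# Possible usage (file names are placeholders):
#   with open("template.docx", "rb") as f:
#       filled = fill_placeholders(f.read(), {"placeholder-1": "Hello"})
#   with open("report.docx", "wb") as f:
#       f.write(filled)
```

Working on bytes rather than unpacking to a folder avoids leaving temporary files around, but the steps are the same as the manual procedure.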

dvanic commented 4 years ago

@kcf-jackson I think I haven't explained it clearly enough, maybe....

I don't want to create a pre-made template in Word, where I fill in specific things from R.

I want to start with an Rmd, and be able to do Rmd -> Word (or PDF or HTML, the latter two of which each have their own style as per normal Rmd), where instead of # mapping to "Heading 1" it instead maps to "Custom heading 1 yada yada", ## maps to "Custom heading fancy 2" etc.

These fancy key - value pairs are stored in a config csv file (or a yaml).

H1 -> Custom heading 1 yada yada
H2 -> Custom heading fancy 2
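To make the config idea concrete, here is a minimal sketch of reading such a mapping, written in Python with the standard library only. It assumes a two-column CSV with no header row (key in the first column, Word style name in the second); that layout and the function name `load_style_map` are assumptions for illustration, not an agreed format.

```python
import csv
import io


def load_style_map(csv_text: str) -> dict[str, str]:
    """Parse 'key,style name' rows into a dict (hypothetical config layout)."""
    style_map = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        style_map[row[0].strip()] = row[1].strip()
    return style_map


config = "H1,Custom heading 1 yada yada\nH2,Custom heading fancy 2\n"
style_map = load_style_map(config)
```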

This needs to work for:

There needs to be support for custom headers and footers, to be "taken" from the template word doc in addition to the styles. These don't have to be, necessarily, editable from within R, but should at least be rendered for editing in the Word.

I would also like to include special pages, which might have a small amount of metadata (which, worst case scenario, can be edited in Word):


Looking at how python does it... I can see that they've got:

I

kcf-jackson commented 4 years ago

@dvanic Thanks for the clarification.

I can follow the first part about mapping the Rmd "tags" to your chosen styles, instead of the default style. It seems solution 2 still applies since it is just direct manipulation of the docx document. Since we know all the markdown syntax maps to some default styles, one can do a tree substitution or string substitution directly in word/document.xml to change them to the desired styles (e.g. "Heading 1" -> "custom heading 1", provided that the custom styles are well-defined).

I cannot follow the part where you said you want to include special pages. What Rmd syntax would that correspond to? Also, it would be helpful if you could provide an example of a docx (with the custom style) that shows what you want the rendered Rmd to look like. That will help define the project scope better.
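A rough sketch of the string-substitution variant, in Python with the standard library only. In word/document.xml a paragraph's style is referenced by a style ID inside a `w:pStyle` element, and that ID can differ from the name shown in Word's UI, so the IDs used here ("Heading1", "CustomHeading1") are assumptions for illustration. The function name `remap_paragraph_styles` is also hypothetical, and the target styles must already be defined in the document's word/styles.xml.

```python
import re

# Matches the w:val attribute of a <w:pStyle .../> element.
PSTYLE = re.compile(r'(<w:pStyle\b[^>]*\bw:val=")([^"]*)(")')


def remap_paragraph_styles(document_xml: str, style_map: dict[str, str]) -> str:
    """Swap paragraph style IDs according to style_map; others are kept."""
    def swap(match: re.Match) -> str:
        old_id = match.group(2)
        return match.group(1) + style_map.get(old_id, old_id) + match.group(3)

    return PSTYLE.sub(swap, document_xml)


xml = (
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr></w:p>'
    '<w:p><w:pPr><w:pStyle w:val="BodyText"/></w:pPr></w:p>'
)
remapped = remap_paragraph_styles(xml, {"Heading1": "CustomHeading1"})
```

A tree-based version using an XML parser would be more robust than a regular expression, at the cost of handling the WordprocessingML namespaces explicitly.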

Thanks for listing the python references. I agree that python-docx and the demo are like officer, and I think python-docx-template is like using CSS class in an RMD file, which would probably render correctly only in one file format, but not in others.

dvanic commented 4 years ago

Update: this has been abandoned for the 2019 Unconf, but I will try to work on this in the future, probably as an addon to officer. If you'd like to help me with this please leave a comment in this issue.

kristyrobledo commented 4 years ago

@dvanic - I'm not familiar with officer, but I'm a great learner 😄 . And it's something that I would be VERY interested in!

kimnewzealand commented 4 years ago

Yes keep me in the loop @dvanic

Maschette commented 4 years ago

After watching the RStudio conf talk by Jake Thompson, I think the ratlas package may be useful.

dvanic commented 4 years ago

@Maschette Are the 2020 talks online already???? Pretty please share the link? (My google-fu is failing)

Maschette commented 4 years ago

You can get to them from the streaming bit, if you click play and then previous sessions you can go back through them https://rstudio.com/conference/grand-ballroom-a/

alapo commented 2 years ago

Definitely following this. I have been trying to do something similar because different academic journals have separate submission templates. Did you ever find a solution?

robjhyndman commented 2 years ago

The rticles package is designed for academic journal submissions.