jgm / pandoc

Universal markup converter
https://pandoc.org

Add a style for tables in docx writer #3275

Closed choies1 closed 7 years ago

choies1 commented 7 years ago

When I convert an HTML table to docx using pandoc (ver 1.18), I would like to change the table style used in the MS Word (docx) output.

I used the following pandoc command for the conversion (HTML table to docx):

pandoc -f html -t docx -S --mathjax conversion_test.html -o conversion_test.doc

HTML table code

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Account</th>
      <th>Name</th>
      <th>Rep</th>
      <th>Manager</th>
      <th>Product</th>
      <th>Quantity</th>
      <th>Price</th>
      <th>Status</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>714466</td>
      <td>Trantow-Barrows</td>
      <td>Craig Booker</td>
      <td>Debra Henley</td>
      <td>CPU</td>
      <td>1</td>
      <td>30000</td>
      <td>presented</td>
    </tr>
    <tr>
      <th>1</th>
      <td>714466</td>
      <td>Trantow-Barrows</td>
      <td>Craig Booker</td>
      <td>Debra Henley</td>
      <td>Software</td>
      <td>1</td>
      <td>10000</td>
      <td>presented</td>
    </tr>
    <tr>
      <th>2</th>
      <td>714466</td>
      <td>Trantow-Barrows</td>
      <td>Craig Booker</td>
      <td>Debra Henley</td>
      <td>Maintenance</td>
      <td>2</td>
      <td>5000</td>
      <td>pending</td>
    </tr>
  </tbody>
</table>

Web browser result:

html_output2

MS Word result (after conversion):

docx_output

How can I change the MS Word (docx) table style as follows when I convert an HTML table to docx using pandoc?

docx_output2

jgm commented 7 years ago

Here are the official instructions. Create a reference.docx:

pandoc --print-default-data-file reference.docx > myref.docx

Open this with Word. Change the "Normal Table" (or "Table Normal") style to match what you want. Save. Then invoke pandoc with

pandoc --reference-docx myref.docx yourinput.html -o youroutput.docx

PROBLEM! This doesn't work, because you can't modify "Normal Table" (at least I couldn't).

So I think we need some changes in pandoc to make this possible. This site https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_other/i-nead-to-edit-the-format-normal-table-how-do-i-do/ab27db4e-a7c3-432f-8d3a-6c5f274f2d6a recommends creating a custom table style, and we could do that, and use it for pandoc tables.

@jkr any thoughts?

mdolginin commented 7 years ago

You can only change the default heading, text, and list styles automatically. You can't change the default table style. The only way is to create a custom table style in your reference file and manually apply it to each table in the document after conversion.

jkr commented 7 years ago

@jgm -- from what's posted it looks like the best way. I'm not too familiar, but I'll take a look -- probably tomorrow.

asknet commented 7 years ago

+1 Having a custom table style would be very useful. Thanks!

davebraze commented 7 years ago

+1 I'd like this a lot

asknet commented 7 years ago

Hi, I took the nightly build (hash-8300b3fb). First I generated the reference file with pandoc --print-default-data-file reference.docx > myref.docx. I opened myref.docx but couldn't find a table style named "Table", so I'm unable to modify the table style. Please assist.

Thanks

mmacphail commented 7 years ago

Hi,

I'm experiencing the same problem as asknet. Help would be appreciated !

Thanks

mogita commented 7 years ago

Same issue here. I have the latest version of pandoc installed (1.19.2.1).

I tried the method referenced in the official docs (the pandoc --print-default-data-file reference.docx > custom-reference.docx way), edited the file and made it the new reference file, but it didn't change the result. Tables were still borderless.

Then I tried editing the output docx file, like what the author of this post said in the Styling docx output section. I applied borders to "All" sides of the table, which looked good, then used the edited file as the reference. Still no borders in the result.

I think the official docs could be clearer about this particular issue, since it takes too much effort for a user who isn't familiar with the docx format specs to make the "right" change, and there is still no significant improvement to this tiny yet critical spot (all other parts of the conversion were just perfect). Alternatively, there could be an argument that names a custom table style in the reference file to use, since Normal Table can't be modified.

Thanks!

jgm commented 7 years ago

@mogita the fix is not in 1.19.2.1, it's only in the dev version.

Jasonlhy commented 7 years ago

this feature can be quite useful

ryu-jin commented 7 years ago

@jgm do you know when this fix will be integrated in the release by any chance? Cheers.

russellsch commented 6 years ago

I would be very interested in this feature as well

mb21 commented 6 years ago

This should be released as of now, in pandoc 2.0
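
For anyone scripting the conversion, here is a rough sketch (my own, not from the pandoc docs) that reads the version from the first line of `pandoc --version` and picks the reference-file flag to match. It assumes, as far as I know, that pandoc 2.0 also renamed `--reference-docx` to `--reference-doc`. The function names are made up for illustration.

```python
import re
from typing import List, Tuple


def parse_pandoc_version(version_output: str) -> Tuple[int, ...]:
    """Extract the version tuple from the first line of `pandoc --version`."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+(?:\.\d+)+)", first_line)
    if match is None:
        raise ValueError(f"no version number in {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def reference_flag(version: Tuple[int, ...]) -> str:
    """Assumption: the flag is --reference-docx before 2.0, --reference-doc after."""
    return "--reference-doc" if version >= (2, 0) else "--reference-docx"


def build_command(version: Tuple[int, ...], reference: str,
                  source: str, output: str) -> List[str]:
    """Build an argument list suitable for subprocess.run (hypothetical helper)."""
    return ["pandoc", f"{reference_flag(version)}={reference}", source, "-o", output]
```

Something like `build_command(parse_pandoc_version(text), "myref.docx", "in.html", "out.docx")` then gives a list you can pass to `subprocess.run`, where `text` is the captured output of `pandoc --version`.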

yffs89 commented 6 years ago

I also have the same problem. There is no border on the table in the docx file I generated. Please help me.

agusmba commented 6 years ago

@yang-small-fan did you edit the table style in your reference docx to fit your needs?

yffs89 commented 6 years ago

@agusmba I edited the table style in the reference template, but it doesn't seem to work. It still has no border.

iansco commented 6 years ago

Hey @yang-small-fan - it's my understanding that Pandoc's docx writer creates tables with a Style named "Table", so you need to make sure that you apply styling changes in your "reference-doc" to this particular Table Style.

According to the comment above, I believe you also need to be running Pandoc 2.0 or later, as well...
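
If you want to check this without opening Word, a docx file is a zip archive and the styles live in `word/styles.xml`. The sketch below (untested against every Word version, and `table_styles` is just a name I picked) lists the table styles in a reference doc so you can see whether one with the id "Table" is there.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by styles.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def table_styles(docx_path):
    """Return {styleId: display name} for every table style in the docx."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    found = {}
    for style in root.iter(f"{{{W}}}style"):
        # Only table styles are relevant; skip paragraph/character styles.
        if style.get(f"{{{W}}}type") != "table":
            continue
        name = style.find(f"{{{W}}}name")
        display = name.get(f"{{{W}}}val") if name is not None else None
        found[style.get(f"{{{W}}}styleId")] = display
    return found
```

Calling `table_styles("custom-reference.docx")` and checking for a `"Table"` key should tell you whether the reference doc has the style that the docx writer is expected to use.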

yffs89 commented 6 years ago

@iandol Following the instructions on the official website, I generated custom-reference.docx, but the table in it seems to have a problem and I can't modify it. My pandoc version is 2.1.1 and my Office version is 2016. custom-reference.docx

iansco commented 6 years ago

@yang-small-fan - I just opened the reference doc you attached to your comment, edited the "Table" style (setting "All Borders"), saved the reference doc, then tested the amended reference doc by passing a simple Markdown file through Pandoc. It worked fine for me, so I'm not sure where things aren't quite working out for you.

Here's a modified copy of your reference doc, a source markdown file (for testing purposes), and the docx output file in case they help you:

pandoc table.txt -f markdown -t docx --reference-doc yang-small-fan-custom-reference.docx -o table.docx

I'm on Mac OS, so the process I went through to change the Table Style in MS Word was as follows:

  1. Open the reference doc
  2. Select (or click anywhere inside) the Table
  3. Click the "Table Design" tab
  4. Click the little down-arrow icon on the Quick Styles list (which annoyingly only appears when you hover your mouse over the styles)
  5. Click "Modify Table Style" on the popup menu
  6. Make style changes
  7. Save the reference doc

Here are some screenshots in case they help:

screenshot_1140

screenshot_1139

yffs89 commented 6 years ago

@iansco I used your docx template to generate the correct table, but I would also like you to add a bordered table to my template. I still don't know why my template didn't work after conversion. I guess it's the MS Word version or the operating system. In any case, the problem has been solved. I'm very grateful to you. (๑′ᴗ‵๑)I Lᵒᵛᵉᵧₒᵤ❤

agusmba commented 6 years ago

@yang-small-fan bear in mind that you need to edit the Table Style called "Table" as @iansco explained beautifully. It won't work if you only edit a table with that style in your reference docx. It's all about styles.
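
One way to tell the two cases apart is to look at `word/styles.xml` in the reference doc: borders set on the style itself show up as `w:tblBorders` under the style's `w:tblPr`, whereas borders applied to a single table only appear in `word/document.xml`. A minimal sketch, assuming the style id is "Table" (the function name is hypothetical, and it only looks at the base style properties, not conditional formatting such as header-row overrides):

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by styles.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def style_border_edges(docx_path, style_id="Table"):
    """Return the sorted border edges defined on the named table style itself."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    for style in root.iter(f"{{{W}}}style"):
        if style.get(f"{{{W}}}styleId") != style_id:
            continue
        borders = style.find(f"{{{W}}}tblPr/{{{W}}}tblBorders")
        if borders is None:
            return []  # the style exists but defines no borders
        # Keep edges (top, left, insideH, ...) whose border is actually drawn.
        return sorted(
            edge.tag.rsplit("}", 1)[-1]
            for edge in borders
            if edge.get(f"{{{W}}}val") not in (None, "nil", "none")
        )
    raise KeyError(f"no style with id {style_id!r}")
```

An empty list for your reference doc would suggest the borders were applied to the sample table rather than to the style.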

iansco commented 6 years ago

Thanks @yang-small-fan :-) When using Pandoc to create docx output I've found the Options affecting specific writers section in the Pandoc manual to be helpful - specifically, the section titled Docx under --reference-doc:

screenshot_1146

This section provides a list of all the Styles which Pandoc uses. You'll see that Pandoc only currently uses a single Table Style named Table:

screenshot_1148

So - as per the comment immediately above (thanks @agusmba!) - if you want to change the styling of any Tables in your docx output, you'll need to make sure you edit the Table Style named Table in your reference-doc:

screenshot_1139

There are other ways to affect the style of docx output (see Custom Styles in Docx Output and filters), but that's a discussion for another day :-)

Good luck!

P.S. if you find you're still having problems applying styling changes to your reference-doc, please post a comment which explains, in as much detail as you can, exactly how you're making the changes (screenshots can often help, of course), and I'm sure someone will be able to help...

yffs89 commented 6 years ago

Hey @iandol I now need to convert the HTML file into a word file. Is there a way to keep the style in HTML?

agusmba commented 6 years ago

@yang-small-fan AFAIK styles from html do not go through pandoc to docx directly, but simple copy-paste might get you halfway there (from browser into word).

Once you have your desired styles in a word document, you can use that one as reference-doc for any additional pandoc conversions.

dmenne commented 5 years ago

@iansco Your method almost works for me, thanks a lot! The "almost": Sorry for the German version, "Verbundene Zeilen" is a stupid translation of "Banded" in English.

This is what I requested in the template, no decorations for last row, and first/last column:

screen_000047

This is created by knitr/Pandoc. Too bad, "Last Row" and Last/First Column are selected.

screen_000048

Everything is fine when I manually correct the Options setting for each table. This could be done by a Macro, but does anyone know a method to set these options in the Template?

In case no other solution is found: this is my Macro

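' Reset the table style options on every table in the active document.
Sub TableDefaultOptions()
  Dim tbl As Table
  For Each tbl In ActiveDocument.Tables
    ' Turn off special formatting for the first/last column and the last row.
    tbl.ApplyStyleFirstColumn = False
    tbl.ApplyStyleLastColumn = False
    tbl.ApplyStyleLastRow = False
    ' Keep banded rows.
    tbl.ApplyStyleRowBands = True
  Next tbl
End Sub

fiskeren commented 5 years ago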

I have the same problem as @dmenne. Even though "Total Row" and "Last Column" are unchecked in my reference document, the columns still appear in my generated docx document.
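
These per-table options are stored on each table in the output, in a `w:tblLook` element inside `word/document.xml`, so it can help to look at what the generated docx actually contains before blaming the reference doc. A rough sketch for inspecting it (the function name is my own, and which attributes pandoc writes depends on the version, so nothing about pandoc's output is assumed here):

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def table_look_flags(docx_path):
    """Return one dict of w:tblLook attributes per table, in document order."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    looks = []
    for tbl in root.iter(f"{{{W}}}tbl"):
        look = tbl.find(f"{{{W}}}tblPr/{{{W}}}tblLook")
        if look is None:
            looks.append({})  # table carries no explicit look settings
            continue
        # Strip the namespace from attribute names such as firstRow, lastRow.
        looks.append({key.rsplit("}", 1)[-1]: value for key, value in look.attrib.items()})
    return looks
```

If the dicts show `lastRow` or `lastColumn` switched on, the setting comes from the generated document rather than from the style, which would match the behaviour described above.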

yasz commented 5 years ago

PLEASE update this issue

gatestone commented 4 years ago

I edit in Markdown, and plan to convert to Word. The original instructions above work, except it took some googling to understand how to find and edit the "Table" style in Office 365/Word, see below. I also got a warning about the old format of myref.docx after editing and saving, but I ignored the warning and everything works.

https://support.office.com/en-us/article/format-a-table-e6e77bc6-1f4e-467e-b818-2e2acc488006#bm1

joezhouchenye commented 4 years ago

I figured out how to change the table format in reference.docx at last.

ZhouJunjun commented 2 years ago

Add a filter and define your own custom Table style; see this Lua filter: https://github.com/ZhouJunjun/TyporaLuaFilter

zkaip commented 1 year ago

Can I change the default "Table" style name to my own custom style name? I have some other tables here, and I want some of them to use their own custom style name.

joezhouchenye commented 1 year ago

Changing the default Table style wasn't enough previously.

Recently, I created a Word Macro to cope with this issue.

' Apply the custom style to every table with more than two columns.
Sub TableStyleFix()
  Dim atable As Table
  For Each atable In ActiveDocument.Tables
    If atable.Columns.Count > 2 Then
      atable.Style = "Custom Table"
    End If
  Next
End Sub

' Fit those tables to the window width.
Sub TableAutoFitWindow()
  Dim atable As Table
  For Each atable In ActiveDocument.Tables
    If atable.Columns.Count > 2 Then
      atable.AutoFitBehavior (wdAutoFitWindow)
    End If
  Next
End Sub

' Give the columns of those tables equal width.
Sub TableDistributeColumns()
  Dim atable As Table
  For Each atable In ActiveDocument.Tables
    If atable.Columns.Count > 2 Then
      atable.Columns.DistributeWidth
    End If
  Next
End Sub

' Entry point: run the three steps in order.
Sub TableAutoAdjust()
  Application.Run MacroName:="TableStyleFix"
  Application.Run MacroName:="TableAutoFitWindow"
  Application.Run MacroName:="TableDistributeColumns"
End Sub

You can create a custom table style named "Custom Table" or anything you like.

Just Run the TableAutoAdjust Macro. Using auto-fit-window and distribute-columns will be a good choice, although not suitable for small tables.

I use this for tables with more than 2 columns because if you use pandoc-crossref, subfigures will mostly be rendered as a two-column table.

The reason I created separate macros is that an all-in-one macro only works in debug mode and you have to run it step by step. Not sure why, but using Application.Run works.

sercutos commented 4 months ago

Thank you so much!