jgm / pandoc

Universal markup converter
https://pandoc.org

Docx writer - hard coded table row border overrides table style #5460

Open kschach opened 5 years ago

kschach commented 5 years ago

I'm using Pandoc 2.7.2 to go from Markdown to Docx. I customized the table style in Word (and the OpenXML manually, because of frustrations with Word) to include a 1pt border at the top of the table, at the bottom of the header row, and at the bottom of the table. However, the .docx writer overrides the border at the bottom of the header row. My Haskell knowledge is very limited, but I think the issue is in lines 973-974 of Docx.hs, which add a bottom border.

  let borderProps = mknode "w:tcPr" []
                    [ mknode "w:tcBorders" []
                      $ mknode "w:bottom" [("w:val","single")] ()
                    , mknode "w:vAlign" [("w:val","bottom")] () ]

The AST output for the table header begins with:

Table [Str "Affirmative",Space,Str "Activities"] [AlignLeft,AlignLeft] [0.5,0.5]
 [[Plain [Str "Activity"]]
 ,[Plain [Str "Intended",Space,Str "Outcomes",Space,Str "for",Space,Str "Service",Space,Str "Users"]]]

The Markdown table header is:

-------------------------------------------------------------------------------
Activity                                Intended Outcomes for Service Users
--------------------------------------- ---------------------------------------

And here is my OpenXML table style:

<w:style w:type="table" w:customStyle="1" w:styleId="Table">
    <w:name w:val="Table"/>
    <w:basedOn w:val="TableNormal"/>
    <w:semiHidden/>
    <w:unhideWhenUsed/>
    <w:qFormat/>
    <w:rsid w:val="00D215D4"/>
    <w:tblPr>
        <w:tblBorders>
            <w:top w:val="single" w:sz="8" w:space="0" w:color="auto"/>
            <w:bottom w:val="single" w:sz="8" w:space="0" w:color="auto"/>
        </w:tblBorders>
    </w:tblPr>
    <w:tblStylePr w:type="firstRow">
        <w:tcPr>
            <w:tcBorders>
                <w:bottom w:val="single" w:sz="8" w:space="0" w:color="auto"/>
                <w:insideH w:val="single" w:sz="8" w:space="0" w:color="auto"/>
            </w:tcBorders>
        </w:tcPr>
    </w:tblStylePr>
</w:style>

If there's something I can do at the user end to correct this (short of editing in Word afterwards), thoughts are very welcome. I looked to see if I could try editing the Haskell myself, but it appears that I would need to compile it afterwards to use it, which is beyond my skill set.

kschach commented 5 years ago

Here's what the header row looks like in the generated docx. I suspect that removing the tcBorders tags would resolve the issue.

<w:tr>
    <w:trPr>
        <w:cnfStyle w:firstRow="1" />
    </w:trPr>
    <w:tc>
        <w:tcPr>
            <w:tcBorders>
                <w:bottom w:val="single" />
            </w:tcBorders>
            <w:vAlign w:val="bottom" />
        </w:tcPr>
        <w:p>
            <w:pPr>
                <w:pStyle w:val="Compact" />
                <w:jc w:val="left" />
            </w:pPr>
            <w:r>
                <w:t xml:space="preserve">Activity</w:t>
            </w:r>
        </w:p>
    </w:tc>
    <w:tc>
        <w:tcPr>
            <w:tcBorders>
                <w:bottom w:val="single" />
            </w:tcBorders>
            <w:vAlign w:val="bottom" />
        </w:tcPr>
        <w:p>
            <w:pPr>
                <w:pStyle w:val="Compact" />
                <w:jc w:val="left" />
            </w:pPr>
            <w:r>
                <w:t xml:space="preserve">Intended Outcomes for Service Users</w:t>
            </w:r>
        </w:p>
    </w:tc>
</w:tr>
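
A minimal sketch of a user-side workaround for the idea above, not a fix in pandoc itself. It post-processes the generated file and deletes every `<w:tcBorders>…</w:tcBorders>` element from `word/document.xml`, so the table style from the reference docx is no longer overridden. It assumes the XML uses the `w:` prefix as in the excerpt above and that the document contains no cell borders you want to keep. The names `strip_cell_borders` and `fix_docx` are hypothetical helpers, not part of pandoc.

```python
import re
import zipfile

# Matches a hard-coded cell-border element such as the one in the excerpt above.
# Assumption: the element is written with the "w:" prefix and is not self-closing.
TC_BORDERS = re.compile(r"<w:tcBorders>.*?</w:tcBorders>", re.DOTALL)


def strip_cell_borders(document_xml: str) -> str:
    """Return the XML with every <w:tcBorders> element removed."""
    return TC_BORDERS.sub("", document_xml)


def fix_docx(src_path: str, dst_path: str) -> None:
    """Copy a .docx, removing hard-coded cell borders from word/document.xml."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                data = strip_cell_borders(text).encode("utf-8")
            # All other parts of the package are copied unchanged.
            dst.writestr(item, data)
```

It could be run after pandoc as, for example, `fix_docx("out.docx", "out-fixed.docx")`, where both file names are placeholders.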

mb21 commented 5 years ago

> I customized the table style in Word (and the OpenXML manually because of frustrations with Word)

You mean you created a reference docx? That would be the recommended way... see https://pandoc.org/MANUAL.html#option--reference-doc

kschach commented 5 years ago

> > I customized the table style in Word (and the OpenXML manually because of frustrations with Word)

> You mean you created a reference docx? That would be the recommended way... see https://pandoc.org/MANUAL.html#option--reference-doc

Thanks for the reply, @mb21. I am using a reference.docx with a table style (its OpenXML, taken from styles.xml in the .docx, is pasted above). The table displays as expected in the reference.docx but not in the Pandoc-generated .docx. I believe that's because the borderProps inserted by the docx writer override the table style. When I did a blame of the Docx writer yesterday, it looked like those lines date back to the original commit of the file in 2012: https://github.com/jgm/pandoc/commit/ba81cda7f18604379717f5052c0eaaa94c7d2067

I wonder if I am doing something wrong on my end, or if this hasn't been noticed because it's an admittedly very minor issue.
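
One way to check whether a generated file is affected is to count the header rows that still carry hard-coded cell borders. The following is a rough diagnostic sketch, assuming header rows are marked with `<w:cnfStyle w:firstRow="1"/>` as in the generated XML shown above; the function name `header_rows_with_hardcoded_borders` is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def header_rows_with_hardcoded_borders(document_xml: str) -> int:
    """Count rows flagged as firstRow whose cells contain a w:tcBorders element."""
    root = ET.fromstring(document_xml)
    count = 0
    for row in root.iter(f"{{{W}}}tr"):
        cnf = row.find("w:trPr/w:cnfStyle", NS)
        is_header = cnf is not None and cnf.get(f"{{{W}}}firstRow") == "1"
        # A direct border on any cell of the row takes precedence over the style.
        if is_header and row.find("w:tc/w:tcPr/w:tcBorders", NS) is not None:
            count += 1
    return count
```

The input is the text of `word/document.xml`, which can be read from the .docx with Python's `zipfile` module. A non-zero result means the writer's borders are present in header rows.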

agusmba commented 5 years ago

@kschach I don't think you're doing anything wrong; I've run into this issue too (or maybe we are both doing something wrong).

agusmba commented 5 years ago

Might this be leftover code from when pandoc didn't use the Table table style? I guess the Table style in the default reference doc should come with the underline for the first row, but if the style is changed, pandoc should respect it.

jgm commented 5 years ago

I agree, this shouldn't be hard-coded. Here is our current default style for tables (from styles.xml):

  <w:style w:type="table" w:default="1" w:styleId="Table">
    <w:name w:val="Table" />
    <w:basedOn w:val="TableNormal" />
    <w:semiHidden />
    <w:unhideWhenUsed />
    <w:qFormat />
    <w:tblPr>
      <w:tblInd w:w="0" w:type="dxa" />
      <w:tblCellMar>
        <w:top w:w="0" w:type="dxa" />
        <w:left w:w="108" w:type="dxa" />
        <w:bottom w:w="0" w:type="dxa" />
        <w:right w:w="108" w:type="dxa" />
      </w:tblCellMar>
    </w:tblPr>
  </w:style>

Can someone suggest a change to this so that tables will look the same, by default, even if the docx writer no longer generates

            <w:tcBorders>
                <w:bottom w:val="single" />
            </w:tcBorders>

Or should we simply change the default appearance so the bottom line is omitted?

EDIT: I think we do need the bottom line on header cells (that's where this element is being put). If there's a way to specify this in the style, I'm all for it.

agusmba commented 5 years ago

> I think we do need the bottom line on header cells (that's where this element is being put). If there's a way to specify this in the style, I'm all for it.

I'll take a look since I've specified that in my Table style.

Ok, here's my current style, with a specific colored line below the header (this gets overwritten by pandoc with a black line):

<w:style w:type="table" w:customStyle="1" w:styleId="Table">
    <w:name w:val="Table"/>
    <w:basedOn w:val="Tablanormal"/>
    <w:uiPriority w:val="99"/>
    <w:rsid w:val="002073BE"/>
    <w:pPr>
        <w:spacing w:after="0"/>
    </w:pPr>
    <w:tblPr>
        <w:jc w:val="center"/>
        <w:tblBorders>
            <w:top w:val="single" w:sz="8" w:space="0" w:color="006699"/>
            <w:bottom w:val="single" w:sz="8" w:space="0" w:color="006699"/>
        </w:tblBorders>
    </w:tblPr>
    <w:trPr>
        <w:jc w:val="center"/>
    </w:trPr>
    <w:tblStylePr w:type="firstRow">
        <w:pPr>
            <w:keepNext/>
            <w:wordWrap/>
        </w:pPr>
        <w:rPr>
            <w:rFonts w:asciiTheme="majorHAnsi" w:hAnsiTheme="majorHAnsi"/>
            <w:b w:val="0"/>
            <w:color w:val="006699"/>
            <w:sz w:val="22"/>
        </w:rPr>
        <w:tblPr/>
        <w:trPr>
            <w:tblHeader/>
        </w:trPr>
        <w:tcPr>
            <w:tcBorders>
                <!-- This is the line below the first row -->
                <w:bottom w:val="single" w:sz="4" w:space="0" w:color="006699"/>
            </w:tcBorders>
            <w:shd w:val="clear" w:color="auto" w:fill="E8F2FE"/>
        </w:tcPr>
    </w:tblStylePr>
</w:style>

OTOH, I think that pandoc writes horizontal table lines whenever there is a horizontal line in the markdown table (maybe not for all formats, but at least in pipe-tables).

For instance:

| header 1 | header 2 |
| -------- | -------- |
| cell 1   | cell 2   |
| cell 3   | cell 4   |
| -------- | -------- |
| cell 5   | cell 6   |

will render a table with header and two extra horizontal lines (one below the header and one above the last row).

So ideally we'd need a way of reconciling both approaches (lines defined in the style with explicit lines in the source table). Should the first line (under header) be ignored as it is always explicit by design?

agusmba commented 5 years ago

Sorry, I misunderstood your request. Let me simplify the style as much as possible, while retaining the first line.

agusmba commented 5 years ago

After putting the default table style in, adding the first row bits and saving, Word (2013) simplifies the style to:

<w:style w:type="table" w:customStyle="1" w:styleId="Table">
    <w:name w:val="Table"/>
    <w:basedOn w:val="Tablanormal"/>
    <w:uiPriority w:val="99"/>
    <w:semiHidden/>
    <w:unhideWhenUsed/>
    <w:rsid w:val="002073BE"/>
    <w:tblPr/>
    <w:tblStylePr w:type="firstRow">
        <w:tcPr>
            <w:tcBorders>
                <w:bottom w:val="single" w:sz="0" w:space="0" w:color="auto"/>
            </w:tcBorders>
        </w:tcPr>
    </w:tblStylePr>
</w:style>

and w:sz="0" w:space="0" w:color="auto" can probably be removed also.

jgm commented 5 years ago

> OTOH, I think that pandoc writes horizontal table lines whenever there is a horizontal line in the markdown table (maybe not for all formats, but at least in pipe-tables).

This is not the case. In your example, the first row of ---s separates the header from the body. The black line here comes from the border pandoc inserts by default between header and body. The second row of ---s in your example just creates two cells containing three consecutive Em-dashes each. Of course, this may look like a line too, but it's the contents of the table cell, not a border.

jgm commented 5 years ago

Your style creates a border after the first row of the table. That's not quite what I was looking for. I want a border after the table header. Note that in pandoc you can have tables without headers, and for these we don't want a line after the first row, since that would make it appear as a header.

Is it possible in the style to create a line after the table header, but not, in general, after the first row?

agusmba commented 5 years ago

> This is not the case. In your example, the first row of ---s separates the header from the body.

Ok, I may need to dig into my examples a bit, because I could swear I saw something like this.

> Is it possible in the style to create a line after the table header, but not, in general, after the first row?

I think that's the way of doing it. The first row internally is the header. If you have a table without header it shouldn't sport that first row line. At least it doesn't in my headless tables.

kschach commented 5 years ago

I can take a closer look at the styles.xml later this week and draft something. I think we could take my tcBorders and replace it with what the Docx writer currently inserts as a node? Perhaps we can do the same with vAlign as well and move that into styles.xml for the conditional firstRow while we're cleaning up the Docx writer?

I likely won't be able to build locally but I could draft the style and submit a PR if that would be useful. I don't know what changes would need to be made to the Docx writer. Could we remove the borderProps variable and the subsequent reference to it if we move the formatting to the reference.docx?

agusmba commented 5 years ago

@jgm I made a quick test with a brand new reference.docx: I edited the Table style to only have the firstRow bottom line (double) and tested it with:

$ cat pandoc-issue-5460.md
# Pandoc issue #5460

Testing a headless table:

|    |    |
|----|----|
| one| two|

And now a table with header:

| head1 | head2 |
|-------|-------|
| one | two |

[image]

So headless tables work. The issue is present in the second table. After re-applying the table style in Word (*):

[image]

(*) <w:semiHidden/><w:unhideWhenUsed/> is giving me grief: I had to remove it at the XML level, or I couldn't see the style in Word in order to re-apply it.

agusmba commented 5 years ago

It looks like the fix could be relatively easy: