jgm / pandoc

Universal markup converter
https://pandoc.org

HTML reader: width on style attribute of table does not influence column widths #7617

Open twiggler opened 2 years ago

twiggler commented 2 years ago

Explain the problem.

Command: pandoc --output --to docx reference-doc=custom-reference.docx report-8660595912627799385.html

Pandoc version? pandoc.exe 2.14.2 windows

bugreport.zip

When a table has the inline style width: 100%, as in the second table, I expect the docx table to have full width. I think the key is <w:tblW w:w="5000" w:type="pct" />.

However, currently the resulting layout autofits the contents of the table cells, and is not full width.

mb21 commented 2 years ago

Does it also happen without reference-doc=custom-reference.docx?

jgm commented 2 years ago

This has nothing to do with docx specifically. You can reproduce using -f html -t native, giving the following representation of the table:

, Table
    ( ""
    , []
    , [ ( "style"
        , "border-collapse: collapse; width: 99.5575%; height: 22px;"
        )
      , ( "border" , "1" )
      ]
    )
    (Caption Nothing [])
    [ ( AlignDefault , ColWidthDefault )
    , ( AlignDefault , ColWidthDefault )
    ]
    (TableHead ( "" , [] , [] ) [])
    [ TableBody
        ( "" , [] , [] )
        (RowHeadColumns 0)
        []
        [ Row
            ( "" , [] , [ ( "style" , "height: 22.3984px;" ) ] )
            [ Cell
                ( ""
                , []
                , [ ( "style" , "width: 50.0443%; height: 22.3984px" ) ]
                )
                AlignDefault
                (RowSpan 1)
                (ColSpan 1)
                [ Plain [ Str "Aa" ] ]
            , Cell
                ( ""
                , []
                , [ ( "style" , "width: 50.0443%; height: 22.3984px" ) ]
                )
                AlignDefault
                (RowSpan 1)
                (ColSpan 1)
                [ Plain [ Str "Bbb" ] ]
            ]
        , Row
            ( "" , [] , [ ( "style" , "height: 22.3984px;" ) ] )
            [ Cell
                ( ""
                , []
                , [ ( "style" , "width: 50.0443%; height: 22.3984px" ) ]
                )
                AlignDefault
                (RowSpan 1)
                (ColSpan 1)
                []
            , Cell
                ( ""
                , []
                , [ ( "style" , "width: 50.0443%; height: 22.3984px" ) ]
                )
                AlignDefault
                (RowSpan 1)
                (ColSpan 1)
                []
            ]
        ]
    ]
    (TableFoot ( "" , [] , [] ) [])
]

As you can see, the style information is included in an attribute. But it isn't taken into account when determining the column widths, since we have:

    [ ( AlignDefault , ColWidthDefault )
    , ( AlignDefault , ColWidthDefault )

It would be possible to modify the HTML reader to look at the width component of the style attribute and use this in computing widths. In this case, though, we'd have to set both columns to 0.5 full width, since there isn't other information here about the relative widths to use.
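To make the idea concrete, here is a minimal Python sketch of pulling a percentage width out of a style string and turning one row's widths into relative column widths. This is only an illustration, not pandoc's reader code (which is written in Haskell). The names percent_width and column_widths are hypothetical, and deriving widths from cell styles is an assumption of the sketch rather than something pandoc does.

```python
def percent_width(style):
    """Return the CSS width in a style string as a fraction (0.5 for 50%).

    Returns None if there is no width declaration or it is not a percentage.
    """
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if not sep:
            continue  # empty or malformed declaration
        # Compare the whole property name so max-width etc. do not match.
        if name.strip().lower() != "width":
            continue
        value = value.strip()
        if value.endswith("%"):
            try:
                return float(value[:-1]) / 100.0
            except ValueError:
                return None
    return None


def column_widths(cell_styles):
    """Normalize the percentage widths of one row's cells so they sum to 1.

    Returns None when any cell lacks a percentage width, which a caller
    could treat as "keep the default column widths".
    """
    widths = [percent_width(s) for s in cell_styles]
    if any(w is None for w in widths) or sum(widths) <= 0:
        return None
    total = sum(widths)
    return [w / total for w in widths]
```

For the two cells in the table above, both styled width: 50.0443%, column_widths yields 0.5 for each column.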

elearning202 commented 2 years ago

Is this issue planned to be resolved in the near future? And is there a fix or workaround for specifying the width of table cells?

jgm commented 2 years ago

You can use <col> elements with relative widths.

jblachly commented 3 months ago

Quoting jgm: "You can use <col> elements with relative widths."

This is not useful, because the table is still <w:tblW w:type="auto" w:w="0" /> and still has ColWidthDefault.

Consider the following:

<html>
    <body>
        <table>
            <tr>
                <td width="20%">col 1A</td><td width="80%">col 2A</td>
            </tr>
            <tr>
                <td width="20%">Col 1B</td><td width="80%">col 2b</td>
            </tr>
        </table>
    </body>
</html>

While the pandoc AST propagates the 20% and 80% as cell properties, the Table still has ColWidthDefault, and the docx does not contain any emitted width other than auto, and the table is compressed.

NB: adding the attribute width="100%" to the <table> element does not fix this either.

jgm commented 3 months ago

As noted above: use col elements with relative widths. Then pandoc will set non-default column widths. For example:

<colgroup>
<col style="width: 20%" />
<col style="width: 80%" />
</colgroup>

Setting width attributes on individual cells does not affect overall column widths.
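If the HTML is generated by a tool that only writes widths on cells, one possible workaround is to preprocess it before handing it to pandoc. The following Python sketch builds a colgroup block from the width attributes of the first row's cells. It is an assumption-laden illustration: the names FirstRowWidths and colgroup_from_first_row are hypothetical, it handles a single simple table without colspans or nested tables, and inserting the result after the opening table tag is left to the caller.

```python
from html.parser import HTMLParser


class FirstRowWidths(HTMLParser):
    """Collect the width attribute of each cell in the first <tr> seen."""

    def __init__(self):
        super().__init__()
        self.widths = []
        self.rows_seen = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows_seen += 1
        elif tag in ("td", "th") and self.rows_seen == 1:
            # A cell without a width attribute is recorded as None.
            self.widths.append(dict(attrs).get("width"))


def colgroup_from_first_row(html):
    """Return a <colgroup> block mirroring the first row's cell widths.

    Returns an empty string if the first row has no cells or if any of
    its cells lacks a width attribute.
    """
    parser = FirstRowWidths()
    parser.feed(html)
    if not parser.widths or any(w is None for w in parser.widths):
        return ""
    cols = "\n".join(f'<col style="width: {w}" />' for w in parser.widths)
    return f"<colgroup>\n{cols}\n</colgroup>"
```

Applied to the two-column table from the earlier comment (cells with width="20%" and width="80%"), this produces the same colgroup block as shown above.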