jgm / pandoc

Universal markup converter
https://pandoc.org

Grid table formatting with docx writer #2667

Open mickley opened 8 years ago

mickley commented 8 years ago

When outputting a grid table to a docx with a double-spaced template, the grid table ends up double-spaced and looks pretty terrible. This is not the case with other markdown table types, which remain single spaced.

Other types of tables have cells formatted in the Compact Word style. However, grid tables format their cells in the Normal Word style, which leads to the problem.

It appears that this is a result of how pandoc represents grid tables in its AST. Their cells use Para instead of Plain (which other table types use), and the docx writer converts Para to the Normal style.

Would it be possible to use Plain instead of Para for grid tables, or would that break things with markdown included in the table? If it's not possible, could Paras within tables be styled as something other than Normal?

There is a general lack of table styles in the docx writer/template to start with. It would be useful to have an overall table cell style independent of either Compact or Normal, and perhaps a table style analogous to FigureWithCaption. Both of these would allow for things like centering and borders that are currently not possible.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

For example, this table in markdown:

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | - built-in wrapper |
|               |               | - bright color     |
+---------------+---------------+--------------------+
| Oranges       | $2.10         | - cures scurvy     |
|               |               | - tasty            |
+---------------+---------------+--------------------+

Leads to this pandoc AST:

[Table [Str "Grid",Space,Str "table"] [AlignDefault,AlignDefault,AlignDefault] [0.2222222222222222,0.2222222222222222,0.2916666666666667]
 [[Plain [Str "Fruit"]]
 ,[Plain [Str "Price"]]
 ,[Plain [Str "Advantages"]]]
 [[[Para [Str "Bananas"]]
  ,[Para [Str "$1.34"]]
  ,[BulletList
    [[Plain [Str "built-in",Space,Str "wrapper"]]
    ,[Plain [Str "bright",Space,Str "color"]]]]]
 ,[[Para [Str "Oranges"]]
  ,[Para [Str "$2.10"]]
  ,[BulletList
    [[Plain [Str "cures",Space,Str "scurvy"]]
    ,[Plain [Str "tasty"]]]]]]

mickley commented 8 years ago

Just pinging this topic again, especially in light of the new custom styles option in the docx writer as of 1.18.

Wrapping a table in a div applies the style (for example I can apply compact instead of normal). But only Paragraph styles are allowed.

Would it be possible to allow Table styles as well? I've tried using a Table style name (for example: "Colorful List") and Word actually recognizes that it's already used and gives me a Paragraph style named "Colorful List1".

Support for custom table styles would be really useful: you'd have complete control over borders, font, color, table header etc.

mickley commented 8 years ago

@jgm thoughts?

jgm commented 8 years ago

@jkr any thoughts on whether it would be feasible to apply table styles from the enclosing div to an enclosed table? I think you added this feature for paragraphs (though I may be misremembering).

jkr commented 8 years ago

Applying styles to tables is certainly doable, but I don't know what notation we'd use. I'm a bit wary about introducing another hard-coded key for divs (<div table-style=...). But that might be the only way.

Also, more generally, we could be a bit more clever about how we deal with grid tables, either at the reader level, or in the docx writer specifically. I think a lot of people might use grid tables because Emacs table mode makes it easy and attractive, and don't really care about the multi-block element. Two possible suggestions:

  1. If all cells contain a single Para, convert all those Paras to Plains
  2. In all cells containing a single Para, convert it to a Plain, even if other cells have multiple blocks.

I think I prefer no. 2, and at the reader level. When would it ever be meaningful to have a single Para in a table cell instead of a Plain?
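Suggestion no. 2 can be sketched as a small transformation over pandoc's JSON AST. This is only an illustration, not how the reader or the docx writer is implemented: it assumes the older five-field Table shape shown in the AST earlier in this thread (caption, alignments, widths, header cells, rows), and the function names `para_to_plain` and `simplify_table` are made up for the example.

```python
# Sketch only: works on pandoc's JSON AST represented as plain dicts and lists.
# Assumption: the older five-field Table shape,
#   [caption, alignments, widths, header_cells, rows],
# where every cell is a list of block elements.

def para_to_plain(cell):
    """Replace a lone Para in a cell with a Plain; leave other cells untouched."""
    if len(cell) == 1 and cell[0].get("t") == "Para":
        return [{"t": "Plain", "c": cell[0]["c"]}]
    return cell


def simplify_table(block):
    """Apply para_to_plain to every header and body cell of a Table block."""
    if block.get("t") != "Table":
        return block
    caption, aligns, widths, headers, rows = block["c"]
    headers = [para_to_plain(cell) for cell in headers]
    rows = [[para_to_plain(cell) for cell in row] for row in rows]
    return {"t": "Table", "c": [caption, aligns, widths, headers, rows]}
```

Cells with several blocks (for example two Paras, or a BulletList) pass through unchanged, so multi-block grid table cells keep their structure.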

jkr commented 8 years ago

Just to clarify, I don't think it would make sense to use the <div custom-style= notation, because that div could contain both a table and a paragraph -- would it be a table style for one and a parstyle for the other? What about within the table?

mickley commented 8 years ago

@jkr Yeah, I agree, though styling the paragraph with a parstyle and the table with a table style of the same name is at least workable. The ideal would be if it were an option for the table itself.

agusmba commented 7 years ago

I'm not sure if the original issue still applies (or it's just me not using a double spaced template), but the discussion also touched custom-styling for tables.

While we can now modify the default table style via the reference-doc, it would be very nice to be able to apply different table styles within a document.

I hoped custom-style would work for custom table styles, but I understand it wouldn't make a lot of sense. I also understand the reluctance to add another hard-coded attribute. Unfortunately I don't have an alternative proposal.

mb21 commented 7 years ago

Hopefully we'll be able to put the table style directly in the attribute of the table element, once it's updated with https://github.com/jgm/pandoc/issues/1024

lizhuoqi commented 6 years ago

Just add a table style with whatever formatting you want, named "Table", to the reference-doc file, and update pandoc to the latest version.

agusmba commented 6 years ago

@lizhuoqi that works if you only need one table style in your word document. However sometimes you need a special table with a different formatting from the rest.

bpj commented 6 years ago

@jkr, coming here from #4697 I see no problem at all with using the same custom-style=Foo attribute for spans, paragraphs, lists and tables. If one doesn't want two different elements to have the same style name one should simply wrap each element in a different div. Those who produce HTML from the same source and are bothered by "excessive divs" can use a filter to strip divs with custom-style as only attribute. I think this will be pretty rare in practice since the semantics added by the custom style will be wanted in HTML also. One would usually add a class to the same div or use a CSS selector like [data-custom-style=Foo] table to style the table in HTML.
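The filter mentioned above, which strips divs that carry custom-style as their only attribute, might look roughly like this when written against pandoc's JSON AST. It is a sketch under assumptions: a Div is taken to be encoded as `[[id, classes, key_value_pairs], blocks]`, the names `is_style_only_div` and `strip_style_divs` are hypothetical, and only nested Divs are traversed (not block quotes or list items).

```python
# Sketch only: works on pandoc's JSON AST represented as plain dicts and lists.
# Assumption: a Div is {"t": "Div", "c": [[id, classes, key_value_pairs], blocks]}.

def is_style_only_div(block):
    """True if the block is a Div whose sole attribute is custom-style."""
    if block.get("t") != "Div":
        return False
    (ident, classes, keyvals), _contents = block["c"]
    return (ident == "" and classes == []
            and len(keyvals) == 1 and keyvals[0][0] == "custom-style")


def strip_style_divs(blocks):
    """Unwrap style-only Divs, splicing their contents into the parent list."""
    out = []
    for block in blocks:
        if is_style_only_div(block):
            # Drop the wrapper but keep (and clean) what it contained.
            out.extend(strip_style_divs(block["c"][1]))
        elif block.get("t") == "Div":
            # Keep other Divs, but clean their contents recursively.
            attr, contents = block["c"]
            out.append({"t": "Div", "c": [attr, strip_style_divs(contents)]})
        else:
            out.append(block)
    return out
```

A Div that also has an id or a class is left in place, which matches the case where the same wrapper is wanted for HTML styling.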

whoo commented 6 years ago

Hi @bpj, could you please share a markdown sample showing how to use it with a 'style'? Thanks.

mb21 commented 6 years ago

@whoo see http://pandoc.org/MANUAL.html#custom-styles-in-docx

whoo commented 6 years ago

Thanks | works great especially with [Table]