Open xtimbeau opened 11 months ago
Thanks for the report!
What's the version of gt you used here?
Also, don't mix syntaxes to pass options. Quarto strongly recommends using YAML-style options.
````qmd
---
title: "gt table"
format: docx
---

```{r}
#| label: tbl-table1
#| echo: false
#| message: false
#| warning: false
gt::gt(tibble::tribble(
  ~a, ~b, ~c,
  "text", 1, "Long text aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaargh",
  "others", 2, "Long text aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaargh"
))
```
````
Thanks,
gt version is the GitHub dev version (0.9.0.9000), but I tried with gt 0.9 with the same results.
OK for the unmixed syntax. As a matter of fact, I first tried to mix syntaxes to see if the problem persisted (including putting the label in the chunk header `{}`).
The issue is not only the table: the whole thing produces a bad Word document, at least using the latest development version.
Sure, but if you click Yes, you then get a Word document with a bad table.
On Fri, 6 Oct 2023 at 19:55, Mickaël Canouil wrote:
The issue is not only the table, the whole thing produced a bad Word document. [image: image] https://user-images.githubusercontent.com/8896044/273294518-d359364f-a2f2-40ae-b639-6022b97fd6ba.png
Well, if the document is malformed, there is no reason to expect the content to be good.
Removing the label from the code chunk brings rendering back to normal.
It seems to me the gt output is exactly the same. The only change between the two renders is the `#tbl-table1` identifier on the intermediate cell div. This triggers the crossref code path, and it seems something there is not as expected.
@cscheid I wonder if we have something similar to figures, where the new crossref system expects a certain Markdown structure and the knitr output does not provide it (or it is simply not accounted for by the new crossref).
Let me know if I need to adjust anything on R side.
I'm almost certain this is just a bug on the new crossref code. I'll handle it this week.
Getting the same issue with flextable:
````qmd
---
title: "FlextableBug"
format: docx
---

## Quarto

```{r}
#| label: tbl-Biology
#| tbl-cap: A nice caption
#| echo: false
library(flextable)
ft <- flextable(head(mtcars))
ft
```
````
I'm experiencing the same bug with a recent quarto daily version (was hoping everything was fixed after closing of other issues I saw). In my experience, both the gt and flextable tables contents are fine, but something is messed with the formatting that Word is expecting, so it "fixes" it.
For example, here is a table that is "broken" by Word fixes, from including a label and caption:
And here is what it looks like without a label and caption:
I have a repo with testing both flextable and gt here.
@rmflight I believe we are tracking that here
Something we do is creating invalid OpenXML content that triggers Word to "fix" the document. We are still trying to identify what.
I've been looking at generating tables for docx from Julia, and also noticed that my docx files became invalid when I added table captions to the generating cells. I looked in the generated XML and found that captions were implemented by nesting the content inside a table. This means my docx table ended up as the only content of a table cell object in the XML code. This led me to the following Stack Overflow post: https://stackoverflow.com/questions/4485225/openxml-nested-tables. According to that, a nested table object needs to be followed by an empty paragraph. Once I added that, Word opened the docx file correctly. The remaining issue was that the inner table is somehow very narrow and long, which might be caused by some style settings on the outer caption table.
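To make the workaround described above concrete, here is a minimal base R sketch of that post-processing step. It is not what Quarto or gt do internally: `fix_nested_tables()` is a hypothetical helper, the XML fragment is hand-written and heavily simplified, and a regex on the raw string stands in for a proper XML parser.

```r
# Hypothetical helper (not part of Quarto, gt, or pandoc): add an empty
# paragraph after any table that is the last element of a table cell.
fix_nested_tables <- function(xml) {
  # "</w:tbl>" directly followed by "</w:tc>" means the cell ends with a table
  gsub("(</w:tbl>)(\\s*</w:tc>)", "\\1<w:p/>\\2", xml, perl = TRUE)
}

# Simplified stand-in for word/document.xml: an inner table that is the only
# content of an outer (caption) table cell.
broken <- paste0(
  "<w:tbl><w:tr><w:tc>",
  "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>",
  "</w:tc></w:tr></w:tbl>"
)

fixed <- fix_nested_tables(broken)
cat(fixed, "\n")
```

In a real document the string would come from `word/document.xml` inside the unzipped docx, and the file would have to be zipped again afterwards. Running the helper twice does not add a second paragraph, because the pattern no longer matches once `<w:p/>` sits between the two closing tags.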
@jkrumbiegel Please provide a reproducible example of your case. Also, your case might not be the same, since you talked about tables generated with Julia.
Could you share a small self-contained "working" (reproducible) example to work with, i.e., a complete Quarto document or a Git repository? Thanks.
You can share a Quarto document using the following syntax, i.e., using more backticks than you have in your document (usually four: ````).
````qmd
---
title: "Reproducible Quarto Document"
format: html
engine: knitr
---
This is a reproducible Quarto document using `format: html`.
It is written in Markdown and contains embedded R code.
When you run the code, it will produce a plot.
```{r}
plot(cars)
```

The end.
````
I was just trying to help the bug finding process in this thread, it could be that tables rendered to docx are invalid with captions because of the empty paragraph issue I mentioned above.
Sharing a reproducible example of your case can help to track down the exact root cause if it is a cross-engine issue.
PS: I am not sure a 13-year-old thread is really up to date with Microsoft Word's internal XML specification.
Sorry, I don't think you understand what I was trying to do here. I came across this issue trying to find information on a related problem, and I just solved a very similar issue to the one described in this thread for myself (making tables spliced into docx files work with quarto table captions). That I did the generation of the table with Julia is kind of irrelevant. The missing empty paragraph in the openxml markup was the reason that Microsoft Word complained. So if that info helps you here, that's cool, if not, then that's ok as well.
I see. Indeed, I did not understand that you had solved an issue and were simply reporting how. Thanks for sharing!
Is there any update on this? Right now I need to upload the docx to Google Docs to be able to read it properly (there, the tables are well formatted).
Thank you for your interest in the issue.
There is no need to ask for updates. Updates are provided when there are any to provide; these can be comments or the issue being closed and, as you can see, neither of those things has happened.
Okay thanks. Based also on how you reply to many other users, I see you keep being fairly passive aggressive.
I see you keep being fairly passive aggressive.
I apologise, there is no intent from me there (my intent was simply to be factual nothing more, nothing less), i.e., I am not a native English speaker, so it seems the tone is not correct unfortunately ...
I see you keep being fairly passive aggressive.
That's not an acceptable comment. Please refer to our code of conduct https://github.com/quarto-dev/quarto-cli?tab=coc-ov-file#readme
To be specific, the comment you made is a personal attack: the comment refers to the person rather than a specific action.
I will follow "Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience", but many times there was a lack of "Demonstrating empathy and kindness toward other people".
This bug still exists. I'm going to try and give a simple reproducible example below. I'm sure I'll do something wrong and get the usual wrist-slap - but I'll try anyway.
````qmd
---
title: "Trying to Make a Nice Table Using gt and MS Word"
format:
  docx:
    toc: false
    number-sections: true
    number-depth: 1
    highlight-style: github
    reference-doc: word-template.docx
    fig-dpi: 600
---

{{< pagebreak >}}

# Background

Make a table. Try to reference it (@tbl-test). Then print it out

```{r}
#| label: tbl-test
library(tidyverse)
library(gt)

tbldat <-
  mtcars %>%
  select(c(cyl, hp, mpg)) %>%
  rownames_to_column() |>
  rename(Car = rowname) |>
  slice_sample(n = 10)

tbl <-
  tbldat %>%
  gt() %>%
  tab_header(title = "Propensity Model Performance") %>%
  tab_style(
    style = cell_text(align = "left"),
    locations = cells_title()
  ) %>%
  cols_label(
    cyl = "Cylinder",
    hp = "Horsepower",
    mpg = "*Miles* per Gallon",
    .fn = md
  ) %>%
  cols_align(
    align = "center",
    columns = c(cyl, hp)
  ) %>%
  tab_style(
    style = cell_text(weight = 625),
    locations = cells_column_labels()
  ) %>%
  tab_style(
    style = cell_text(style = "italic"),
    locations = cells_body(columns = Car)
  ) %>%
  tab_source_note(source_note = md("**AUC** - Area under Receiver Operating Characteristic Curve, **PPV** - Positive Predictive Value")) %>%
  opt_stylize(3) |>
  tab_options(
    table_body.hlines.style = "solid",
    table_body.hlines.width = 1,
    table_body.hlines.color = "#cccccc"
  )

tbl
```
````
The file renders, but when you try to open the document you get the following:
"Word found unreadable content in test.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes."
If you click "Yes" (I trust myself so I did that) - then you get the following message:
"This document contains fields that may refer to other files. Do you want to update the fields in this document?"
I do! I do! So I click "Yes" again. This is getting exciting.
Now I get a popup message from word with the title "Show Repairs". It says "Errors were detected in this file, but Word was able to open the file by making the repairs listed below. Save the file to make the repairs permanent."
Oh, thank you word, for making those repairs. But alas, upon looking at the document. The repairs were 💩.
The table looks terrible. Unformatted. Squeezed. Etc...
UPDATE:
removing the chunk label lets word create a table, but it is not properly formatted and cannot be referenced in the text - so, that's not good.
Note that it should look like this:
No wrist slap, but rather a general reminder that if the issue is open, that means that we are aware that the bug is still happening.
Apologies. I now see that I should have realized this was still an open issue.
I confess to getting confused in some of the thread's back-and-forth, citing other issues, and so on. Also, it wasn't clear to me which bug we are actually talking about - there seem to be several. Removing the label gets rid of the weird MS Word message, but in neither case do you get a properly formatted table. I guess I'll look for that bug somewhere as well.
In any case, I appreciate the responsiveness and the package. There's no doubt that gt is fantastic in html, where I use it all the time. But it really doesn't seem to work well at all in word. Hopefully soon!!!
Removing the label gets rid of the weird MS Word message
Removing the label means that the cross-referencing processing is not applied to this table, since it is not necessary, and this is the processing that creates malformed OpenXML somewhere. So you don't get the message because no processing happens, but cross-referencing is also not possible.
but in neither case do you get a properly formatted table.
In the Update part of your post (https://github.com/quarto-dev/quarto-cli/issues/7151#issuecomment-2007979180), you shared "Note that it should look like this". How did you determine it should look like this? Is this what you expect from your `word-template.docx`? This could be another issue with Quarto, but it could also be related to the reference doc configuration. You should try with bare pandoc to see if a Markdown table is rendered as the table you expect. It is also possible that gt is emitting raw output for docx, so that you won't get the style from your reference doc, but the style defined by gt.
Anyhow, for this one, if you think this is a bug, please do open a new issue. Thank you!
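For readers unsure what trying "bare pandoc" could look like, the following is a rough sketch of one reading of that suggestion: convert a plain Markdown table to docx by calling pandoc directly, with no Quarto or gt involved. It assumes `pandoc` is on the PATH; the file names `bare-table.md` and `bare-table.docx` are made up for illustration, and the reference doc is only passed if `word-template.docx` exists in the working directory.

```r
# A plain Markdown pipe table with a caption (pandoc table caption syntax)
md <- c(
  "| Car       | Cylinder | Horsepower |",
  "|-----------|---------:|-----------:|",
  "| Mazda RX4 |        6 |        110 |",
  "",
  ": A plain Markdown table"
)
writeLines(md, "bare-table.md")

# Call pandoc directly, only if it can be found on the PATH
pandoc <- Sys.which("pandoc")
if (nzchar(pandoc)) {
  args <- c("bare-table.md", "-o", "bare-table.docx")
  # Apply the custom Word template only when it is actually present
  if (file.exists("word-template.docx")) {
    args <- c(args, "--reference-doc=word-template.docx")
  }
  system2(pandoc, args)
}
```

If the table in `bare-table.docx` already looks wrong, the reference doc or pandoc is the more likely culprit; if it looks fine, the difference is more likely to come from the raw OpenXML that gt emits.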
@cderv, regarding my comment
Note that it should look like this
I was basing this on what the table looks like in html - which I guess is the "native" format.
Sorry if I don't use the exact right words (e.g. "native", "should look like"). I am just a simple researcher, not a computer scientist/software developer. I have been scolded several times for mentioning this apparently irrelevant fact - but I am trying to explain why I may appear confused at times, and pleading for patience and guidance.
For example, you wrote:
You should try with bare pandoc to see if a Markdown table is rendered as the table you expect. It is also possible that gt is emitting raw output for docx, so that you won't get the style from your reference doc, but the style defined by gt.
I don't actually know what a "bare pandoc" is. I am trying to follow the instructions in quarto for creating a docx (i.e., using a word template) and gt (i.e. for making a table look pretty). Maybe the two don't go together?
In any case, as the reprex shows, the table you get in MS Word looks bad (i.e., all formatting is lost). This has been my universal experience with gt --> docx. Basically, it appears to me that output to MS Word simply doesn't work - you get a very basic unformatted table (at best).
If you believe the issue of all formatting being lost is a "new" bug, I am happy to create another thread.
FYI @schwa021 There is no "native" format or, more accurately, the native format is "native", which is the AST representation of a document and is agnostic to the actual output format (you can try for yourself by setting `format: native` to see what it is).
Even if the Quarto team is trying to get visually similar output across formats, you should not expect that, as LaTeX/PDF, Typst/PDF, DOCX, HTML, etc. are very different technologies/markups.
If you believe the issue of all formatting being lost is a "new" bug, I am happy to create another thread.
Remove all custom options such as `reference-doc: word-template.docx`, which might not be correct, i.e., use only `format: docx`.
If you replicate the issue, then open a new issue with a small reprex and without all custom stuff.
@schwa021 Don't feel sorry - I am just asking for clarification. Your answers are perfectly fine to me. I don't expect you to talk like a software developer. On the contrary, I am trying to understand your feedback as a simple user. So this is all good! It is me who is using terms that are not adequate for this conversation (I should not have expected you to understand "bare pandoc" - let's forget that).
There are a lot of tools in the stack (quarto, markdown, pandoc, R, gt, docx), so it is sometimes hard to follow everything. I believe the confusion may come from how the tools work together.
Let me try to clarify
This has been my universal experience with gt --> docx. Basically, it appears to me that output to MS Word simply doesn't work - you get a very basic unformatted table (at best).
I don't think gt will by default give you the same output styling in HTML and in docx. So you see a difference in Quarto output because of that. You will get the same difference in R Markdown when using gt.
So this difference and confusion in styling comes from gt, and Quarto can't really do anything about it, as gt is providing raw output (HTML code or OpenXML code).
If you want exactly the same table as in HTML for docx output, I think you will need to export it as an image and use that exported image in the docx. But this won't be a Markdown table.
Otherwise, if docx is your primary format, you may need to consider another table package like flextable, which could have more of the styling options you are looking for.
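As an illustration of the image route mentioned above, here is a small sketch using gt's `gtsave()`. The file name `table1.png` and the choice of columns are only for illustration; PNG export additionally requires the webshot2 package and a Chrome-based browser, so this will not run in every environment.

```r
library(gt)

# A small example table (any gt object would do)
tbl <- gt(head(mtcars[, c("cyl", "hp", "mpg")]))

# Save the HTML rendering of the table as a PNG image
gtsave(tbl, "table1.png")

# Inside a Quarto chunk, embed the saved image instead of the gt object
knitr::include_graphics("table1.png")
```

Because the result is an image, it would be labelled and cross-referenced as a figure rather than a table, and the text in it is no longer editable in Word.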
I believe there are currently some related issues about this at https://github.com/rstudio/gt/issues?q=is%3Aopen+label%3A%22Focus%3A+Word+Output%22+sort%3Aupdated-desc.
I hope this helps you understand. Sorry if I am still using terms that are too technical. They should not be needed for such a discussion.
Thanks a lot for your feedback as simple user BTW - we need those too !
@cderv, Thank you for the quick and clear reply. I think you may have explained a major misunderstanding I had.
When I read the gt documentation, and looked at the examples, there was a LOT of emphasis on formatting to create beautiful tables. In fact, when doing this for output to html - it is terrific. I love the tables that are produced.
Then I read that gt "supported" output to docx. I understood this to mean it would be like output to html (i.e., it would look nice). Apparently, my understanding was wrong. As you wrote below:
I don't think gt will by default give you the same output styling in HTML and in docx. So you see a difference in Quarto output because of that. You will get the same difference in R Markdown when using gt.
So this difference and confusion in styling comes from gt, and Quarto can't really do anything about it, as gt is providing raw output (HTML code or OpenXML code).
If you want exactly the same table as in HTML for docx output, I think you will need to export it as an image and use that exported image in the docx. But this won't be a Markdown table.
Otherwise, if docx is your primary format, you may need to consider another table package like flextable, which could have more of the styling options you are looking for.
My new understanding is that I simply cannot make a nice looking docx table using gt without saving as an image. The issue with that is that many (most) scientific journals will not accept that. BTW - I would much rather be working in Latex/pdf, but I work in the medical field where, sadly, docx has 99.99% market penetration.
Thanks a lot for your feedback as simple user BTW - we need those too !
My general feedback is that the Posit products are amazing for me. I really appreciate the opportunity to use powerful and practical software like this for free. I have also found the help on these github pages to be fairly useful - though, as we have discussed here - sometimes I get lost in the technical terminology. But, that is a "me problem". I assume there is a different level/type of support for paying customers.
Perhaps the gt documentation could make it clear somewhere that you should not expect nice looking output in docx format.
@schwa021 To be completely honest, the tables I used to get with `gt` and `docx`, while not as nice as the HTML versions, were perfectly fine. And if I upload the `docx` to Google Drive and open the document, they are also rather acceptable. And I'll say more, when opening the `docx` with Libre Office in Ubuntu, they also look nice! I found this issue with Word just recently because my co-authors made me notice that the tables were completely messed up.
So while I tend to agree that `gt` and `docx` might not work out perfectly, it used to work just fine.
I agree.
@schwa021 this is getting maybe a bit off topic for Quarto but I invite you to open a new issue or discussion in the gt repo.
We’ve been working on making the non-HTML formats capture and display more of the styles declared but it’s an ongoing process (though we’ve essentially caught up with LaTeX, so this is happening).
Sorry for crashing the discussion, but after reading between the lines, I think it is worth recognizing that we as enthusiastic users are a bit like early adopters of the first (free) iPhone, nagging that, despite the 100 new features, we wished there was just a better cable connector to a Windows PC, preferably yesterday because we started bragging about the new phone at work, haha. Kind of the pitfall of success. So thank you very much for your hard work guys (you too @rich-iannone).
I apologise, there is no intent from me there (my intent was simply to be factual nothing more, nothing less), i.e., I am not a native English speaker, so it seems the tone is not correct unfortunately ...
That made me reflect a lot. Easy to get caught up in a "nerdy debate". <3
@rich-iannone Is there anything that can be done in gt to address this? A gt table saved to docx with `gtsave()` usually takes up the full width, while when outputting to Quarto, the table is squished.
Maybe we could check for recent quarto and tweak the openxml string as necessary?
@olivroy, they are doing their best handling many issues - updates will come when issue is resolved. These comments take time away from coding.
I do understand your despair though, and I personally am only waiting for this one limitation in gt/quarto to be resolved. I could have financially rewarded a solution, paid by our company. But money and pressure are rarely what's needed, only time.
@sda030 You misunderstood the intent here. @olivroy contributed a lot lately to `gt` and is very likely asking how to help by suggesting an approach.
Oh, my bad, sorry.
Bug description
When rendering a qmd with a gt table and a table label, the docx output is faulty: Word shows a message that there is a problem, and the table appears strangely formatted in the docx. Removing the label from the code chunk brings rendering back to normal.
Steps to reproduce
Faulty :