Closed mingsu closed 1 year ago
I think this is because huxtable outputs raw TeX or HTML, whereas kable outputs markdown. The solution is to output TeX citations (`\cite{bibkey3}`) rather than markdown citations (`@bibkey3`), and to use `escape_contents()`. Can you confirm if this works for you?
---
title: "Test citation inside a huxtable table"
output:
  bookdown::pdf_book:
    latex_engine: xelatex
  officedown::rdocx_document:
    base_format: "bookdown::word_document2"
  bookdown::html_document2:
link-citations: yes
bibliography: Ref.bib
---
# Make data
```{r makedata}
library(huxtable)
library(tidyverse)
library(knitr)

tabdf <- tibble(
  col1 = 1:6,
  col3 = c(
    "[ @bibkey1 ]",
    "[ @bibkey2; @bibkey3 ]",
    "@bibkey1",
    "@bibkey2;@bibkey3",
    "\\cite{bibkey3}",
    "\\\\cite{bibkey3}"
  )
)

tabdf %>%
  as_hux() %>%
  set_caption("Hux table") %>%
  set_escape_contents(5:6, 2, FALSE)
```
<img width="549" alt="image" src="https://user-images.githubusercontent.com/3145794/128596217-b4d4db99-769e-4e3b-a726-1cca3f9b9d8e.png">
You could try using bookdown "text references". Example:
---
output:
  bookdown::pdf_document2:
    latex_engine: xelatex
  html_document: default
  bookdown::html_document2: default
references:
- id: eurostat
  author: Eurostat
  issued: 2021
  title: Database - Employment and unemployment (LFS)
  type: report
  URL: 'https://ec.europa.eu/eurostat/web/lfs/data/database'
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(huxtable)
```

Some text blah blah [@eurostat, 88-99].

(ref:eurostat) Data: @eurostat [88-99]

```{r}
tribble_hux(
  ~ Name,       ~ Source,
  "John Smith", "@eurostat",
  "Jane Doe",   "(ref:eurostat)",
) %>%
  set_caption("A caption") %>%
  add_footnote("(ref:eurostat)")
```
pdf output:
![Screenshot 2021-08-27 at 17 02 59](https://user-images.githubusercontent.com/2765811/131148392-e66d4c72-f335-4368-bc9c-9a6012c3bfe0.png)
Knitting with `bookdown::html_document2` works, too.
Interestingly enough, this does not work as expected if the output format is word (input file as above, with `bookdown::word_document2: default` added):
Note that the "label" `(ref:eurostat)` is (correctly) replaced by its corresponding "text" `Data: @eurostat [88-99]`, but the "text" shows no signs of having been processed by pandoc (plus citeproc) before that. Other markdown tags such as `**bold**` come through unprocessed as well (not shown in example).
It does work as expected in all output formats, including word, if using kable.
Any ideas on what huxtable does differently here? Should I open a new issue?
I have made several complex functions to make it work. Adjust them to your cases.
```r
#' get output format
#'
#' @return one of "html", "latex", "word" or "markdown"
#' @importFrom knitr is_html_output is_latex_output pandoc_to
#' @export
#'
get_outformat <- function() {
  if (knitr::is_html_output()) {
    return("html")
  }
  if (knitr::is_latex_output()) {
    return("latex")
  }
  if (knitr::pandoc_to("docx")) {
    return("word")
  }
  return("markdown")
}
```
```r
#' make valid huxtable for specified output format
#'
#' @param ht huxtable
#' @param id table id key
#' @param capstr table caption string
#' @param latexfloat latex float option
#' @param format output format
#'
#' @return a huxtable (latex and other formats) or a flextable (html and word)
#' @importFrom knitr opts_current
#' @importFrom huxtable as_flextable set_latex_float set_caption set_label theme_article
#' @importFrom dplyr mutate_all
#' @importFrom flextable compose
#' @importFrom stringr str_replace_all
#' @export
#'
taber <- function(ht,
                  id = knitr::opts_current$get()$tab.id,
                  capstr = knitr::opts_current$get()$tab.cap,
                  latexfloat = "htbp",
                  format = get_outformat()) {
  if (!("huxtable" %in% class(ht))) {
    stop("`ht` should be huxtable!")
  }
  if (is.null(capstr)) { capstr <- "" }
  # fall back to a random id if the chunk does not set tab.id
  if (is.null(id)) { id <- paste0("temp", abs(floor(rnorm(n = 1) * 10000))) }

  # for latex: translate markdown markup in caption and cells to raw TeX
  if (format %in% c("latex")) {
    capstr <- capstr %>%
      stringr::str_replace_all(., "%", "\\\\%") %>%
      stringr::str_replace_all(., "~([^~]*)~", "\\\\textsubscript{\\1}") %>%
      stringr::str_replace_all(., "\\*\\*([^\\*]*)\\*\\*", "\\\\textbf{\\1}") %>%
      stringr::str_replace_all(., "\\*([^\\*]*)\\*", "\\\\textit{\\1}") %>%
      stringr::str_replace_all(., "_([^_]*)_", "\\\\textit{\\1}") %>%
      stringr::str_replace_all(., "\\^([^\\^]*)\\^", "\\\\textsuperscript{\\1}")
    ht <- ht %>%
      dplyr::mutate_all(~stringr::str_replace_all(., "%", "\\\\%")) %>%
      dplyr::mutate_all(~stringr::str_replace_all(., "~([^~]*)~", "\\\\textsubscript{\\1}")) %>%
      dplyr::mutate_all(~stringr::str_replace_all(., "\\*\\*([^\\*]*)\\*\\*", "\\\\textbf{\\1}")) %>%
      dplyr::mutate_all(~stringr::str_replace_all(., "\\*([^\\*]*)\\*", "\\\\textit{\\1}")) %>%
      dplyr::mutate_all(~stringr::str_replace_all(., "_([^_]*)_", "\\\\textit{\\1}")) %>%
      dplyr::mutate_all(~stringr::str_replace_all(., "\\^([^\\^]*)\\^", "\\\\textsuperscript{\\1}")) %>%
      huxtable::set_escape_contents(FALSE) %>%
      huxtable::set_latex_float(value = latexfloat) %>%
      huxtable::set_caption(capstr) %>%
      huxtable::set_label(paste0("tab:", id))
    return(ht)
  }

  # for word (and html): convert to flextable and compose each cell
  if (format %in% c("html", "word", "pandoc")) {
    ft <- ht %>%
      huxtable::as_flextable()
    for (irow in 1:nrow(ht)) {
      for (icol in 1:ncol(ht)) {
        ft <- ft %>%
          flextable::compose(part = "body", i = irow, j = icol,
                             value = mder(as.character(ht[irow, icol]), format = "word"))
      }
    }
    return(ft)
  }

  # any other format: only set caption and label
  ht <- ht %>%
    huxtable::set_caption(capstr) %>%
    huxtable::set_label(id)
  return(ht)
}
```
```r
#' turn string into specified format
#'
#' @param mdstr markdown syntax string
#' @param modestr mode to use, M: markdown, G: gridtext; W: word; H: html; L: latex
#' @param format output format
#'
#' @return the converted string (a flextable paragraph for word)
#' @importFrom stringr str_c
#' @importFrom flextable as_sub as_sup as_b as_paragraph as_i
#' @export
#'
mder <- function(mdstr, format = get_outformat(), modestr = NULL) {
  # vectorise: convert each element separately
  if (length(mdstr) > 1) {
    mdstr <- unlist(lapply(mdstr, FUN = function(x) {
      mder(x, format = format, modestr = modestr)
    }))
    return(mdstr)
  }
  if (grepl("_[^_]*_", mdstr)) {
    warning("Don't use _italic_ for italic text, use *italic* instead!")
  }

  if (format %in% c("latex", "pdf") || identical(modestr, "L")) {
    mdstr <- mdstr %>%
      gsub("\\*\\*([^\\*]*)\\*\\*", "\\\\textbf{\\1}", ., perl = TRUE) %>%
      gsub("\\*([^\\*]*)\\*", "\\\\textit{\\1}", ., perl = TRUE) %>%
      gsub("_([^_]*)_", "\\\\textit{\\1}", ., perl = TRUE) %>%
      gsub("%", "\\\\%", ., perl = TRUE) %>%
      gsub("\\^([^\\^]*)\\^", "\\\\textsuperscript{\\1}", ., perl = TRUE) %>%
      gsub("~([^~]*)~", "\\\\textsubscript{\\1}", ., perl = TRUE) %>%
      gsub("\\&", "\\\\&", ., perl = TRUE) %>%
      citetoref(mdstr = .)
    return(mdstr)
  }

  if (format %in% c("word", "pandoc") || identical(modestr, "W")) { # word
    # build a flextable::as_paragraph() call as a string, then evaluate it
    evalstr <- mdstr %>%
      gsub("_([^_]*)_", '", flextable::as_i("\\1"), "', ., perl = TRUE) %>%
      gsub("\\*\\*([^\\*]*)\\*\\*", '", flextable::as_b("\\1"), "', ., perl = TRUE) %>%
      gsub("\\*([^\\*]*)\\*", '", flextable::as_i("\\1"), "', ., perl = TRUE) %>%
      gsub("\\^([^^]*)\\^", '", flextable::as_sup("\\1"), "', ., perl = TRUE) %>%
      gsub("\\~([^~]*)\\~", '", flextable::as_sub("\\1"), "', ., perl = TRUE) %>%
      # stringr::str_c('"', ., '"') %>%
      gsub("\"", "'", ., perl = TRUE) %>%
      # gsub("^' , ", "", ., perl = T) %>%
      # gsub(", '$", "", ., perl = T) %>%
      stringr::str_c("flextable::as_paragraph('", ., "')")
    mdstr <- eval(parse(text = evalstr))
    return(mdstr)
  }

  if (format %in% c("html") || identical(modestr, "H")) { # don't use it for now
    mdstr <- mdstr %>%
      gsub("\\*\\*([^\\*]*)\\*\\*", "<b>\\1</b>", ., perl = TRUE) %>%
      gsub("\\*([^\\*]*)\\*", "<i>\\1</i>", ., perl = TRUE) %>%
      gsub("_([^_]*)_", "<i>\\1</i>", ., perl = TRUE) %>%
      gsub("\\^([^\\^]*)\\^", "<sup>\\1</sup>", ., perl = TRUE) %>%
      gsub("~([^~]*)~", "<sub>\\1</sub>", ., perl = TRUE)
    # mdstr <- gsub("\\&", "\\\\&", mdstr, perl = T)
    return(mdstr)
  }

  if (format %in% "gridtext" || identical(modestr, "G")) {
    # bold maps to <b>, and closing tags are written as </tag>
    mdstr <- mdstr %>%
      gsub("\\*\\*([^\\*]*)\\*\\*", "<b>\\1</b>", ., perl = TRUE) %>%
      gsub("_([^_]*)_", "*\\1*", ., perl = TRUE) %>%
      gsub("\\^([^\\^]*)\\^", "<sup>\\1</sup>", ., perl = TRUE) %>%
      gsub("~([^~]*)~", "<sub>\\1</sub>", ., perl = TRUE) %>%
      gsub("\\&", "\\\\&", ., perl = TRUE)
  }
  return(mdstr)
}
```
```r
#' make caper for pdf and html, but not for word
#'
#' @param ... string to caption
#' @param format output format word; html; latex; gridtext
#' @param modestr currently unused
#' @return caption
#'
#' @export
#'
caper <- function(..., format = get_outformat(), modestr = NULL) {
  capstr <- paste(list(...), collapse = "")
  # word captions are passed through unchanged
  return(ifelse(format %in% c("word"), capstr, mder(capstr, format = format)))
}
```
```r
#' cite to reference for latex output
#'
#' @param mdstr mdstring
#'
#' @return citestring
#' @export
#'
citetoref <- function(mdstr) {
  mdstr <- mdstr %>%
    # @key -> ref:key
    gsub("@(\\b([a-z])\\w*([a-z0-9])\\b)", "ref:\\1", ., perl = TRUE) %>%
    gsub(";ref:", "_", ., perl = TRUE) %>%
    # [ ref:key ] -> (ref:key)
    gsub("\\[*\\ *ref:(\\b([a-z])\\w*([a-z0-9])\\b)\\ *\\]*", "(ref:\\1)", ., perl = TRUE)
  return(mdstr)
}
```
```r
#' generate reference for latex output
#'
#' @param bibkey bib keyword
#' @param is_latex whether the output format is latex
#'
#' @return a text reference definition for latex output, "" otherwise
#' @importFrom knitr is_latex_output
#' @export
#'
refer <- function(bibkey, is_latex = knitr::is_latex_output()) {
  if (isTRUE(is_latex)) {
    return(paste0(citetoref(bibkey), " \\ ", bibkey))
  }
  return("")
}
```
---
title: "Test citation inside a huxtable table"
output:
  bookdown::pdf_book:
    latex_engine: xelatex
  officedown::rdocx_document:
    base_format: "bookdown::word_document2"
  bookdown::html_document2:
link-citations: yes
bibliography: Ref.bib
---
# Make data
```{r makedata}
library(huxtable)
library(tidyverse)
library(knitr)

tabdf <- tibble(
  col1 = 1:4,
  col3 = c(
    "[ @bibkey1 ]",
    "[ @bibkey2; @bibkey3 ]",
    "@bibkey1",
    "@bibkey2;@bibkey3"
  )
)
```

See Table \@ref(tab:hux).

The following R code will be executed if the output format is LaTeX.

`r bibkey <- "[ @bibkey1 ]"; refer(bibkey)`

tabdf %>% as_hux %>%
taber()
Interesting, many thanks for posting. I’ll have a closer look later on.
Still, it’d be great if huxtable itself could be fixed to work as expected if bookdown "text references" are used and the output format is word/docx.
@mingsu:
Still not working, see it below.
Looks as if row 5 works (bibtex is looking it up but not finding anything, hence outputting a question mark). No?
@hughjonesd
Is it because of the unmentioned bib file?
> cat Ref.bib
```bibtex
@article{bibkey1,
  title = {title 1},
  author = {Firsta Lasta and Firstb Lastb},
  year = 2001,
  journal = {Journal},
  volume = 01,
  pages = {1--2},
  number = 1
}
@article{bibkey2,
  title = {title 2},
  author = {Firsta Lasta and Firstb Lastb},
  year = 2002,
  journal = {Journal},
  volume = 02,
  pages = {1--2},
  number = 2
}
@article{bibkey3,
  title = {title 3},
  author = {Firsta Lasta and Firstb Lastb},
  year = 2003,
  journal = {Journal},
  volume = 03,
  pages = {1--2},
  number = 3
}
```
@mingsu Could be. Sometimes latex needs multiple runs to pick up bibtex. Not sure how all these command line tools interact :-( but the ? definitely means it is looking for a reference.
Well, for a start, one thing to keep in mind is that bibtex or biblatex are not involved at all in @mingsu’s examples, nor are they in mine. (bibtex or biblatex can be selected by including `citation_package: natbib` or `citation_package: biblatex` [1] in the YAML metadata header, but unless such a command is explicitly given, the default kicks in, which amounts to `citation_package: citeproc` [2].)
The question mark results from huxtable (I guess) generating `\cite{bibkey3}`, pandoc letting it through as is, and finally latex seeing the `\cite{}` but not being able to resolve it.
Let me summarize what I have been able to figure out about this issue so far.
1) The OP tried to include pandoc citekeys inside huxtable cells. This doesn’t work because in order to resolve these citekeys, the cell content would have to be processed by pandoc plus citeproc, but, at least in the case of huxtable, this does not happen.
2) A more powerful `markdown(ht)` could work in principle, but it would have to harness the whole power of pandoc plus citeproc, no less. I’m not sure that’s workable.
3) What does work (mostly) is using bookdown "text references". Here, it’s only a label that is included inside an R chunk, e.g. a huxtable. The corresponding text is formatted outside the respective R chunk/huxtable, obviously by pandoc plus citeproc, as citation keys are demonstrably resolved, and subsequently the label is replaced by the formatted output of pandoc plus citeproc.
4) Part of @mingsu’s code seems to partially (?) automate this, replacing `[ @bibkey1 ]` in a `tab.cap` by `(ref:bibkey1)` plus adding, outside of the R chunk, a line `(ref:bibkey1) \ [ @bibkey1 ]`. (As seen when inspecting the `.knit.md` intermediate file.) When knitting the "DEMO" Rmd to pdf (after inserting the function definitions into the first R chunk), this citekey is successfully resolved in the caption, though this is not the case elsewhere in the huxtable (see screenshot of pdf).
BTW, I don’t think the space chars in `[ @bibkey1 ]` are a good idea, they lead to spurious spaces in the output, too.
5) I wouldn't mind having to manually apply the bookdown "Text references" mechanism every time I want to use a citation key in a huxtable. The only problem from my point of view is that while pdf and html output work as expected as far as the "Text references" mechanism is concerned, word output does not (see my post above).
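For what it's worth, the manual step described in 3) to 5) could be scripted roughly as follows. This is only a sketch in base R: `make_text_refs()` is a hypothetical helper (not part of huxtable or bookdown), and it assumes that each citation key is also acceptable as a text-reference label.

```r
# Hypothetical helper (not part of huxtable or bookdown): for each citation
# key, build the text-reference label to put into a table cell and the
# matching definition line to put into the body of the Rmd file.
make_text_refs <- function(keys) {
  labels <- paste0("(ref:", keys, ")")
  definitions <- paste0(labels, " @", keys)
  data.frame(
    key = keys,
    label = labels,
    definition = definitions,
    stringsAsFactors = FALSE
  )
}

refs <- make_text_refs(c("bibkey1", "bibkey2"))

# The labels are what goes into the huxtable cells.
print(refs$label)

# The definitions are meant to be pasted into the Rmd body,
# each one as its own paragraph with an empty line above and below.
cat(refs$definition, sep = "\n\n")
```

The labels would then be used in the table exactly like `(ref:eurostat)` in my MWE above. This does not change anything about the word output problem.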
Just pinging. Any chance the word output bug when using the "Text references" mechanism (as described above) could be fixed?
I suspect that what is going on is the same here as before: huxtable prints out HTML/TeX, or goes direct via officer; kable is doing something else. Reading `?kable` suggests that it doesn't directly print to word, so presumably it is outputting raw markdown tables and then that can be interpreted via pandoc?
I still think that on the whole the best option is the one I suggested at first: write `\cite{blah}` and then make sure that your tex to pdf compilation process can deal with citations.
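A rough sketch of what that could look like when the same document is also knitted to other formats. `cite_cell()` is a made-up helper name, while `knitr::is_latex_output()` is the real knitr function. It assumes the PDF is built with a LaTeX citation package (`citation_package: natbib` or `biblatex`), and that escaping is switched off for the affected cells with `set_escape_contents()`.

```r
# Hypothetical helper: return a raw LaTeX citation for PDF output and a
# pandoc citation key otherwise. The pandoc form is only resolved where the
# cell text actually passes through pandoc (e.g. in a markdown table).
cite_cell <- function(key, latex = knitr::is_latex_output()) {
  if (isTRUE(latex)) {
    paste0("\\cite{", key, "}")
  } else {
    paste0("@", key)
  }
}

# Passing `latex` explicitly makes the behaviour easy to check by hand.
cite_cell("bibkey3", latex = TRUE)
cite_cell("bibkey3", latex = FALSE)
```

Inside a document you would call `cite_cell("bibkey3")` when building the table data and let knitr decide which branch applies.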
Here is another workaround, which is to output a markdown table:
```{r, results='asis'}
print_md(huxtable("Reference to @citekey"))
```
Of course you don't get all the nice output features for this, but that's the same for other solutions that take this route.
I’m sure I must be missing something here, but I am puzzled why using my MWE, it returns the expected output when knitting with `bookdown::pdf_document2` and `bookdown::html_document2` but fails to resolve the citation when knitting with `bookdown::word_document2`.
Please note that my MWE uses pandoc’s citeproc (a CSL processor) for all output formats, so LaTeX `\cite` commands never enter the picture, nor can they ever be part of the solution when the output format is not pdf.
Likely because the mechanism for outputting Word tables is different than for TeX or HTML. Huxtable writes raw TeX and HTML. It writes to Word by using `flextable`. If you can make an example work with `flextable`, then I might be able to fix this.
I understand that `\cite` won't work. Perhaps there is an equivalent raw command for citeproc?
Closing for now because I don't see an obvious fix from my end.
See MWE below.
`kable` can render citations inside the table, but `huxtable` can't.
Kable works
Huxtable does not work
Reference