hughjonesd / huxtable

An R package to create styled tables in multiple output formats, with a friendly, modern interface.
http://hughjonesd.github.io/huxtable

Citation inside a Huxtable table isn't rendered #212

Closed mingsu closed 1 year ago

mingsu commented 3 years ago

See the MWE below.

kable renders citations inside the table, but huxtable doesn't.

````markdown
---
title: "Test citation inside a huxtable table"
output:
  bookdown::pdf_book:
    latex_engine: xelatex
  officedown::rdocx_document:
    base_format: "bookdown::word_document2"
  bookdown::html_document2:
link-citations: yes
bibliography: Ref.bib
---

# Make data

```{r makedata}
library(huxtable)
library(tidyverse)
library(knitr)
tabdf <- tibble(col1 = 1:4,
         col3 = c("[ @bibkey1 ]",
                  "[ @bibkey2; @bibkey3 ]",
                  "@bibkey1",
                  "@bibkey2;@bibkey3"))
```

Kable works

```{r}
kable(tabdf, format = "pipe", caption = "Kable table")
```

Huxtable does not work

```{r}
tabdf %>% as_hux %>%
    set_caption("Hux table")
```

Reference
````

````sh
> cat Ref.bib

@article{bibkey1,
    title       = {title 1},
    author      = {Firsta Lasta and Firstb Lastb},
    year        = 2001,
    journal     = {Journal},
    volume      = 01,
    pages       = {1--2},
    number      = 1
}

@article{bibkey2,
    title       = {title 2},
    author      = {Firsta Lasta and Firstb Lastb},
    year        = 2002,
    journal     = {Journal},
    volume      = 02,
    pages       = {1--2},
    number      = 2
}

@article{bibkey3,
    title       = {title 3},
    author      = {Firsta Lasta and Firstb Lastb},
    year        = 2003,
    journal     = {Journal},
    volume      = 03,
    pages       = {1--2},
    number      = 3
}
````

hughjonesd commented 3 years ago

I think this is because huxtable outputs raw TeX or HTML, whereas kable outputs markdown. The solution is to write TeX citations (`\cite{bibkey3}`) rather than markdown citations (`@bibkey3`), and to turn off escaping for those cells with `escape_contents()`. Can you confirm whether this works for you?
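A minimal sketch of that suggestion, using the `set_escape_contents(ht, row, col, value)` form that appears later in this thread. The column name `Source` and the cell contents are illustrative. `add_colnames = TRUE` is passed explicitly so that row 1 is the header and the citation cells are rows 2 and 3.

```r
library(huxtable)

# Raw TeX citation commands instead of pandoc-style @keys
ht <- hux(Source = c("\\cite{bibkey1}", "\\cite{bibkey2,bibkey3}"),
          add_colnames = TRUE)

# Row 1 holds the column name, so the citations are rows 2-3 of column 1.
# With escaping off, the backslash reaches LaTeX unchanged instead of
# being printed literally.
ht <- set_escape_contents(ht, 2:3, 1, FALSE)

tex <- to_latex(ht)
```

This only helps for PDF output, and LaTeX still has to be able to resolve `\cite` (see the discussion of citation packages further down).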

mingsu commented 3 years ago

Still not working, see it below.

````markdown
---
title: "Test citation inside a huxtable table"
output:
  bookdown::pdf_book:
    latex_engine: xelatex
  officedown::rdocx_document:
    base_format: "bookdown::word_document2"
  bookdown::html_document2:
link-citations: yes
bibliography: Ref.bib
---

# Make data

```{r makedata}
library(huxtable)
library(tidyverse)
library(knitr)
tabdf <- tibble(col1 = 1:6,
         col3 = c("[ @bibkey1 ]",
                  "[ @bibkey2; @bibkey3 ]",
                  "@bibkey1",
                  "@bibkey2;@bibkey3",
                  "\\cite{bibkey3}",
                  "\\\\cite{bibkey3}"
                  ))
```

Huxtable does not work

```{r}
tabdf %>% as_hux %>%
    set_caption("Hux table") %>%
    set_escape_contents(5:6, 2, FALSE)
```

Reference
````

<img width="549" alt="image" src="https://user-images.githubusercontent.com/3145794/128596217-b4d4db99-769e-4e3b-a726-1cca3f9b9d8e.png">
njbart commented 3 years ago

You could try using bookdown "text references". Example:

````markdown
---
output: 
  bookdown::pdf_document2:
    latex_engine: xelatex
  html_document: default
  bookdown::html_document2: default
references:
- id: eurostat
  author: Eurostat
  issued: 2021
  title: Database - Employment and unemployment (LFS)
  type: report
  URL: 'https://ec.europa.eu/eurostat/web/lfs/data/database'
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(huxtable)
```

Some text blah blah [@eurostat, 88-99].

(ref:eurostat) Data: @eurostat [88-99]

```{r}
tribble_hux(
  ~ Name,             ~ Source,
    "John Smith",       "@eurostat",
    "Jane Doe",         "(ref:eurostat)",
) %>% 
  set_caption("A caption") %>% 
  add_footnote("(ref:eurostat)")
```
````

pdf output:

![Screenshot 2021-08-27 at 17 02 59](https://user-images.githubusercontent.com/2765811/131148392-e66d4c72-f335-4368-bc9c-9a6012c3bfe0.png)

Knitting with `bookdown::html_document2` works, too.
njbart commented 3 years ago

Interestingly enough, this does not work as expected if the output format is Word (input file as above, with `bookdown::word_document2: default` added):

[Screenshot 2021-08-30: Word output]

Note that the "label" `(ref:eurostat)` is (correctly) replaced by its corresponding "text" `Data: @eurostat [88-99]`, but the "text" shows no sign of having been processed by pandoc (plus citeproc) beforehand. Other markdown markup such as `**bold**` also comes through unprocessed (not shown in the example).

It works as expected in all output formats, including Word, when using kable.

Any ideas on what huxtable does differently here? Should I open a new issue?

mingsu commented 3 years ago

I have written several fairly complex functions to make it work. Adjust them to your own cases.


```r
#' Get the current output format
#'
#' @return "html", "latex", "word" or "markdown"
#' @importFrom knitr is_html_output is_latex_output pandoc_to
#' @export
get_outformat <- function() {
  if (knitr::is_html_output()) {
    return("html")
  }
  if (knitr::is_latex_output()) {
    return("latex")
  }
  if (knitr::pandoc_to("docx")) {
    return("word")
  }
  return("markdown")
}

#' Make a valid huxtable for the specified output format
#'
#' @param ht huxtable
#' @param id table id key
#' @param capstr table caption string
#' @param latexfloat LaTeX float option
#' @param format output format
#'
#' @return a huxtable (LaTeX, markdown) or a flextable (HTML, Word)
#' @importFrom knitr opts_current
#' @importFrom huxtable as_flextable set_escape_contents set_latex_float set_caption set_label
#' @importFrom dplyr mutate_all
#' @importFrom flextable compose
#' @importFrom stringr str_replace_all
#' @export
taber <- function(ht,
                  id = knitr::opts_current$get()$tab.id,
                  capstr = knitr::opts_current$get()$tab.cap,
                  latexfloat = "htbp",
                  format = get_outformat()) {
    if (!inherits(ht, "huxtable")) {
        stop("`ht` should be a huxtable!")
    }
    if (is.null(capstr)) { capstr <- "" }
    # no tab.id chunk option: fall back to a random label
    if (is.null(id)) { id <- paste0("temp", abs(floor(rnorm(n = 1) * 10000))) }
    if (format %in% c("latex")) {
        # translate markdown markup in the caption into TeX
        capstr <- capstr %>%
            stringr::str_replace_all(., "%", "\\\\%") %>%
            stringr::str_replace_all(., "~([^~]*)~", "\\\\textsubscript{\\1}") %>%
            stringr::str_replace_all(., "\\*\\*([^\\*]*)\\*\\*", "\\\\textbf{\\1}") %>%
            stringr::str_replace_all(., "\\*([^\\*]*)\\*", "\\\\textit{\\1}") %>%
            stringr::str_replace_all(., "_([^_]*)_", "\\\\textit{\\1}") %>%
            stringr::str_replace_all(., "\\^([^\\^]*)\\^", "\\\\textsuperscript{\\1}")
        # same translation for every cell; the cells are then emitted unescaped
        ht <- ht %>%
            dplyr::mutate_all(~stringr::str_replace_all(., "%", "\\\\%")) %>%
            dplyr::mutate_all(~stringr::str_replace_all(., "~([^~]*)~", "\\\\textsubscript{\\1}")) %>%
            dplyr::mutate_all(~stringr::str_replace_all(., "\\*\\*([^\\*]*)\\*\\*", "\\\\textbf{\\1}")) %>%
            dplyr::mutate_all(~stringr::str_replace_all(., "\\*([^\\*]*)\\*", "\\\\textit{\\1}")) %>%
            dplyr::mutate_all(~stringr::str_replace_all(., "_([^_]*)_", "\\\\textit{\\1}")) %>%
            dplyr::mutate_all(~stringr::str_replace_all(., "\\^([^\\^]*)\\^", "\\\\textsuperscript{\\1}")) %>%
            huxtable::set_escape_contents(FALSE) %>%
            huxtable::set_latex_float(value = latexfloat) %>%
            huxtable::set_caption(capstr) %>%
            huxtable::set_label(paste0("tab:", id))
        return(ht)
    }
    # HTML and Word: convert to flextable and rebuild each body cell
    if (format %in% c("html", "word", "pandoc")) {
        ft <- ht %>%
            huxtable::as_flextable()
        for (irow in seq_len(nrow(ht))) {
            for (icol in seq_len(ncol(ht))) {
                ft <- ft %>%
                    flextable::compose(part = "body", i = irow, j = icol,
                                       value = mder(as.character(ht[irow, icol]), format = "word"))
            }
        }
        return(ft)
    }
    ht <- ht %>%
        huxtable::set_caption(capstr) %>%
        huxtable::set_label(id)
    return(ht)
}

#' Turn a markdown string into the specified format
#'
#' @param mdstr markdown syntax string
#' @param modestr mode to use, M: markdown, G: gridtext; W: word; H: html; L: latex
#' @param format output format
#'
#' @return converted string (or a flextable paragraph for Word)
#' @importFrom stringr str_c
#' @importFrom flextable as_sub as_sup as_b as_paragraph as_i
#' @export
mder <- function(mdstr, format = get_outformat(), modestr = NULL) {
    if (length(mdstr) > 1) {
        mdstr <- unlist(lapply(mdstr, FUN = function(x) {
            mder(x, format = format, modestr = modestr)
        }))
        return(mdstr)
    }
    if (grepl("_[^_]*_", mdstr)) {
        warning("Don't use _italic_ for italic text, use *italic* instead!")
    }
    if (format %in% c("latex", "pdf") || identical(modestr, "L")) {
        mdstr <- mdstr %>%
            gsub("\\*\\*([^\\*]*)\\*\\*", "\\\\textbf{\\1}", ., perl = TRUE) %>%
            gsub("\\*([^\\*]*)\\*", "\\\\textit{\\1}", ., perl = TRUE) %>%
            gsub("_([^_]*)_", "\\\\textit{\\1}", ., perl = TRUE) %>%
            gsub("%", "\\\\%", ., perl = TRUE) %>%
            gsub("\\^([^\\^]*)\\^", "\\\\textsuperscript{\\1}", ., perl = TRUE) %>%
            gsub("~([^~]*)~", "\\\\textsubscript{\\1}", ., perl = TRUE) %>%
            gsub("\\&", "\\\\&", ., perl = TRUE) %>%
            citetoref(mdstr = .)
        return(mdstr)
    }
    if (format %in% c("word", "pandoc") || identical(modestr, "W")) {
        # build a flextable::as_paragraph() call as a string, then evaluate it
        evalstr <- mdstr %>%
            gsub("_([^_]*)_", '", flextable::as_i("\\1"), "', ., perl = TRUE) %>%
            gsub("\\*\\*([^\\*]*)\\*\\*", '", flextable::as_b("\\1"), "', ., perl = TRUE) %>%
            gsub("\\*([^\\*]*)\\*", '", flextable::as_i("\\1"), "', ., perl = TRUE) %>%
            gsub("\\^([^^]*)\\^", '", flextable::as_sup("\\1"), "', ., perl = TRUE) %>%
            gsub("\\~([^~]*)\\~", '", flextable::as_sub("\\1"), "', ., perl = TRUE) %>%
            gsub("\"", "'", ., perl = TRUE) %>%
            stringr::str_c("flextable::as_paragraph('", ., "')")
        mdstr <- eval(parse(text = evalstr))
        return(mdstr)
    }
    if (format %in% c("html") || identical(modestr, "H")) {    # not used for now
        mdstr <- mdstr %>%
            gsub("\\*\\*([^\\*]*)\\*\\*", "<b>\\1</b>", ., perl = TRUE) %>%
            gsub("\\*([^\\*]*)\\*", "<i>\\1</i>", ., perl = TRUE) %>%
            gsub("_([^_]*)_", "<i>\\1</i>", ., perl = TRUE) %>%
            gsub("\\^([^\\^]*)\\^", "<sup>\\1</sup>", ., perl = TRUE) %>%
            gsub("~([^~]*)~", "<sub>\\1</sub>", ., perl = TRUE)
        return(mdstr)
    }
    if (format %in% "gridtext" || identical(modestr, "G")) {
        mdstr <- mdstr %>%
            gsub("\\*\\*([^\\*]*)\\*\\*", "<b>\\1</b>", ., perl = TRUE) %>%
            gsub("_([^_]*)_", "*\\1*", ., perl = TRUE) %>%
            gsub("\\^([^\\^]*)\\^", "<sup>\\1</sup>", ., perl = TRUE) %>%
            gsub("~([^~]*)~", "<sub>\\1</sub>", ., perl = TRUE) %>%
            gsub("\\&", "\\\\&", ., perl = TRUE)
    }
    return(mdstr)
}

#' Make a caption: converted for PDF and HTML, left as-is for Word
#'
#' @param ... strings pasted together into the caption
#' @param format output format: word, html, latex or gridtext
#' @param modestr optional mode, passed on to mder()
#' @return caption
#' @export
caper <- function(..., format = get_outformat(), modestr = NULL) {
    capstr <- paste(list(...), collapse = "")
    if (format %in% "word") {
        return(capstr)
    }
    return(mder(capstr, format = format, modestr = modestr))
}

#' Turn pandoc citekeys into bookdown text-reference labels (LaTeX output)
#'
#' @param mdstr markdown string
#'
#' @return string with (ref:...) labels
#' @export
citetoref <- function(mdstr) {
    mdstr <- mdstr %>%
        gsub("@(\\b([a-z])\\w*([a-z0-9])\\b)", "ref:\\1", ., perl = TRUE) %>%
        gsub(";ref:", "_", ., perl = TRUE) %>%
        gsub("\\[*\\ *ref:(\\b([a-z])\\w*([a-z0-9])\\b)\\ *\\]*", "(ref:\\1)", ., perl = TRUE)
    return(mdstr)
}

#' Generate a bookdown text reference for LaTeX output
#'
#' @param bibkey citation string, e.g. "[ @bibkey1 ]"
#' @param is_latex whether the current output is LaTeX
#'
#' @return "(ref:label) \ citation" for LaTeX, otherwise ""
#' @importFrom knitr is_latex_output
#' @export
refer <- function(bibkey, is_latex = knitr::is_latex_output()) {
    if (isTRUE(is_latex)) {
        return(paste0(citetoref(bibkey), " \\ ", bibkey))
    }
    return("")
}
```

DEMO


````markdown
---
title: "Test citation inside a huxtable table"
output:
  bookdown::pdf_book:
    latex_engine: xelatex
  officedown::rdocx_document:
    base_format: "bookdown::word_document2"
  bookdown::html_document2:
link-citations: yes
bibliography: Ref.bib
---

# Make data

```{r makedata}
library(huxtable)
library(tidyverse)
library(knitr)
tabdf <- tibble(col1 = 1:4,
         col3 = c("[ @bibkey1 ]",
                  "[ @bibkey2; @bibkey3 ]",
                  "@bibkey1",
                  "@bibkey2;@bibkey3"))
```

Making huxtable work

See Table \@ref(tab:hux).

The following R code will be executed if the output format is LaTeX.

`r bibkey <- "[ @bibkey1 ]"; refer(bibkey)`

```{r}
tabdf %>% as_hux %>%
    taber()
```

Reference
````

njbart commented 3 years ago

Interesting, many thanks for posting. I’ll have a closer look later on.

Still, it’d be great if huxtable itself could be fixed to work as expected when bookdown "text references" are used and the output format is Word/docx.

hughjonesd commented 3 years ago

@mingsu:

> Still not working, see it below.

Looks as if row 5 works (bibtex is looking it up but not finding anything, hence outputting a question mark). No?

mingsu commented 3 years ago

@hughjonesd

Could it be because of the bib file, which I didn't mention?

````sh
> cat Ref.bib

@article{bibkey1,
    title       = {title 1},
    author      = {Firsta Lasta and Firstb Lastb},
    year        = 2001,
    journal     = {Journal},
    volume      = 01,
    pages       = {1--2},
    number      = 1
}

@article{bibkey2,
    title       = {title 2},
    author      = {Firsta Lasta and Firstb Lastb},
    year        = 2002,
    journal     = {Journal},
    volume      = 02,
    pages       = {1--2},
    number      = 2
}

@article{bibkey3,
    title       = {title 3},
    author      = {Firsta Lasta and Firstb Lastb},
    year        = 2003,
    journal     = {Journal},
    volume      = 03,
    pages       = {1--2},
    number      = 3
}
````

hughjonesd commented 3 years ago

@mingsu Could be. Sometimes LaTeX needs multiple runs to pick up BibTeX references. Not sure how all these command-line tools interact :-( but the ? definitely means it is looking for a reference.

njbart commented 3 years ago

Well, for a start, keep in mind that neither BibTeX nor biblatex is involved at all in @mingsu’s examples, nor in mine. (BibTeX or biblatex can be selected by including `citation_package: natbib` or `citation_package: biblatex` [1] in the YAML metadata header, but unless such an option is given explicitly, the default kicks in, which amounts to `citation_package: citeproc`.)

The question mark results from huxtable (I guess) emitting `\cite{bibkey3}`, pandoc letting it through as is, and finally LaTeX seeing the `\cite{}` but not being able to resolve it.

[1]: https://bookdown.org/yihui/bookdown/citations.html

njbart commented 3 years ago

Let me summarize what I have been able to figure out about this issue so far.

1) The OP tried to include pandoc citekeys inside huxtable cells. This doesn’t work because in order to resolve these citekeys, the cell content would have to be processed by pandoc plus citeproc, but, at least in the case of huxtable, this does not happen.

2) A more powerful `markdown(ht)` could work in principle, but it would have to harness the whole power of pandoc plus citeproc, no less. I’m not sure that’s workable.

3) What does work (mostly) is using bookdown "text references". Here, it’s only a label that is included inside an R chunk, e.g. a huxtable. The corresponding text is formatted outside the respective R chunk/huxtable, obviously by pandoc plus citeproc, as citation keys are demonstrably resolved, and subsequently the label is replaced by the formatted output of pandoc plus citeproc.

4) Part of @mingsu’s code seems to partially (?) automate this, replacing `[ @bibkey1 ]` in a `tab.cap` by `(ref:bibkey1)` and adding, outside of the R chunk, a line `(ref:bibkey1) \ [ @bibkey1 ]` (as seen when inspecting the `.knit.md` intermediate file). When knitting the "DEMO" Rmd to PDF (after inserting the function definitions into the first R chunk), this citekey is successfully resolved in the caption, though not elsewhere in the huxtable (see screenshot of the PDF).

BTW, I don’t think the spaces in `[ @bibkey1 ]` are a good idea; they lead to spurious spaces in the output, too.

[Screenshot 2021-09-03: PDF output of the DEMO document]

5) I wouldn't mind having to apply the bookdown "Text references" mechanism manually every time I want to use a citation key in a huxtable. The only problem, from my point of view, is that while PDF and HTML output work as expected as far as the "Text references" mechanism is concerned, Word output does not (see my post above).
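Point 2 above can be seen directly. To my understanding, huxtable's existing `markdown` property renders cell text with a CommonMark renderer (the commonmark package), not with pandoc. Emphasis is therefore converted, but a citekey comes out as plain text. This is a minimal sketch, assuming commonmark is installed. The column name `Cell` and the cell contents are illustrative.

```r
library(huxtable)

ht <- hux(Cell = c("**bold** text", "see @bibkey1"), add_colnames = FALSE)

# Mark both cells as markdown
ht <- set_markdown(ht, 1:2, 1, TRUE)

html <- to_html(ht)
# The bold markup is rendered as HTML, but "@bibkey1" survives unchanged:
# no citeproc step ever sees it.
```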

njbart commented 3 years ago

Just pinging. Any chance the word output bug when using the "Text references" mechanism (as described above) could be fixed?

hughjonesd commented 2 years ago

I suspect the same thing is going on here as before: huxtable prints out HTML/TeX, or goes directly via officer; kable does something else. Reading `?kable` suggests that it doesn't print directly to Word, so presumably it outputs raw markdown tables, which are then interpreted by pandoc?
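That hypothesis is easy to check. With `format = "pipe"`, `knitr::kable()` returns plain markdown lines, so the citekeys are still ordinary markdown text when pandoc (plus citeproc) later processes the document. Here is a quick sketch; `tabdf` is a cut-down version of the table from the original post.

```r
library(knitr)

tabdf <- data.frame(col1 = 1:2, col3 = c("[@bibkey1]", "@bibkey2"))

# A character vector of markdown table lines; the citekeys are untouched,
# so pandoc's citeproc can resolve them later in the pipeline.
md <- kable(tabdf, format = "pipe")
cat(md, sep = "\n")
```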

hughjonesd commented 2 years ago

I still think that on the whole the best option is the one I suggested at first: write `\cite{blah}` and then make sure that your TeX-to-PDF compilation process can deal with citations.
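For PDF output, one way to do this is to have LaTeX resolve citations with natbib/BibTeX instead of pandoc's citeproc. A raw `\cite{...}` in an unescaped huxtable cell is then handled in the same pass as the rest of the document. This sketch assumes that `bookdown::pdf_book()` forwards `citation_package` to `rmarkdown::pdf_document()`, since its `...` are passed to the base format. `test.Rmd` is a hypothetical file name. As noted above, this does nothing for Word or HTML output.

```r
# natbib: pandoc turns @keys into natbib commands and LaTeX/BibTeX resolve
# all citations, including raw \cite{...} strings from huxtable cells.
pdf_fmt <- bookdown::pdf_book(
  latex_engine = "xelatex",
  citation_package = "natbib"
)

# Render the (hypothetical) document with it; needs a LaTeX installation:
# rmarkdown::render("test.Rmd", output_format = pdf_fmt)
```

The same setting can go in the YAML header as `citation_package: natbib` under `bookdown::pdf_book:`.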

hughjonesd commented 2 years ago

Here is another workaround, which is to output a markdown table:

```{r, results='asis'}
print_md(huxtable("Reference to @citekey"))
```

Of course you don't get all the nice output features for this, but that's the same for other solutions that take this route.

njbart commented 2 years ago

I’m sure I must be missing something here, but I am puzzled why my MWE returns the expected output when knitting with `bookdown::pdf_document2` and `bookdown::html_document2`, but fails to resolve the citation when knitting with `bookdown::word_document2`.

Please note that my MWE uses pandoc’s citeproc (a CSL processor) for all output formats, so LaTeX `\cite` commands never enter the picture, nor can they ever be part of the solution when the output format is not PDF.

hughjonesd commented 2 years ago

Likely because the mechanism for outputting Word tables is different from the one for TeX or HTML. Huxtable writes raw TeX and HTML, but it writes to Word via flextable. If you can make an example work with flextable, then I might be able to fix this.

I understand that `\cite` won't work. Perhaps there is an equivalent raw command for citeproc?

hughjonesd commented 1 year ago

Closing for now because I don't see an obvious fix from my end.