yihui / knitr

A general-purpose tool for dynamic report generation in R
https://yihui.org/knitr/
2.38k stars 873 forks source link

a tricky problem when the data.frame include ± symbol with kable and bookdown #1695

Closed wikithink closed 1 year ago

wikithink commented 5 years ago

when i use the kable render the table to html with the bookdown::render_book, i find a tricky problem, when the data.frame include ± symbol,all the row data will move left a column. i use the Rstudio preview is normal,but the final html is abnormal like this: image

a minimal example data.frame is following: (1)test.Rmd

# symbol:±
cl <- data.frame(x=c('-','±'),y=c(20,50),z=c('normal','abnormal'))
knitr::kable(cl,booktabs = TRUE)

(2)index.Rmd

--- 
title: "new report for daas"
author: "DAAS"
date: "`r Sys.Date()`"
bibliography:
- book.bib
- packages.bib
description: this is the test book files
documentclass: book
link-citations: yes
site: bookdown::bookdown_site
biblio-style: apalike
---
# test the special symbols

(3)_bookdown.yml

book_filename: "analysis_report"
delete_merged_file: true
language:
  ui:
    chapter_name: "chapter"
rmd_files: ["index.Rmd", "test.Rmd"]

(4)render_book

bookdown::render_book("./index.Rmd",
                      output_dir = "./report",preview = F,clean = F,clean_envir = F,
                      config_file ="_bookdown.yml")

the session_info('knitr') is following:

R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core), RStudio 1.1.463

Locale:
  LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C               LC_TIME=zh_CN.UTF-8       
  LC_COLLATE=zh_CN.UTF-8     LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
  LC_PAPER=zh_CN.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
  LC_TELEPHONE=C             LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

Package version:
  evaluate_0.13   glue_1.3.1      graphics_3.5.1  grDevices_3.5.1 highr_0.8      
  knitr_1.22.5    magrittr_1.5    markdown_0.9    methods_3.5.1   mime_0.6       
  stats_3.5.1     stringi_1.4.3   stringr_1.4.0   tools_3.5.1     utils_3.5.1    
  xfun_0.5        yaml_2.2.0 
wikithink commented 5 years ago

i find when the column is numeric the final html table is normal,but when the column is character or factor ,it will move left.

cderv commented 5 years ago

Can you provide a reproductible example as stated in the issue template you deleted when filling this issue ?

Currently, we are missing some of your variable to try reproduce the issue like cl or dmname. Also, xfun::session_info('knitr') is missing too.

Thanks.

wikithink commented 5 years ago

thanks very much,a minor example data.frame is following:

cl <- data.frame(x=c('-','±'),y=c(20,50),z=c('normal','abnormal'))

the knitr version is 1.22, bookdown is 0.9, the OS is centos 7.6,and i use the pander(cl,style='rmarkdown'),the final html table is right, but kable is abnormal, i find the temp XXX.knit.md file is wrong too, thanks very much!

wikithink commented 5 years ago

someone can help me?about the symbol

wikithink commented 5 years ago

I have found other symbols can lead the same problem,like following: ≤≠¥①②¶™—③№√×↓→↑←‰¾÷㎡αβ℃〖〗©®@&*#%‰$¥§¦※◎≈…≥≦≧

yihui commented 5 years ago

Please follow what @cderv said above (https://github.com/yihui/knitr/issues/1695#issuecomment-477931650) to provide the information required, instead of replying to this issue endlessly.

yihui commented 5 years ago

FWIW, I don't mean you should close the issue (unless you have resolved your issue), but should follow the issue template to provide the required information (such as session info). We cannot simply guess or read your mind. The issue template also asked you to try the dev version of knitr, and I don't know if you really tried it.

wikithink commented 5 years ago

sorry,i misunderstand your meaning of your comment. i will supply the session info immediately. i have transfer these special symbols to ASCII code or html encode,the html table has display normally,but this is not good ideal for the batch processing. thank you!

wikithink commented 5 years ago

after transfer these non-ASCII symbols to ASCII code like this: &#3125; the kable html table display normally. but there are too many non-ASCII symbols, more than 500, replace these one by one is too slowly,may be the stringr::str_replace_all function is more faster.

wikithink commented 5 years ago

update knitr to latest developing version 1.22.8, the problem is same too.

wikithink commented 5 years ago

I have view the source code of tables.R, find the probable cause is the align or padding parameter, after add the align=NULL or align='c' or align="r" , the render html table is normal, but align="l" is get abnormal, finally, the non-numeric columns can not set align="l". but I don't know how to modify the source code to solve the problem, could anyone others give some help to me?

> xfun::session_info('knitr')
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core), RStudio 1.1.463

Locale:
  LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C              
  LC_TIME=zh_CN.UTF-8        LC_COLLATE=zh_CN.UTF-8    
  LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
  LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
  LC_ADDRESS=C               LC_TELEPHONE=C            
  LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

Package version:
  evaluate_0.13   glue_1.3.1      graphics_3.5.1  grDevices_3.5.1
  highr_0.8       knitr_1.22.8    magrittr_1.5    markdown_0.9   
  methods_3.5.1   mime_0.6        stats_3.5.1     stringi_1.4.3  
  stringr_1.4.0   tools_3.5.1     utils_3.5.1     xfun_0.5       
  yaml_2.2.0 

the sample code like follwing:

library(knitr)
cl <- data.frame(x=c('≧','⑧','♂'),y=c(20,50,100),z=c('[normal]','[abnormal','diff'),stringsAsFactors = F)
knitr::kable(cl,booktabs = TRUE,caption = "kable table 01")
knitr::kable(cl,booktabs = TRUE,caption = "kable table 02",align=NULL)
knitr::kable(cl,booktabs = TRUE,caption = "kable table 03",align="c")
cderv commented 5 years ago

thanks for those infos.

I wonder if it is not something linked to encoding. I tried to reproduce but I cannot manage to have a html output that understand those UTF8 character. Don't know why... 🤔 I got some unicode character <U+xxxx>, but they are aligned correctly.

About your question,

after transfer these non-ASCII symbols to ASCII code like this: &#3125; the kable html table display normally. but there are too many non-ASCII symbols, more than 500, replace these one by one is too slowly,may be the stringr::str_replace_all function is more faster.

You can replace all character using utf8ToInt and build the html code. Ex:

cl <- data.frame(x=c('<U+2267>','<U+2467>','<U+2642>'),y=c(20,50,100),z=c('[normal]','[abnormal','diff'),stringsAsFactors = F)
cl$x <- purrr::map_chr(cl$x, ~ paste0("&#", utf8ToInt(.x), ";"))
cl

Also, I got the feeling this issue is ouput-format dependant (markdown, html, latex, ...). It is not enough clear for me and as I can't manage to reproduce on my setup, I am not sure I can help further.

wikithink commented 5 years ago

@cderv Your prompt reply is greatly appreciated. why you can not render the html files? how about the following data.frame:

library(knitr)
cl <- data.frame(x=c('€','℃','㎡'),y=c(20,50,100),z=c('[normal]','[abnormal','diff'),stringsAsFactors = F)
knitr::kable(cl,booktabs = TRUE,caption = "kable table1",escape=F)
knitr::kable(cl,booktabs = TRUE,caption = "kable table2",align=NULL)
knitr::kable(cl,booktabs = TRUE,caption = "kable table3",align="c")
wikithink commented 5 years ago

I upload a picture, see this: image

cderv commented 5 years ago

why you can not render the html files?

Some encoding issue on my French windows PC.

Anyhow, I am not sure to identify the issue here either. Is this bookdown ? Kable ? encoding ?

You should provide a full reprex with detailed explanation on how to reproduce as a gist or another github repo so that someone jump in an can try to reproduce directly. It is not easy to build a complete reprex based on several replies. Just a piece of advice here.

As I can't print those character correctly by default, I am not in a situation to help you further I guess...

wikithink commented 5 years ago

@cderv thanks , I have partially solved the problem by adding the format = "html",but there are coming a new problem when the rowname or colname or data content of matrix or data.frame including the following three symbols: + - * or prefix including [+-*] ,I have changed the source code of table.R, solved the new problem, but I don't know how to solved when the format="markdown".

knitr::kable(cl,booktabs = TRUE,caption = "kable table2",format = "html",escape = F)

the changed table.R is here:

# [2019-05-22 by wikithink] create a replace function
replace_star_plus_minus = function(x){
  newx <- ifelse(trimws(x)=='+',gsub("\\+","&#43;",x),ifelse(trimws(x)=='-',gsub("\\-","&#45;",x),ifelse(trimws(x)=='*',gsub("\\*","&#42;",x),trimws(x))))
  gsub("^[\\+] +","&#43; ",gsub("^[\\-] +","&#45; ",gsub("^[\\*] +","&#42; ",newx)))
}
kable = function(
  x, format, digits = getOption('digits'), row.names = NA, col.names = NA,
  align, caption = NULL, label = NULL, format.args = list(), escape = FALSE, ...
) {
  # [2019-05-22  by wikithink]treat the col.names
  if(!is.na(col.names)){col.names <- replace_star_plus_minus(col.names)}

  # determine the table format
  if (missing(format) || is.null(format)) format = getOption('knitr.table.format')
  if (is.null(format)) format = if (is.null(pandoc_to())) switch(
    out_format() %n% 'markdown',
    latex = 'latex', listings = 'latex', sweave = 'latex',
    html = 'html', markdown = 'markdown', rst = 'rst',
    stop('table format not implemented yet!')
  ) else if (isTRUE(opts_knit$get('kable.force.latex')) && is_latex_output()) {
    # force LaTeX table because Pandoc's longtable may not work well with floats
    # http://tex.stackexchange.com/q/276699/9128
    'latex'
  } else 'pandoc'
  if (is.function(format)) format = format()

  # expand align if applicable
  if (format != 'latex' && !missing(align) && length(align) == 1L)
    align = strsplit(align, '')[[1]]

  # create a label for bookdown if applicable
  if (is.null(label)) label = opts_current$get('label')
  if (!is.null(caption) && !is.na(caption)) caption = paste0(
    create_label('tab:', label, latex = (format == 'latex')), caption
  )
  if (inherits(x, 'list')) {
    # if the output is for Pandoc and we want multiple tabular in one table, we
    # should use the latex format instead, because Pandoc does not support
    # Markdown in LaTeX yet https://github.com/jgm/pandoc/issues/2453
    if (format == 'pandoc' && is_latex_output()) format = 'latex'
    res = lapply(
      x, kable, format = format, digits = digits, row.names = row.names,
      col.names = col.names, align = align, caption = NA,
      format.args = format.args, escape = escape, ...
    )
    res = unlist(lapply(res, one_string))
    res = if (format == 'latex') {
      kable_latex_caption(res, caption)
    } else if (format == 'html' || (format == 'pandoc' && is_html_output())) kable_html(
      matrix(paste0('\n\n', res, '\n\n'), 1), caption = caption, escape = FALSE,
      table.attr = 'class="kable_wrapper"'
    ) else {
      res = paste(res, collapse = '\n\n')
      if (format == 'pandoc') kable_pandoc_caption(res, caption) else res
    }
    return(structure(res, format = format, class = 'knitr_kable'))
  }
  if (!is.matrix(x)) x = as.data.frame(x)
  # [2019-05-22 by wikithink]replace the three special symbols: * + - 
  if(is.matrix(x)||is.data.frame(x)){
    if(nrow(x)>0){rownames(x) <-  replace_star_plus_minus(rownames(x))}
    if(ncol(x)>0){colnames(x) <-  replace_star_plus_minus(colnames(x))}
  }
  if(!is.matrix(x)){
    if(nrow(x)>0){
      x[,sapply(x, FUN=is.factor)] <- sapply(x[,sapply(x, FUN=is.factor)],FUN=as.vector)
      x[,sapply(x, FUN=is.character)] <- sapply(x[,sapply(x, FUN=is.character)],FUN=replace_star_plus_minus)
      x <- as.data.frame(x,stringsAsFactors=F)
    }
  }else{
    if(is.character(x)){
      x <- replace_star_plus_minus(x)
    }
  }
  # continue...
  if (identical(col.names, NA)) col.names = colnames(x)
  m = ncol(x)
  # numeric columns
  isn = if (is.matrix(x)) rep(is.numeric(x), m) else sapply(x, is.numeric)
  if (missing(align) || (format == 'latex' && is.null(align)))
    align = ifelse(isn, 'r', 'l')
  # rounding
  digits = rep(digits, length.out = m)
  for (j in seq_len(m)) {
    if (is_numeric(x[, j])) x[, j] = round(x[, j], digits[j])
  }
  if (any(isn)) {
    if (is.matrix(x)) {
      if (is.table(x) && length(dim(x)) == 2) class(x) = 'matrix'
      x = format_matrix(x, format.args)
    } else x[, isn] = format_args(x[, isn], format.args)
  }
  if (is.na(row.names)) row.names = has_rownames(x)
  if (!is.null(align)) align = rep(align, length.out = m)
  if (row.names) {
    x = cbind(' ' = rownames(x), x)
    if (!is.null(col.names)) col.names = c(' ', col.names)
    if (!is.null(align)) align = c('l', align)  # left align row names
  }
  n = nrow(x)
  x = replace_na(to_character(as.matrix(x)), is.na(x))
  if (!is.matrix(x)) x = matrix(x, nrow = n)
  x = trimws(x)
  colnames(x) = col.names
  if (format != 'latex' && length(align) && !all(align %in% c('l', 'r', 'c')))
    stop("'align' must be a character vector of possible values 'l', 'r', and 'c'")
  attr(x, 'align') = align
  res = do.call(
    paste('kable', format, sep = '_'),
    list(x = x, caption = caption, escape = escape, ...)
  )
  structure(res, format = format, class = 'knitr_kable')
}

# convert to character while preserving dim/dimnames attributes
to_character = function(x) {
  if (is.character(x)) return(x)
  x2 = as.character(x); dim(x2) = dim(x); dimnames(x2) = dimnames(x)
  x2
}

# as.data.frame() does not allow duplicate row names (#898)
format_matrix = function(x, args) {
  nms = rownames(x)
  rownames(x) = NULL
  x = as.matrix(format_args(as.data.frame(x), args))
  rownames(x) = nms
  x
}

format_args = function(x, args = list()) {
  args$x = x
  args$trim = TRUE
  replace_na(do.call(format, args), is.na(x))
}

replace_na = function(x, which = is.na(x), to = getOption('knitr.kable.NA')) {
  if (is.null(to)) return(x)
  x[which] = to
  x
}

has_rownames = function(x) {
  !is.null(rownames(x)) && !identical(rownames(x), as.character(seq_len(NROW(x))))
}

#' @export
print.knitr_kable = function(x, ...) {
  if (!(attr(x, 'format') %in% c('html', 'latex'))) cat('\n\n')
  cat(x, sep = '\n')
}

#' @export
knit_print.knitr_kable = function(x, ...) {
  x = one_string(c(
    if (!(attr(x, 'format') %in% c('html', 'latex'))) c('', ''), x, '\n'
  ))
  asis_output(x)
}

kable_latex = function(
  x, booktabs = FALSE, longtable = FALSE, valign = 't', centering = TRUE,
  vline = getOption('knitr.table.vline', if (booktabs) '' else '|'),
  toprule = getOption('knitr.table.toprule', if (booktabs) '\\toprule' else '\\hline'),
  bottomrule = getOption('knitr.table.bottomrule', if (booktabs) '\\bottomrule' else '\\hline'),
  midrule = getOption('knitr.table.midrule', if (booktabs) '\\midrule' else '\\hline'),
  linesep = if (booktabs) c('', '', '', '', '\\addlinespace') else '\\hline',
  caption = NULL, caption.short = '', table.envir = if (!is.null(caption)) 'table',
  escape = TRUE
) {
  if (!is.null(align <- attr(x, 'align'))) {
    align = paste(align, collapse = vline)
    align = paste0('{', align, '}')
  }
  centering = if (centering && !is.null(caption)) '\n\\centering'
  # vertical align only if 'caption' is not NULL (may be NA) or 'valign' has
  # been explicitly specified
  valign = if ((!is.null(caption) || !missing(valign)) && valign != '') {
    sprintf('[%s]', valign)
  } else ''
  if (identical(caption, NA)) caption = NULL
  env1 = sprintf('\\begin{%s}%s\n', table.envir, valign)
  env2 = sprintf('\n\\end{%s}',   table.envir)
  if (caption.short != '') caption.short = paste0('[', caption.short, ']')
  cap = if (is.null(caption)) '' else sprintf('\n\\caption%s{%s}', caption.short, caption)

  if (nrow(x) == 0) midrule = ""

  linesep = if (nrow(x) > 1) {
    c(rep(linesep, length.out = nrow(x) - 1), '')
  } else rep('', nrow(x))
  linesep = ifelse(linesep == "", linesep, paste0('\n', linesep))

  if (escape) x = escape_latex(x)
  if (!is.character(toprule)) toprule = NULL
  if (!is.character(bottomrule)) bottomrule = NULL
  tabular = if (longtable) 'longtable' else 'tabular'

  paste(c(
    if (!longtable) c(env1, cap, centering),
    sprintf('\n\\begin{%s}', tabular), align,
    if (longtable && cap != '') c(cap, '\\\\'),
    sprintf('\n%s', toprule), '\n',
    if (!is.null(cn <- colnames(x))) {
      if (escape) cn = escape_latex(cn)
      paste0(paste(cn, collapse = ' & '), sprintf('\\\\\n%s\n', midrule))
    },
    one_string(apply(x, 1, paste, collapse = ' & '), sprintf('\\\\%s', linesep), sep = ''),
    sprintf('\n%s', bottomrule),
    sprintf('\n\\end{%s}', tabular),
    if (!longtable) env2
  ), collapse = '')
}

kable_latex_caption = function(x, caption) {
  paste(c(
    '\\begin{table}\n', sprintf('\\caption{%s}\n', caption), x, '\n\\end{table}'
  ), collapse = '')
}

kable_html = function(x, table.attr = '', caption = NULL, escape = TRUE, ...) {
  table.attr = trimws(table.attr)
  # need a space between <table and attributes
  if (nzchar(table.attr)) table.attr = paste('', table.attr)
  align = if (is.null(align <- attr(x, 'align'))) '' else {
    sprintf(' style="text-align:%s;"', c(l = 'left', c = 'center', r = 'right')[align])
  }
  if (identical(caption, NA)) caption = NULL
  cap = if (length(caption)) sprintf('\n<caption>%s</caption>', caption) else ''
  if (escape) x = escape_html(x)
  one_string(c(
    sprintf('<table%s>%s', table.attr, cap),
    if (!is.null(cn <- colnames(x))) {
      if (escape) cn = escape_html(cn)
      c(' <thead>', '  <tr>', sprintf('   <th%s> %s </th>', align, cn), '  </tr>', ' </thead>')
    },
    '<tbody>',
    paste(
      '  <tr>',
      apply(x, 1, function(z) one_string(sprintf('   <td%s> %s </td>', align, z))),
      '  </tr>', sep = '\n'
    ),
    '</tbody>',
    '</table>'
  ))
}

#' Generate tables for Markdown and reST
#'
#' This function provides the basis for Markdown and reST tables.
#' @param x The data matrix.
#' @param sep.row A length-3 character vector, specifying separators to be printed
#'   before the header, after the header, and at the end of the table respectively.
#' @param sep.col The column separator.
#' @param padding Number of spaces for the table cell padding.
#' @param align.fun A function to process the separator under the header
#'   according to the alignment.
#' @return A character vector of the table content.
#' @noRd
kable_mark = function(x, sep.row = c('=', '=', '='), sep.col = '  ', padding = 0,
                      align.fun = function(s, a) s, rownames.name = '', ...) {
  # when the column separator is |, replace existing | with its HTML entity
  if (sep.col == '|') for (j in seq_len(ncol(x))) {
    x[, j] = gsub('\\|', '&#124;', x[, j])
  }
  l = if (prod(dim(x)) > 0) apply(x, 2, function(z) max(nchar(z, type = 'width'), na.rm = TRUE))
  cn = colnames(x)
  if (length(cn) > 0) {
    cn[is.na(cn)] = "NA"
    if (sep.col == '|') cn = gsub('\\|', '&#124;', cn)
    if (grepl('^\\s*$', cn[1L])) cn[1L] = rownames.name  # no empty cells for reST
    l = pmax(if (length(l) == 0) 0 else l, nchar(cn, type = 'width'))
  }
  align = attr(x, 'align')
  padding = padding * if (length(align) == 0) 2 else {
    ifelse(align == 'c', 2, 1)
  }
  l = pmax(l + padding, 3)  # at least of width 3 for Github Markdown
  s = unlist(lapply(l, function(i) paste(rep(sep.row[2], i), collapse = '')))
  res = rbind(if (!is.na(sep.row[1])) s, cn, align.fun(s, align),
              x, if (!is.na(sep.row[3])) s)
  apply(mat_pad(res, l, align), 1, paste, collapse = sep.col)
}

kable_rst = function(x, rownames.name = '\\', ...) {
  kable_mark(x, rownames.name = rownames.name)
}

# actually R Markdown
kable_markdown = function(x, padding = 1, ...) {
  if (is.null(colnames(x))) {
    warning('The table should have a header (column names)')
    colnames(x) = rep('', ncol(x))
  }
  res = kable_mark(x, c(NA, '-', NA), '|', padding, align.fun = function(s, a) {
    if (is.null(a)) return(s)
    r = c(l = '^.', c = '^.|.$', r = '.$')
    for (i in seq_along(s)) {
      s[i] = gsub(r[a[i]], ':', s[i])
    }
    s
  }, ...)
  sprintf('|%s|', res)
}

kable_pandoc = function(x, caption = NULL, padding = 1, ...) {
  # pandoc's table format cannot create 1-column or 0-row tables
  tab = if (ncol(x) == 1 || nrow(x) == 0) kable_markdown(
    x, padding = padding, ...
  ) else kable_mark(
    x, c(NA, '-', if (is_blank(colnames(x))) '-' else NA),
    padding = padding, ...
  )
  kable_pandoc_caption(tab, caption)
}

kable_pandoc_caption = function(x, caption) {
  if (identical(caption, NA)) caption = NULL
  if (length(caption)) c(paste('Table:', caption), "", x) else x
}

# pad a matrix
mat_pad = function(m, width, align = NULL) {
  n = nrow(m); p = ncol(m)
  res = matrix('', nrow = n, ncol = p)
  if (n * p == 0) return(res)
  stopifnot(p == length(width))
  side = rep('both', p)
  if (!is.null(align)) side = c(l = 'right', c = 'both', r = 'left')[align]
  apply(m, 2, function(x) max(nchar(x, 'width') - nchar(x, 'chars')))
  matrix(pad_width(c(m), rep(width, each = n), rep(side, each = n)), ncol = p)
}

# pad a character vector to width (instead of number of chars), considering the
# case of width > chars (e.g. CJK chars)
pad_width = function(x, width, side) {
  if (!all(side %in% c('left', 'right', 'both')))
    stop("'side' must be 'left', 'right', or 'both'")
  w = width - nchar(x, 'width')
  w1 = floor(w / 2)  # the left half of spaces when side = 'both'
  s1 = v_spaces(w * (side == 'left') + w1 * (side == 'both'))
  s2 = v_spaces(w * (side == 'right') + (w - w1) * (side == 'both'))
  paste0(s1, x, s2)
}

# vectorized over n to generate sequences of spaces
v_spaces = function(n) {
  unlist(lapply(n, highr:::spaces))
}
cderv commented 3 years ago

@wikithink do you still have issue with all this ?

It was not easy for us to reproduce and know for sure what is happening. I am thinking of closing this issue and start fresh in a new issue where you would provide a reproducible example for us.

What do you think ?

Thank you.

github-actions[bot] commented 6 months ago

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.