rstudio / htmltools

Tools for HTML generation and output
https://rstudio.github.io/htmltools/
215 stars 69 forks source link

FR: Convert HTML to tags #287

Open thothal opened 3 years ago

thothal commented 3 years ago

Motivation

Suppose you use a function from a package which returns an HTML string. You can use this string with htmltools via HTML, but the result is not really satisfactory as the following example shows:

library(htmltools)
tagAppendAttributes(tags$a(), class = "test") ## works
# <a class="test"></a>
tagAppendAttributes(HTML("<a></a>"), class = "test") ## does not work
# Error: $ operator is invalid for atomic vectors

Current workaround requires falling back to string replacement, which is cumbersome and error prone.

Thus, a converter / parser which transforms HTML strings into tags would be tremendously helpful.

Proposal

I found a small shiny app, which used library(XML) to do this sort of parsing and based on this I came up with the following lines:

library(XML)
parse_attributes <- function(node) {
  attribs <- XML::xmlAttrs(node)
  ## unfortunately xmlAttrs returns attributes without a value with a value, namely 
  ## the name of the attribute, itself => dirty hack: compare if name and value are the same
  attribs[names(attribs) == attribs] <- NA
  as.list(attribs)
}

parse_node <- function(node) {
  tag_name <- XML::xmlName(node)
  if (tag_name == "text") {
    value <- trimws(XML::xmlValue(node))
    if (nchar(value) > 0) {
      value
    } else {
      NULL
    }
  } else if (tag_name != "comment") {
    attr <- parse_attributes(node)
    children <- lapply(XML::xmlChildren(node, addNames = FALSE), parse_node)
    children <- Filter(Negate(is.null), children)
    args <- c(attr, children)
    if (tag_name %in% names(htmltools::tags)) {
      fn <- htmltools::tags[[tag_name]]
    } else {
      warning("unknown HTML tag <", tag_name, ">",
              domain = NA)
      fn <- htmltools::tag
      args <- list(`_tag_name` = tag_name, varArgs = args)
    }
    do.call(fn, args)
  }
}

html_to_tags <- function(html_string) {
  ## this function is inspired by https://github.com/alandipert/html2r/blob/master/app.R
  xml <- XML::htmlParse(htmltools::div(id = "parse_me", htmltools::HTML(html_string)))
  elements <- XML::getNodeSet(xml, "//div[@id='parse_me']/*")
  wrap <- if (length(elements) > 1) htmltools::tagList else identity
  do.call(wrap,
         lapply(elements, parse_node))

}

Some simple tests

(x <- div(class = "test", disabled = NA, `data-non-syntactiv-name` = TRUE))
# <div class="test" disabled data-non-syntactiv-name="TRUE"></div>
html_to_tags(as.character(x))
# <div class="test" disabled data-non-syntactiv-name="TRUE"></div>
str(x)
# List of 3
#  $ name    : chr "div"
#  $ attribs :List of 3
#   ..$ class                  : chr "test"
#   ..$ disabled               : logi NA
#   ..$ data-non-syntactiv-name: logi TRUE
#  $ children: list()
#  - attr(*, "class")= chr "shiny.tag"
str(html_to_tags(as.character(x)))
# List of 3
#  $ name    : chr "div"
#  $ attribs :List of 3
#   ..$ class                  : chr "test"
#   ..$ disabled               : chr NA
#   ..$ data-non-syntactiv-name: chr "TRUE"
#  $ children: list()
#  - attr(*, "class")= chr "shiny.tag"

(x <- tagList(div("Test", p(tags$strong("Test"))), p("Test")))
# <div>
#   Test
#   <p>
#     <strong>Test</strong>
#   </p>
# </div>
# <p>Test</p>
html_to_tags(as.character(x))
# <div>
#   Test
#   <p>
#     <strong>Test</strong>
#   </p>
# </div>
# <p>Test</p>

Granted, the function is not injective, thus, it does not (yet) guarnatee x == html_to_tags(as.character(x)), but from what I can judge (not being an expert in HTML and all its specifities), the resulting HTML should be rather equivalent.

What do you think, would it make sense to include such a funciton into htmltools?


Update Using {xml2}

library(xml2)
parse_attributes <- function(node) {
  attribs <- as.list(xml2::xml_attrs(node))
  ## unfortunately xmlAttrs returns attributes without a value with a value
  ## the name of the attribute, dirty hack: compare if name and value are the same
  attribs[names(attribs) == attribs] <- NA
  attribs
}

parse_node <- function(node) {
  tag_name <- xml2::xml_name(node)
  if (tag_name == "text") {
    value <- trimws(xml2::xml_text(node))
    if (nchar(value) > 0) {
      value
    } else {
      NULL
    }
  } else if (tag_name != "comment") {
    attr <- parse_attributes(node)
    children <- lapply(xml2::xml_contents(node), parse_node)
    children <- Filter(Negate(is.null), children)
    args <- c(attr, children)
    if (tag_name %in% names(htmltools::tags)) {
      fn <- htmltools::tags[[tag_name]]
    } else {
      warning("unknown HTML tag <", tag_name, ">",
              domain = NA)
      fn <- htmltools::tag
      args <- list(`_tag_name` = tag_name, varArgs = args)
    }
    do.call(fn, args)
  }
}

html_to_tags <- function(html_string) {
  ## this function is inspired by https://github.com/alandipert/html2r/blob/master/app.R
  xml <- xml2::read_html(as.character(htmltools::div(id = "parse_me",
                                                     htmltools::HTML(html_string))))
  elements <- xml2::xml_find_all(xml, "//div[@id='parse_me']/*")
  wrap <- if (length(elements) > 1) htmltools::tagList else identity
  do.call(wrap,
         lapply(elements, parse_node))

}
cpsievert commented 3 years ago

What do you think, would it make sense to include such a funciton into htmltools?

Perhaps someday, but I don't think we'll have the incentive/bandwidth to add it for awhile. That said, preferably the logic would depend on {xml2}, not {XML}

thothal commented 3 years ago

OK, thanks for the feedback. Would a PR with some code like this ({xml2} based) stand a chance to be included?