Embed complete documents provided to `includeHTML()` in an iframe

This is a proposal for a fix for #124. It appears that there's a long history of users wanting to call includeHTML() to embed arbitrary HTML in a parent document or app, but running into problems when the included HTML is a complete HTML document: https://github.com/rstudio/shiny/issues/2535, https://github.com/rstudio/rmarkdown/issues/1514, https://github.com/rstudio/rmarkdown/issues/792, https://github.com/rstudio/rmarkdown/issues/1514, https://github.com/rstudio/shinydashboard/issues/228, https://github.com/rstudio/shiny/issues/2535, https://github.com/rstudio/htmltools/issues/90.

Our position in the past has been "don't use includeHTML() for this", but we also haven't been proactive about preventing the unexpected errors that will occur when a complete HTML document is inlined within another document.

In the majority of cases where users have been historically tempted by includeHTML() – primarily when they want to include fully rendered R Markdown or Quarto douments or htmlwidgets – inlining the included HTML document source in an <iframe> has been our leading alternative recommendation.

This PR makes that behavior the default. When we detect a complete document, i.e. one that starts with <!DOCTYPE html> and ends with </html>, we place the HTML document contents in the srcdoc attribute of an <iframe>. This lets us ship an iframe without having to ship additional files. (srcdoc is well-supported in browsers.)

Reprex

The following reprex renders an htmlwidget to a standalone HTML file. It also writes some lorem ipsum text in paragraph tags to a HTML fragment. Both are added in the document with includeHTML().

library(shiny)
library(bslib)
library(leaflet)

# A complete html document containing a leaflet map
tmp_map <- tempfile(fileext = ".html")
map <- leaflet() |>
  addTiles() |>
  setView(-93.65, 42.0285, zoom = 17) |>
  addPopups(-93.65, 42.0285, "Here is the <b>Department of Statistics</b>, ISU")

htmlwidgets::saveWidget(map, tmp_map)

# An HTML fragment, not a complete document
tmp_html <- tempfile(fileext = ".html")
writeLines(format(as.tags(lorem::ipsum(3, 3))), tmp_html)

ui <- page_fluid(
  titlePanel("Hello embedded html in Shiny!"),
  card(
    card_header("A leaflet map, fully encapsulated"),
    full_screen = TRUE,
    includeHTML(tmp_map)
  ),
  h3("Just some more HTML"),
  includeHTML(tmp_html)
)

srv <- function(input, output) {}
shinyApp(ui, srv)

I think the API is a bit too magical -- the user needs to know that if their document starts with <!DOCTYPE html> and ends with </html>, then it will be embedded in an iframe; otherwise not. If those conditions aren't met, users could easily end up confused trying to figure out why their html file isn't being put in an iframe.

I think it would better if the user has to explicitly opt-in to the iframe behavior, maybe with an option like iframe=TRUE.

How about an in between....

Let's default iframe = NULL and if a full document is provided, a warning is given prompting the user to use iframe = TRUE. And:

If iframe = FALSE, then the old behavior will always be used.
If iframe = TRUE, then the iframe behavior (with no message) will be used.

I like the prompt as there is no nudge toward the new behavior within an established function.

@wch: I think the API is a bit too magical -- the user needs to know that if their document starts with <!DOCTYPE html> and ends with </html>, then it will be embedded in an iframe; otherwise not. If those conditions aren't met, users could easily end up confused trying to figure out why their html file isn't being put in an iframe.

I think you're flipping the causal arrows. Users who give includeHTML() a complete document are usually just trying to get an HTML thing into the current page. They don't know that adding a complete document inside another HTML document is going to cause problems. Someone who wants to have an <iframe>, knows they should be using an <iframe>, and wants to have complete control over that <iframe> can and should call tags$iframe().

So here we're not asking the user to know about these two conditions as much as we are using their presence to prevent includeHTML() from inlining one complete HTML document in another.

@wch: I think it would better if the user has to explicitly opt-in to the iframe behavior, maybe with an option like iframe=TRUE.

I don't think making it explicit is helpful for the same reason as above; someone who knows that they need to set iframe = TRUE can also use tags$iframe(). Most people who would benefit from this behavior are expecting includeHTML() to do the right thing for them.

@schloerke: Let's default iframe = NULL and if a full document is provided, a warning is given prompting the user to use iframe = TRUE.

I appreciate the compromise, but I also don't think this is very helpful. It doesn't make sense to inline a full HTML document into another document just as much as it doesn't make sense to write an HTML fragment into an <iframe>.

I think part of the problem here is the name of the function: includeHTML(). In other cases, like includeCSS() and includeScript(), these functions take a source file and embed it inside the appropriate tag. The current behavior of includeHTML() is more like inlineHTML(), where we do nothing with the user input other than inline it directly in the document.

I think this PR brings includeHTML() more in line with the rest of the include*() functions, where they generally transform the input in the appropriate way for the document as a convenience to users.

The part that I think will be confusing is that the behavior is very different depending on the content of the file being included, and it won't be obvious to a user why it's happening. Someone might use this function to include HTML documents and expect it to always put things in an iframe. Then they might do any of the following and expect it to also go in an iframe, but it won't:

Include a file that starts with <html>
Include a file that just starts with body content, like <div>

I don't think it's far-fetched that someone would do this, because I often create files like this myself. It is true that these are not be proper HTML5 files, but even so, for users who have developed an expectation that includeHTML() would put stuff in an iframe, it would be surprising if these files didn't end up in an iframe, and it would be non-obvious why the behavior changed.

I think @schloerke idea of using a iframe=NULL default makes sense, where if it detects a full HTML document, it will print out a message recommending iframe=TRUE.

I don't think making it explicit is helpful for the same reason as above; someone who knows that they need to set iframe = TRUE can also use tags$iframe(). Most people who would benefit from this behavior are expecting includeHTML() to do the right thing for them.

It's true that users could use tags$iframe(), but figuring out how to put the content in the srcdoc attribute would be nontrivial for most users.

@gadenbuie I think you lean more toward it doing the thing that the user probably wants, and I lean toward making sure the user understands why it's doing what it's does, and doesn't have any unexpected surprises. I think @schloerke's proposal helps bridge that gap here.

I think that a user who is calling includeHTML() doesn't have much of an expectation about how the HTML is included in the page, just that the function do the most appropriate thing.

the behavior is very different depending on the content of the file being included, and it won't be obvious to a user why it's happening.

This is already the case, and I'd argue it's currently a much worse experience for the user. Yes, it's easier to describe and reason about an includeHTML() that simply inlines any HTML given to it, but the consequence is that there's a long history of users using includeHTML() to do something it currently doesn't support. We currently offer no support to the user when this happens and the ways in which it will fail is also less than obvious.

Someone might use this function to include HTML documents and expect it to always put things in an iframe. Then they might do any of the following and expect it to also go in an iframe, but it won't:

This PR keeps the behavior of includeHTML() exactly as it was, except for the very specific case of trying to include a full HTML document. A user doesn't arrive in this scenario accidentally; they're creating a document via other tooling like R Markdown, Quarto or an external website.

I'd be happy to consider including the criteria that the document starts with <html> OR the doctype declaration, but the goal was to very carefully catch the circumstances where the document is being created via an outside tool.

@gadenbuie I think you lean more toward it doing the thing that the user probably wants, and I lean toward making sure the user understands why it's doing what it's does, and doesn't have any unexpected surprises.

includeHTML() (at least by its name) advertises itself as a way to get an html thing into the current document or app. Trying to include a full document leads to unexpected surprises and includeHTML() currently doesn't do anything to try to help the user understand why it's doing what it does.

My thinking is: it's easy to detect ~90% of the problematic cases with 100% certainty that includeHTML() is the wrong choice in those detected cases.

We then have two options:

We could throw an error and tell users to do something else, keeping includeHTML() simple and consistent.
Or, knowing both the problem and how to solve the problem, we could fix it for the users.

I lean toward helping the user in this case for two reasons. The first is that there is a long history of includeHTML() causing painful friction for our users. It's very clear that people want it to work in a way that it currently doesn't.

The second reason is that, at a technical level, includeHTML() is a final destination. I can't find a precise term, but I mean that includeHTML() does not produce a data structure that users expect to modify or on which users intend to perform further operations. In practice, users are not calling any of the include*() functions with the intention of producing data, they are closer to side-effect functions like write.csv() where the intention is to write HTML, JavaScript or CSS into the document.

To me, that second reason is important. If includeHTML() created an object that required further interaction from the user, then it should prioritize consistency and I would not have proposed this PR. Furthermore, the fact that includeHTML() reads a file from disk makes it an impure function and is additionally important to consider – it doesn't accept and modify a data structure, it reads a file from disk. When that file is written there are likely many unexpected things that happen from the users' perspective – the results of a page*() function become a complete HTML document; their markdown is rendered into a pandoc template, etc.

I think @schloerke idea of using a iframe=NULL default makes sense, where if it detects a full HTML document, it will print out a message recommending iframe=TRUE.

I recognize that this gives the user a clearer signal about how to choose what happens when they call includeHTML(). But I think having this argument requires

iframe = FALSE to be useful when given a complete document where we would want to default to iframe = TRUE
iframe = TRUE to be useful when given an HTML fragment where we would want to default to iframe = FALSE.

Neither case is something our users really want to do (the first is incorrect and the second is useful only to experts). There's already a strong enough signal to have iframe = NULL detect and promote full-document files to an <iframe>.

It's true that users could use tags$iframe(), but figuring out how to put the content in the srcdoc attribute would be nontrivial for most users.

It's easy enough that we can include the code in the error message, possibly even expanding path:

tags$iframe(srcdoc = readLines(path))

How about another option...

What if we have a new method to handle the iframe (includeHTMLDocument()?). Both methods could have error messages to point to each other. BUT, by having different methods (even in the same help doc), then the intent is clearly defined for each method and no magic is performed. (The signature of includeHTML() would not need to change.)

Ex:

htmldoc <- "<html>....</html>"
htmlfrag <- "<div>...</div>"

includeHTML(htmlfrag) # Good
includeHTML(htmldoc)
#> Error: Fully defined html was provided. Please provide an HTML fragment to `includeHTML()` or use `includeHTMLDocument(htmldoc)` to display your HTML in an iframe

includeHTMLDocument(htmldoc) # Good
includeHTMLDocument(htmlfrag) 
#> Error: HTML fragment was provided. Please provide a fully defined HTML document to `includeHTMLDocument()` or use `includeHTML(htmlfrag)` to include fragment HTML

How about another option...

What if we have a new method to handle the iframe (includeHTMLDocument()?).

@schloerke I like this too! I think it nicely avoids the issues pointed out here. I'm going to close this PR and open a new PR with includeHTMLDocument().

rstudio / htmltools

Embed complete documents provided to `includeHTML()` in an iframe #381

Reprex