yihui / knitr

A general-purpose tool for dynamic report generation in R
https://yihui.org/knitr/

Using URL for image when converting to PDF output #2274

Closed bayeslearner closed 10 months ago

bayeslearner commented 11 months ago

The following document renders fine to HTML but fails with beamer. The generated LaTeX file contains special characters. Should the URL be wrapped with the url package or something?

\includegraphics{https://cdn.mathpix.com/cropped/2023_07_28_a103dc94c1ec860738c6g-31.jpg?height=443\&width=345\&top_left_y=1857\&top_left_x=867}
---
title: "Understanding Structural Coefficients and Causal Effect in Linear Systems 6/6"
author: "AI Engineer"
date: "July 29, 2023"
output:
  beamer_presentation: default
  slidy_presentation: default
---

## Table of Contents

1.  Total Effect in Linear Systems
2.  Identifying Structural Coefficients and Causal Effect
3.  Mediation in Linear Systems
4.  Summary

------------------------------------------------------------------------

## Total Effect in Linear Systems

-   In a linear system, the total effect of $X$ on $Y$ is the sum of the products of the coefficients of the edges on every nonbackdoor path from $X$ to $Y$.
-   To find the total effect of $X$ on $Y$, follow these steps:
    -   Find every nonbackdoor path from $X$ to $Y$.
    -   For each path, multiply all coefficients on the path together.
    -   Add up all the products.
-   This identity is a consequence of the nature of Structural Causal Models (SCMs).

------------------------------------------------------------------------

## Total Effect in Linear Systems (continued)

::: columns
::: {.column width="50%"}
![](https://cdn.mathpix.com/cropped/2023_07_28_a103dc94c1ec860738c6g-31.jpg?height=443&width=345&top_left_y=1857&top_left_x=867) Figure 3.13: Graphical representation of a linear system
:::

::: {.column width="50%"}
-   Consider the graph in Figure 3.13.
-   To find the total effect of $Z$ on $Y$, intervene on $Z$, removing all arrows going into $Z$, then express $Y$ in terms of $Z$ in the remaining model.
-   This can be done with a little algebra, resulting in the equation $Y=\tau Z+U$, where $\tau=d+e c$ and $U$ contains only terms that do not depend on $Z$ in the modified model.
-   An increase of a single unit in $Z$ will increase $Y$ by $\tau$, which is the definition of the total effect.
:::
:::

------------------------------------------------------------------------

## Identifying Structural Coefficients and Causal Effect

-   The problem of estimating total and direct effects from nonexperimental data is known as "identifiability".
-   It involves expressing the path coefficients associated with the total and direct effects in terms of the covariances $\sigma_{X Y}$ or regression coefficients $R_{Y X \cdot Z}$.
-   In many cases, to identify direct and total effects, we do not need to identify each and every structural parameter in the model.

------------------------------------------------------------------------

## Identifying Structural Coefficients and Causal Effect (continued)

-   To determine the causal effect of $X$ on $Y$, we can use the backdoor criterion to find a set of variables $Z$ to adjust for.
-   Once we obtain the set, $Z$, we can estimate the conditional expectation of $Y$ given $X$ and $Z$.
-   Averaging over $Z$, we can use the resultant dependence between $Y$ and $X$ to measure the effect of $X$ on $Y$.
-   This procedure can be translated to the language of regression.

------------------------------------------------------------------------

## Identifying Structural Coefficients and Causal Effect (continued)

::: columns
::: {.column width="50%"}
![](https://cdn.mathpix.com/cropped/2023_07_28_a103dc94c1ec860738c6g-32.jpg?height=471&width=343&top_left_y=958&top_left_x=845) Figure 3.15: A graphical model in which $X$ has direct effect $\alpha$ on $Y$
:::

::: {.column width="50%"}
-   To find the direct effect of $X$ on $Y$, we can use a similar procedure to the backdoor criterion, but now we need to block not only backdoor paths but also indirect paths going from $X$ to $Y$.
-   First, we remove the edge from $X$ to $Y$ (if such an edge exists), and call the resulting graph $G_{\alpha}$.
-   If, in $G_{\alpha}$, there is a set of variables $Z$ that $d$-separates $X$ and $Y$, then we can simply regress $Y$ on $X$ and $Z$.
-   The coefficient of $X$ in the resulting equation will equal the structural coefficient $\alpha$.
:::
:::

------------------------------------------------------------------------

## Identifying Structural Coefficients and Causal Effect (continued)

::: columns
::: {.column width="50%"}
![](https://cdn.mathpix.com/cropped/2023_07_28_a103dc94c1ec860738c6g-33.jpg?height=477&width=345&top_left_y=490&top_left_x=867) Figure 3.16: By removing the direct edge from $X$ to $Y$ and finding the set of variables $\{W\}$ that $d$-separate them, we find the variables we need to adjust for to determine the direct effect of $X$ on $Y$
:::

::: {.column width="50%"}
-   In the linear model of Figure 3.15, we can find the direct effect of $X$ on $Y$ by this method.
-   First, we remove the edge between $X$ and $Y$ and get the graph $G_{\alpha}$ shown in Figure 3.16.
-   In this new graph, $W$ $d$-separates $X$ and $Y$.
-   So we regress $Y$ on $X$ and $W$, using the regression equation $Y=r_{X} X+r_{W} W+\epsilon$.
-   The coefficient $r_{X}$ is the direct effect of $X$ on $Y$.
:::
:::

------------------------------------------------------------------------

## Mediation in Linear Systems

-   When we can assume linear relationships between variables, mediation analysis becomes much simpler than the analysis conducted in nonlinear or nonparametric systems.
-   Estimating the direct effect of $X$ on $Y$ amounts to estimating the path coefficient between the two variables, and this reduces to estimating correlation coefficients.
-   The indirect effect is computed via the difference $I E=\tau-D E$, where $\tau$, the total effect, can be estimated by regression.

------------------------------------------------------------------------

## Summary

-   In linear systems, the total effect of $X$ on $Y$ is the sum of the products of the coefficients of the edges on every nonbackdoor path from $X$ to $Y$.
-   The backdoor criterion can be used to identify the set of variables to adjust for in order to determine the causal effect of $X$ on $Y$.
-   The direct effect of $X$ on $Y$ can be found by removing the edge from $X$ to $Y$ and finding a set of variables that $d$-separates $X$ and $Y$.
-   In linear systems, mediation analysis is simplified, with the direct effect of $X$ on $Y$ estimated by the path coefficient between the two variables, and the indirect effect computed via the difference between the total effect and the direct effect.


cderv commented 11 months ago

I don't think the special characters are the problem. Using a URL in \includegraphics is not supported in PDF output, so the image needs to be downloaded first.

A simpler reprex:

---
title: "Image URL"
output:
  beamer_presentation: 
    keep_tex: true
  pdf_document: 
    keep_tex: true
---

# test

![](https://raw.githubusercontent.com/rstudio/hex-stickers/main/PNG/knitr.png)

! Package pdftex.def Error: File `https://raw.githubusercontent.com/rstudio/hex-stickers/main/PNG/knitr.png' not found: using draft setting.

Error:
! LaTeX failed to compile test2.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See test2.log for more info.
Backtrace:
    x
 1. \-rmarkdown::render("C:/Users/chris/Documents/test2.Rmd", encoding = "UTF-8")
 2.   \-rmarkdown:::latexmk(...)
 3.     \-tinytex::latexmk(file, engine, if (biblatex) "biber" else "bibtex")
 4.       \-tinytex:::latexmk_emu(...)
 5.         \-tinytex (local) run_engine()
 6.           +-tinytex:::system2_quiet(...)
 7.           \-tinytex (local) on_error()
 8.             \-tinytex:::show_latex_error(file, logfile)
Execution halted

I would say this is more of an rmarkdown limitation.

The function knitr::include_url() can help with this for now. Otherwise, you need to download the image locally before inserting it.

@yihui How is using a URL for an image supposed to work in R Markdown with PDF output? I am surprised we don't download the image locally to embed it. 🤔 I really thought it would work.

Shouldn't a function like knitr::include_graphics() handle downloading the image when it is inserted in a known format like PDF, where a URL won't work?

I think we should try to do something (download the image?), somewhere (knitr or rmarkdown), somehow (a Lua filter in rmarkdown to catch all images, or extending knitr::include_graphics() to support URLs).

yihui commented 10 months ago

Right. Pandoc will download images when compiling to PDF, but R Markdown doesn't. This is due to the fact that R Markdown only uses Pandoc to render the intermediate .tex file, in which case Pandoc wouldn't download images (it downloads images only when rendering to .pdf directly).
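One way to check whether a document is affected is to keep the intermediate .tex file (keep_tex: true, as in the reprex above) and scan it for \includegraphics commands that still point at a URL. The helper below is a base R sketch only; `find_remote_graphics` is a hypothetical name and not part of knitr or rmarkdown.

```r
# Hypothetical helper (not a knitr/rmarkdown function): list the URLs that
# appear as \includegraphics targets in the lines of a .tex file.
find_remote_graphics <- function(tex) {
  # \includegraphics, an optional [options] group, then {http(s)://...}
  pattern <- "\\\\includegraphics(\\[[^]]*\\])?\\{https?://[^}]+\\}"
  hits <- unlist(regmatches(tex, gregexpr(pattern, tex, perl = TRUE)))
  # Keep only the text between the final pair of braces
  sub("^.*\\{(.*)\\}$", "\\1", hits)
}
```

For example, `find_remote_graphics(readLines("test2.tex"))` should return the image URLs that LaTeX will fail to find, or an empty character vector if all images are local.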

We can certainly try to download images inside knitr::include_graphics(). The only tricky things are the download file paths and how to clean them up after the PDF is generated. I think Pandoc just downloads images to temporary paths and deletes them afterwards. We could do the same thing, but knitr::include_graphics() would need to know the temp paths, e.g., as instructed by rmarkdown::render() in some way.
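The temporary-path idea could look roughly like the sketch below. It is base R with hypothetical names (`with_temp_image`, `render`), and it is not how Pandoc or knitr actually implement this. The `download` argument is an assumption added so that the cleanup logic can be tried offline with a stand-in downloader.

```r
# Hypothetical sketch: download to a temporary path, hand the path to a
# rendering step, and remove the file once that step has finished.
with_temp_image <- function(url, render, download = utils::download.file) {
  # Keep the original extension so LaTeX can detect the image type
  ext <- tools::file_ext(basename(sub("[?#].*$", "", url)))
  path <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  # Delete the temporary copy when this function exits, even on error
  on.exit(unlink(path), add = TRUE)
  download(url, path, mode = "wb")
  render(path)
}
```

The catch described above is visible here: the file only exists while `render()` runs, so the PDF compilation itself would have to happen inside that call.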

Personally, I'd prefer not to download images again every time the PDF is generated. That can be slow and wastes bandwidth. I'd cache the download like this: https://stackoverflow.com/a/46333724/559676 It's easy to make a function out of this solution if desired.

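# Download the image once and reuse the local copy on later renders
web_image = function(url, path = xfun::url_filename(url)) {
  if (!file.exists(path)) xfun::download_file(url, path)
  # HTML output can reference the URL directly; other formats need the local file
  knitr::include_graphics(if (knitr::pandoc_to('html')) url else path)
}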

Then you call

```{r}
web_image('https://cdn.mathpix.com/cropped/2023_07_28_a103dc94c1ec860738c6g-31.jpg?height=443&width=345&top_left_y=1857&top_left_x=867')
```

cderv commented 10 months ago

Yes, I would do something like that too. A new function seems good for this! I would just not download the image for HTML output.
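A variant that skips the download for HTML could be sketched as follows. `image_source` is a hypothetical helper, not a knitr function. Passing the output-format flag and the downloader as arguments is an assumption made here so that the logic can be tried outside of a knit session.

```r
# Hypothetical helper: decide which image source to hand to
# knitr::include_graphics(), downloading only when a local file is needed.
image_source <- function(url, path, to_html, download = xfun::download_file) {
  # HTML can reference the URL directly, so nothing is downloaded
  if (to_html) return(url)
  # Other formats need a local file; reuse it if it was downloaded before
  if (!file.exists(path)) download(url, path)
  path
}
```

Inside a document this would presumably be called as `knitr::include_graphics(image_source(url, xfun::url_filename(url), knitr::pandoc_to('html')))`.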

yihui commented 10 months ago

@bayeslearner Are you okay with using the web_image() function above? Do I need to add this function to knitr?

bayeslearner commented 10 months ago

Seems to be what we need.

yihui commented 10 months ago

I've added it as a new function download_image() in the dev version of knitr. Thanks!
