rstudio / pagedown

Paginate the HTML Output of R Markdown with CSS for Print
https://pagedown.rbind.io

chrome_print() fails with Chrome 79 and Google fonts on Windows 7/8 #157

Open itsmevidhyak opened 4 years ago

itsmevidhyak commented 4 years ago

The traceback is as follows:

  1. pagedown::chrome_print("mycv.html", format = "pdf")
  2. with_temp_loop_maybe({ ws = websocket::WebSocket$new(get_entrypoint(debug_port), autoConnect = FALSE) ws$onClose(kill_chrome) ...
  3. with_loop(loop, expr)
  4. force(expr)
  5. stop("Failed to generate output in ", timeout, " seconds (timeout).")

The command ran fine on 3 December 2019. Today I reinstalled pagedown from the GitHub repo and restarted both RStudio and my machine.

When I opened the mycv.html file directly, Chrome showed a blank document. I then used

python -m http.server xxxx to create a local server via Anaconda, and I was able to save the file as PDF with Ctrl+P. Next, I tried servr::httd() in R and ran the chrome_print() command again. However, the run failed with the error.

I also tried setting the PAGEDOWN_CHROME environment variable to the Chrome executable path and adding knit: pagedown::chrome_print to the YAML header. The error is still not resolved. The error message is as follows:

Serving the directory C:\Users...\Documents at http://.../mycv.html
Error in force(expr) : Failed to generate output in 30 seconds (timeout).
Calls: -> with_temp_loop_maybe -> with_loop -> force
Closing websocket connection
Execution halted

Any help/resolution is much appreciated. Thank you.

RLesur commented 4 years ago

I need more information to understand the problem. Could you please describe your environment (system, desktop or server, proxy...)?
Could you also run pagedown::chrome_print("mycv.html", format = "pdf", verbose = 2) without using servr::httd()? The output of Sys.getenv() could also help me.

itsmevidhyak commented 4 years ago

Sure, I am running Windows 8.1 on my laptop.

Sys.getenv() returned the following output:

ALLUSERSPROFILE   C:\ProgramData
APPDATA           C:\Users\Vidhya\AppData\Roaming
CLICOLOR_FORCE    1
CommonProgramFiles
                  C:\Program Files\Common Files
CommonProgramFiles(x86)
                  C:\Program Files (x86)\Common Files
CommonProgramW6432
                  C:\Program Files\Common Files
COMPUTERNAME      VIDHYAK
ComSpec           C:\WINDOWS\system32\cmd.exe
DISPLAY           :0
FP_NO_HOST_CHECK
                  NO
GFORTRAN_STDERR_UNIT
                  -1
GFORTRAN_STDOUT_UNIT
                  -1
HOME              C:/Users/Vidhya/Documents
HOMEDRIVE         C:
HOMEPATH          \Users\Vidhya
LOCALAPPDATA      C:\Users\Vidhya\AppData\Local
LOGONSERVER       \\VIDHYAK
MSYS2_ENV_CONV_EXCL
                  R_ARCH
NUMBER_OF_PROCESSORS
                  4
OnlineServices    Online Services
OS                Windows_NT
PATH              C:\Program
                  Files\R\R-3.6.1\bin\x64;c:\Rtools\bin;c:\Rtools\mingw_64\bin;C:\Program
                  Files\ImageMagick-7.0.5-Q16;c:\Rtools\bin;c:\Rtools\mingw_64\bin;c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;C:\Program
                  Files (x86)\Intel\iCLS
                  Client\;C:\Program Files\Intel\iCLS
                  Client\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program
                  Files\Intel\Intel(R) Management
                  Engine Components\DAL;C:\Program
                  Files (x86)\Intel\Intel(R)
                  Management Engine
                  Components\DAL;C:\Program
                  Files\Intel\Intel(R) Management
                  Engine Components\IPT;C:\Program
                  Files (x86)\Intel\Intel(R)
                  Management Engine
                  Components\IPT;C:\Program
                  Files\MiKTeX
                  2.9\miktex\bin\x64\;C:\Anaconda;C:\Anaconda\Library\mingw-w64\bin;C:\Anaconda\Library\usr\bin;C:\Anaconda\Library\bin;C:\Anaconda\Scripts;C:\Anaconda\bin;c:\Rtools\bin;c:\Rtools\mingw_64\bin;C:\Program
                  Files\ImageMagick-7.0.5-Q16;c:\Rtools\bin;c:\Rtools\mingw_64\bin;c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;C:\Program
                  Files (x86)\Intel\iCLS
                  Client\;C:\Program Files\Intel\iCLS
                  Client\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program
                  Files\Intel\Intel(R) Management
                  Engine Components\DAL;C:\Program
                  Files (x86)\Intel\Intel(R)
                  Management Engine
                  Components\DAL;C:\Program
                  Files\Intel\Intel(R) Management
                  Engine Components\IPT;C:\Program
                  Files (x86)\Intel\Intel(R)
                  Management Engine
                  Components\IPT;C:\Users\Vidhya\AppData\Local\rodeo\app-2.5.2\bin;C:\Users\Vidhya\AppData\Local\atom\bin;C:\Anaconda;C:\Anaconda\Scripts
PATHEXT           .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
Platform          MCD
platformcode      KV
PROCESSOR_ARCHITECTURE
                  AMD64
PROCESSOR_IDENTIFIER
                  Intel64 Family 6 Model 61 Stepping
                  4, GenuineIntel
PROCESSOR_LEVEL   6
PROCESSOR_REVISION
                  3d04
ProgramData       C:\ProgramData
ProgramFiles      C:\Program Files
ProgramFiles(x86)
                  C:\Program Files (x86)
ProgramW6432      C:\Program Files
PSModulePath      C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules\
PUBLIC            C:\Users\Public
QT_D3DCREATE_MULTITHREADED
                  1
R_ARCH            /x64
R_COMPILED_BY     gcc 4.9.3
R_DOC_DIR         C:/PROGRA~1/R/R-36~1.1/doc
R_HOME            C:/PROGRA~1/R/R-36~1.1
R_LIBS_USER       C:/Users/Vidhya/Documents/R/win-library/3.6
R_USER            C:/Users/Vidhya/Documents
RMARKDOWN_MATHJAX_PATH
                  C:/Program
                  Files/RStudio/resources/mathjax-26
RS_LOCAL_PEER     \\.\pipe\8829-rsession
RS_RPOSTBACK_PATH
                  C:/Program
                  Files/RStudio/bin/rpostback
RS_SHARED_SECRET
                  63341846741
RSTUDIO           1
RSTUDIO_CONSOLE_COLOR
                  256
RSTUDIO_CONSOLE_WIDTH
                  80
RSTUDIO_MSYS_SSH
                  C:/Program
                  Files/RStudio/bin/msys-ssh-1000-18
RSTUDIO_PANDOC    C:/Program Files/RStudio/bin/pandoc
RSTUDIO_SESSION_PORT
                  8829
RSTUDIO_USER_IDENTITY
                  Vidhya
RSTUDIO_WINUTILS
                  C:/Program
                  Files/RStudio/bin/winutils
SESSIONNAME       Console
SystemDrive       C:
SystemRoot        C:\WINDOWS
TEMP              C:\Users\Vidhya\AppData\Local\Temp
TERM              xterm-256color
TMP               C:\Users\Vidhya\AppData\Local\Temp
USERDOMAIN        VidhyaK
USERDOMAIN_ROAMINGPROFILE
                  VidhyaK
USERNAME          Vidhya
USERPROFILE       C:\Users\Vidhya
windir            C:\WINDOWS

Running sessionInfo() returned the following:

sessionInfo()


R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] compiler_3.6.1  htmltools_0.4.0 tools_3.6.1     yaml_2.2.0      Rcpp_1.0.3      rmarkdown_2.0
 [7] knitr_1.26      pagedown_0.6.3  xfun_0.11       digest_0.6.23   rlang_0.4.2     evaluate_0.14


I ran `pagedown::chrome_print("mycv.html", format = "pdf", verbose = 2)` without `servr::httd()`. It returned the following:

![image](https://user-images.githubusercontent.com/45891298/70874037-62d7e580-1fd6-11ea-9b67-7d6faa471fa4.png)

RLesur commented 4 years ago

Thanks for this information. I still don't understand what the problem is,
so I need more information.

Could you please run xfun::session_info("pagedown")?

I also need the results of the following tests.
First, please install the latest development version of pagedown with remotes::install_github("rstudio/pagedown"). Then, run the following script:

# uncomment to install the development version
# remotes::install_github("rstudio/pagedown")

local({
  # Is the latest development version used?
  stopifnot(packageVersion("pagedown") >= "0.6.4")

  chrome_print_log <- function(input, log_file) {
    out <- file(log_file, open = "wt")
    sink(out, type = "message")
    on.exit({
      sink(NULL, type = "message")
      close(out)
    })
    tryCatch(
      pagedown::chrome_print(input = input, output = tempfile(), verbose = 2),
      error = function(e) message("Error: ", e$message)
    )
  }

  # test 1
  chrome_print_log("http://httpbin.org/html", "chrome_print_test_1.log")

  # test 2
  chrome_print_log("https://pagedown.rbind.io/html-resume", "chrome_print_test_2.log")

  # test 3
  download.file("http://httpbin.org/html", input <- tempfile(fileext = ".html"))
  chrome_print_log(input, "chrome_print_test_3.log")
})

You should get three files: chrome_print_test_1.log, chrome_print_test_2.log and chrome_print_test_3.log.
Can you attach these files to this thread? I need to inspect them.

itsmevidhyak commented 4 years ago

Sure, no problem.

> xfun::session_info("pagedown")
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600), RStudio 1.2.1335

Locale:
  LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
  LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
  LC_TIME=English_India.1252    

Package version:
  AsioHeaders_1.12.1.1 base64enc_0.1.3      BH_1.69.0.1         
  bookdown_0.16        digest_0.6.23        evaluate_0.14       
  glue_1.3.1           graphics_3.6.1       grDevices_3.6.1     
  highr_0.8            htmltools_0.4.0      httpuv_1.5.2        
  jsonlite_1.6         knitr_1.26           later_1.0.0         
  magrittr_1.5         markdown_1.1         methods_3.6.1       
  mime_0.7             pagedown_0.6.3       processx_3.4.1      
  promises_1.1.0       ps_1.3.0             R6_2.4.1            
  Rcpp_1.0.3           rlang_0.4.2          rmarkdown_2.0       
  servr_0.15           stats_3.6.1          stringi_1.4.3       
  stringr_1.4.0        tinytex_0.18         tools_3.6.1         
  utils_3.6.1          websocket_1.1.0      xfun_0.11           
  yaml_2.2.0  

I reinstalled pagedown as suggested and ran the script to generate the logs. The logs are attached below.

> remotes::install_github("rstudio/pagedown")
Downloading GitHub repo rstudio/pagedown@master
These packages have more recent versions available.
Which would you like to update?

1: All                             
2: CRAN packages only              
3: None                            
4: BH (1.69.0-1 -> 1.72.0-1) [CRAN]

Enter one or more numbers, or an empty line to skip updates:
1
BH (1.69.0-1 -> 1.72.0-1) [CRAN]
Installing 1 packages: BH
Installing package into ‘C:/Users/Vidhya/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/BH_1.72.0-1.zip'
Content type 'application/zip' length 18170647 bytes (17.3 MB)
downloaded 17.3 MB

package ‘BH’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\Vidhya\AppData\Local\Temp\RtmpiQOq1q\downloaded_packages
√  checking for file 'C:\Users\Vidhya\AppData\Local\Temp\RtmpiQOq1q\remotes20a4465b7120\rstudio-pagedown-17bf797/DESCRIPTION'
-  preparing 'pagedown': (1.1s)
√  checking DESCRIPTION meta-information ... 
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  building 'pagedown_0.6.4.tar.gz'

Installing package into ‘C:/Users/Vidhya/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package 'pagedown' ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package 'pagedown'
    finding HTML links ... done
    book_crc                                html  
    business_card                           html  
    chrome_print                            html  
    finding level-2 HTML links ... done

    find_chrome                             html  
    html_letter                             html  
    html_paged                              html  
    html_resume                             html  
    jss_paged                               html  
    poster_relaxed                          html  
    thesis_paged                            html  
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (pagedown)
Warning messages:
1: In untar2(tarfile, files, list, exdir) :
  skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir) :
  skipping pax global extended headers
> local({
+   # Is the latest development version used?
+   stopifnot(packageVersion("pagedown") >= "0.6.4")
+   
+   chrome_print_log <- function(input, log_file) {
+     out <- file(log_file, open = "wt")
+     sink(out, type = "message")
+     on.exit({
+       sink(NULL, type = "message")
+       close(out)
+     })
+     tryCatch(
+       pagedown::chrome_print(input = input, output = tempfile(), verbose = 2),
+       error = function(e) message("Error: ", e$message)
+     )
+   }
+   
+   # test 1
+   chrome_print_log("http://httpbin.org/html", "chrome_print_test_1.log")
+   
+   # test 2
+   chrome_print_log("https://pagedown.rbind.io/html-resume", "chrome_print_test_2.log")
+   
+   # test 3
+   download.file("http://httpbin.org/html", input <- tempfile(fileext = ".html"))
+   chrome_print_log(input, "chrome_print_test_3.log")
+ })
trying URL 'http://httpbin.org/html'
Content type 'text/html; charset=utf-8' length 3741 bytes
downloaded 3741 bytes

I then re-ran the first command to retrieve the updated session info.

> xfun::session_info("pagedown")
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600), RStudio 1.2.1335

Locale:
  LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
  LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
  LC_TIME=English_India.1252    

Package version:
  AsioHeaders_1.12.1.1 base64enc_0.1.3      BH_1.72.0.1         
  bookdown_0.16        digest_0.6.23        evaluate_0.14       
  glue_1.3.1           graphics_3.6.1       grDevices_3.6.1     
  highr_0.8            htmltools_0.4.0      httpuv_1.5.2        
  jsonlite_1.6         knitr_1.26           later_1.0.0         
  magrittr_1.5         markdown_1.1         methods_3.6.1       
  mime_0.7             pagedown_0.6.4       processx_3.4.1      
  promises_1.1.0       ps_1.3.0             R6_2.4.1            
  Rcpp_1.0.3           rlang_0.4.2          rmarkdown_2.0       
  servr_0.15           stats_3.6.1          stringi_1.4.3       
  stringr_1.4.0        tinytex_0.18         tools_3.6.1         
  utils_3.6.1          websocket_1.1.0      xfun_0.11           
  yaml_2.2.0        

Feel free to let me know if anything else is needed to resolve this issue. Many thanks.

chrome_print_test_1.log chrome_print_test_2.log chrome_print_test_3.log

RLesur commented 4 years ago

These tests passed, so your configuration seems to be OK.

I wonder whether the problem is specific to your HTML file (the one named mycv.html). Is this HTML file available online?
If not, please run the following test in the folder that contains this file: you will get a file named chrome_print_test_4.log. I need to inspect this file, too.
Be warned that this log file may contain the PDF version of your CV. If you are not comfortable attaching it here for privacy reasons, you can send it to me by email.

local({
  # Is the latest development version used?
  stopifnot(packageVersion("pagedown") >= "0.6.4")
  # Does the current working directory contain the mycv.html file?
  stopifnot(file.exists("mycv.html"))

  chrome_print_log <- function(input, log_file) {
    out <- file(log_file, open = "wt")
    sink(out, type = "message")
    on.exit({
      sink(NULL, type = "message")
      close(out)
    })
    tryCatch(
      pagedown::chrome_print(input = input, output = tempfile(), verbose = 2),
      error = function(e) message("Error: ", e$message)
    )
  }

  # test 4
  chrome_print_log("mycv.html", "chrome_print_test_4.log")
})

itsmevidhyak commented 4 years ago

Thanks for the effort. The log seems to contain the same error. I am puzzled, because I have not changed anything in the source file since the 3 December version, except that I once changed the font in the CSS file from "georgia" to "Roboto". I saved, clicked Knit, and started getting this error. I reverted to the "georgia" font, only to see the error persist.

chrome_print_test_4.log

Would it help if I uploaded the .Rmd code, the HTML file, the relevant dependencies and the index files to a private test repo and gave you access?

RLesur commented 4 years ago

Thanks a lot. From the last log, chrome_print() seems to work, but the process is slow (why? I don't know... maybe a slow internet connection or an antivirus?).
Could you test by increasing the value of the timeout argument of chrome_print() (the error message above shows the 30-second limit)? Try once with a very large value.

RLesur commented 4 years ago

@itsmevidhyak wrote:

> If I upload the .rmd code, html file, relevent dependencies and the index files in a private test repo and will give access to you will that be of any help?

If it is possible, yes.

itsmevidhyak commented 4 years ago

Thanks. My internet connection has an upload speed of 95-100 Mbps and a download speed of 87-97 Mbps. The only security software on my machine is Microsoft's default firewall. Even with a very large timeout, no PDF was produced. I will upload the code to a private repo by tomorrow.

Or do you think I should uninstall R entirely (I am planning to update to a recent version soon anyway), reinstall all the packages, and try once more to see whether the old version of R is causing the problem? I will try this once, and if it still fails on a clean install, I will upload the code.

Thank you so much for your effort and your valuable time. It means a lot, and this is why being part of the R community means so much to me.

RLesur commented 4 years ago

> Or do you think, I should uninstall the entire R (I am anyway planning to update to recent version soon) and reinstall all the packages in R again and give it a try one more time to see if the old version of R is creating a problem? I will try this once and if not working on a clean install, will upload.

No, I don't think reinstalling R would help. Since increasing the timeout does not work, my current guess is that the problem is related to loading external assets (fonts or images, for instance).

Internally, chrome_print() waits for a signal emitted by the Paged.js library. Here, this signal is never emitted, which explains the timeout.
Since Paged.js waits for external assets to load, a failure while loading a single asset may break the PDF rendering process. That's why I think the problem may come from your HTML (or CSS) file.
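To make this failure mode concrete, here is a small Node.js sketch. It is not pagedown's or Paged.js's code: the helper `settlesWithin` and the simulated font loads are hypothetical. It shows that `Promise.all()` over several loads never settles if a single load stays pending (as the FontFace.load() promise does here), so anything waiting on it, like the signal chrome_print() expects, never fires unless a timer intervenes.

```javascript
// Hypothetical helper: resolves to true if `promise` settles (fulfils or
// rejects) within `ms` milliseconds, and to false otherwise.
function settlesWithin(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), ms);
  });
  const settled = promise.then(() => true, () => true);
  return Promise.race([settled, timeout]).finally(() => clearTimeout(timer));
}

// Simulated font loads: two fonts load quickly, one stays pending forever,
// mimicking the FontFace.load() promise for the CDN-served font.
const loaded = (name) =>
  new Promise((resolve) => setTimeout(() => resolve(name), 10));
const neverSettles = new Promise(() => {});

const someFonts = Promise.all([loaded("Georgia"), loaded("Wingdings")]);
const allFonts = Promise.all([loaded("Georgia"), loaded("Wingdings"), neverSettles]);

async function main() {
  console.log("without pending font:", await settlesWithin(someFonts, 200));
  console.log("with pending font:", await settlesWithin(allFonts, 200));
}

main();
```

Note that the timer only bounds how long the caller waits; it does not cancel the pending load itself.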

RLesur commented 4 years ago

I have spent many hours on this issue. I can now reproduce it on Windows (I haven't tested on Linux yet), but I cannot reproduce it on Bionic.

First, the issue is related to Chrome 79 (I got the same problem with Chrome 80 beta; I haven't tested Chrome 81 dev). There is no problem with Chrome 78.

Something (I don't know what) has changed in the way Chrome loads fonts: in short, headless Chrome no longer downloads fonts. There is no problem with non-headless Chrome or with local fonts. This is a severe problem.

@itsmevidhyak I see two workarounds:

In more detail, chrome_print() fails with Chrome 79 because the following promise stays pending indefinitely for a CDN-served font: https://github.com/rstudio/pagedown/blob/2364a72262d6254fea84a5856e9e58c5b0e4b9fb/inst/resources/js/paged.js#L2335

Since I used the same strategy in chrome_print() for non-Paged.js documents, chrome_print() also fails for CDN-served fonts there, because this promise is never resolved: https://github.com/rstudio/pagedown/blob/2364a72262d6254fea84a5856e9e58c5b0e4b9fb/inst/resources/js/chrome_print.js#L66

For now, I have no idea how to fix this bug.

FWIW, here is the minimal HTML file I've built to understand the problem:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>title</title>
    <link href="https://fonts.googleapis.com/css?family=Roboto" rel="stylesheet">
    <style>
      body {
        font-family: 'Roboto', 'Wingdings', serif;
      }
    </style>
    <script>
      let fontsReady = new Promise((resolve, reject) => {
        window.addEventListener(
          'DOMContentLoaded',
          () => {
            let fontsLoaded = [];
            (document.fonts || []).forEach((font) => {
              console.log("Found a font, status: " + font.status);
              fontsLoaded.push(font.load())
            });
            Promise.all(fontsLoaded).then(
              (value) => {
                console.log("All fonts are loaded.");
                resolve(value);
              }, 
              reject
            );
          },
          {capture: true, once: true}
        );
      });
      fontsReady.catch((r) => {
        console.log("Error: " + r);
      });
    </script>
  </head>
  <body>
    <h1>Test</h1>
  </body>
</html>

RLesur commented 4 years ago

After digging further into this issue, I am now convinced this is a bug in headless Chrome 79 on Windows, triggered by a CSS @font-face declaration like this one (an extract of the Google Fonts stylesheet for the Roboto font):

@font-face {
  font-family: 'Roboto';
  font-style: normal;
  font-weight: 400;
  src: local('Roboto'), local('Roboto-Regular'), url(https://fonts.gstatic.com/s/roboto/v20/KFOmCnqEu92Fr1Mu72xKOzY.woff2) format('woff2');
}

When the font is not installed locally, the FontFace.load() promise stays pending indefinitely.
If the local(...) entries are removed from the src descriptor, loading works well.

A workaround would be to rewrite all the @font-face rules in the stylesheets referenced by document.styleSheets. As explained here, this is not straightforward (but feasible).
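The core of that workaround is a string rewrite of each rule's src descriptor, dropping the local(...) entries. Below is a minimal sketch, assuming the descriptor is available as a plain string; `splitTopLevel` and `stripLocalSources` are hypothetical helpers, not part of pagedown or Paged.js. In a browser the string would come from a `CSSFontFaceRule` found through document.styleSheets, but cross-origin stylesheets such as the Google Fonts one generally cannot be read through `cssRules`, which is likely one of the complications.

```javascript
// Hypothetical helper: split a CSS `src` descriptor on commas that are not
// inside parentheses or quotes, e.g. "local('A'), url(x) format('woff2')".
function splitTopLevel(src) {
  const parts = [];
  let depth = 0;
  let quote = null;
  let start = 0;
  for (let i = 0; i < src.length; i++) {
    const c = src[i];
    if (quote) {
      if (c === quote) quote = null; // end of a quoted string
    } else if (c === '"' || c === "'") {
      quote = c;
    } else if (c === "(") {
      depth++;
    } else if (c === ")") {
      depth--;
    } else if (c === "," && depth === 0) {
      parts.push(src.slice(start, i).trim());
      start = i + 1;
    }
  }
  parts.push(src.slice(start).trim());
  return parts.filter((p) => p.length > 0);
}

// Drop local(...) sources so the browser goes straight to the url(...) one.
// If the descriptor has only local() sources, it is returned unchanged.
function stripLocalSources(src) {
  const kept = splitTopLevel(src).filter((p) => !/^local\s*\(/i.test(p));
  return kept.length > 0 ? kept.join(", ") : src;
}

const robotoSrc =
  "local('Roboto'), local('Roboto-Regular'), " +
  "url(https://fonts.gstatic.com/s/roboto/v20/KFOmCnqEu92Fr1Mu72xKOzY.woff2) format('woff2')";

console.log(stripLocalSources(robotoSrc));
// url(https://fonts.gstatic.com/s/roboto/v20/KFOmCnqEu92Fr1Mu72xKOzY.woff2) format('woff2')
```

Splitting only on top-level commas keeps data: URLs and quoted family names containing commas intact, which a naive `split(",")` would break.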

RLesur commented 4 years ago

For the record, I've opened an issue in the Chromium project https://bugs.chromium.org/p/chromium/issues/detail?id=1040984.

RLesur commented 4 years ago

A follow-up on this issue: I omitted an important point. I was able to reproduce the issue on Windows 7, but I cannot reproduce it on Windows 10 with Chrome >= 79.

Because it seems very specific (Windows 7/8 and Chrome 79) and there are workarounds (see https://github.com/rstudio/pagedown/issues/157#issuecomment-570634712), I have decided to stop working on this issue. We will see whether the Chromium team fixes it (I'm not sure they will).

itsmevidhyak commented 4 years ago

Sure, thank you. Given the low criticality and the fact that I found a workaround, the decision makes sense.

shrektan commented 4 years ago

Bitten by a similar issue.

Sorry, I didn't read all the comments carefully. My point is:

Is it possible to ask Chrome to stop downloading resources after some time (say, 10 seconds) and render whatever it has?

The current pagedown template (in the RStudio IDE) references many external resources (GIFs, pictures, fonts), some of which are not accessible from China. So what we basically get is the timeout error, which is quite frustrating at first...
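At the JavaScript level, the behaviour asked for here (wait a bounded time per resource, then continue with whatever has arrived) can be sketched with a per-resource deadline. This is a hypothetical Node.js illustration, not an existing pagedown or Chrome option: `withDeadline` and `loadAllWithin` are made-up names, and the simulated promises stand in for font or image loads.

```javascript
// Hypothetical helper: settle with status "timeout" if `promise` has not
// settled within `ms` milliseconds, instead of waiting forever.
function withDeadline(name, promise, ms) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ name, status: "timeout" }), ms);
  });
  const outcome = promise.then(
    () => ({ name, status: "loaded" }),
    (err) => ({ name, status: "failed", reason: String(err) })
  );
  return Promise.race([outcome, deadline]).finally(() => clearTimeout(timer));
}

// Wait for every resource, but never longer than `ms` for any of them, and
// report what happened so rendering could proceed with what is available.
function loadAllWithin(resources, ms) {
  return Promise.all(
    Object.entries(resources).map(([name, p]) => withDeadline(name, p, ms))
  );
}

// Simulated resources: one loads, one is blocked (never settles), one errors.
const resources = {
  "local-font": new Promise((resolve) => setTimeout(resolve, 10)),
  "blocked-gif": new Promise(() => {}),
  "broken-image": Promise.reject(new Error("404")),
};

loadAllWithin(resources, 100).then((report) => {
  for (const r of report) console.log(`${r.name}: ${r.status}`);
});
```

Resolving with a status object instead of rejecting keeps `Promise.all()` from short-circuiting, so one blocked asset cannot hold up the whole batch.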