tatut / clj-chrome-devtools

Clojure API for controlling a Chrome DevTools remote
MIT License

Getting blank PDFs when saving `page/print-to-pdf` data as a string #29

Closed: rgm closed this issue 3 years ago

rgm commented 3 years ago

I've gotten clj-chrome-devtools.commands.page/print-to-pdf tantalizingly close to working, but clearly I'm doing something not quite right. Happy to help with steps-to-reproduce or a PR, but I'll need some help shaping these.

Steps to reproduce (maybe):

(require
 '[clj-chrome-devtools.core          :as chrome]
 '[clj-chrome-devtools.commands.page :as page])
(import '(java.util Base64))

(defn b64-decode
  "https://stackoverflow.com/a/39188819/53790"
  [to-decode]
  (String. (.decode (Base64/getDecoder) to-decode)))

(def c (chrome/connect "localhost" 9222))
(page/navigate c {:url "https://github.com"})
(let [response (page/print-to-pdf c {:page-ranges "1-2"})
      pdf-data (b64-decode (:data response))]
  (spit "test.pdf" pdf-data))

Expected:

Should get a 2-page letter-size (i.e. the default) PDF of the GitHub site as test.pdf.

Actual:

There is a 2-page letter-size PDF created at test.pdf, but its pages are blank.

Notes:

const puppeteer = require("puppeteer-core");

(async () => {
  const browser = await puppeteer.connect({
    // from "webSocketDebuggerUrl" at http://127.0.0.1:9222/json/version
    browserWSEndpoint:
      "ws://127.0.0.1:9222/devtools/browser/d6687d16-aabb-4eb5-9f39-38275307c218",
  });
  const page = await browser.newPage();
  await page.goto("https://github.com", {
    waitUntil: "networkidle2",
  });
  await page.pdf({ path: "test.pdf" });
})();

tatut commented 3 years ago

I haven't investigated yet, but did you try the higher-level `to` function in the automation namespace, which actually waits for the page to load before returning?

rgm commented 3 years ago

Oh, interesting thought. I gave this a try too and no, same result: 2 blank pages. (I was running the original code form-by-form via nREPL. It was a good chance to try out the automation API though; now I know it for browser testing at least.)

Slightly more data: the dom namespace commands work and return what looks like the right DOM data. When I connect a Chrome browser to http://localhost:9222 and look, I'm definitely seeing a loaded and rendered DOM.

(require '[clj-chrome-devtools.automation :as auto])

(let [c        (chrome/connect "localhost" 9222)
      a        (auto/create-automation c)
      _        (auto/to a "https://github.com")
      resp     (page/print-to-pdf c {:page-ranges "1-2"})
      pdf-data (b64-decode (:data resp))]
  (spit "test.pdf" pdf-data))

tatut commented 3 years ago

Tried to locally print PDF on OS X and it works fine.

rgm commented 3 years ago

Hm, guess I'll have to see if building a fresh new Linux VPS makes any difference.

rgm commented 3 years ago

I'm at a bit of a loss. I get the same blank page when I use a reasonably reliable Docker image:

docker container run -d -p 9222:9222 zenika/alpine-chrome --no-sandbox --remote-debugging-address=0.0.0.0 --remote-debugging-port=9222

Same behaviour: my Clojure code is producing a blank PDF with the right number of pages, and the Puppeteer/node script above is producing a PDF with actual content.

rgm commented 3 years ago

@tatut Do you have a short working Clojure example (e.g. the one you used to test on OS X)? Maybe I could try that.

My only remaining debugging idea is that I'm just accidentally decoding or re-encoding the data incorrectly somewhere between the b64 decode and writing to disk.

tatut commented 3 years ago

One issue I noticed when printing a large PDF is that the default 1 megabyte WebSocket message size limit might be too low for some PDFs. You may want to try passing a custom WebSocket client to connect.

rgm commented 3 years ago

I'm still getting blank pages with a 256 MB WebSocket message size limit:

(def ws-client (clj-chrome-devtools.impl.connection/make-ws-client :max-msg-size-mb (* 1024 1024 256)))
(def c (clj-chrome-devtools.core/connect "localhost" 9222 1000 ws-client))
,,, ;; as above

I guess I'll build out a fresh VPS and try to make a minimum test repo.

rgm commented 3 years ago

Drat.

I built out a fresh Ubuntu VPS and installed only Java, Clojure and Chrome. I confirmed that `google-chrome --disable-gpu --headless --print-to-pdf=output.pdf https://chromestatus.com` gives me rendered output. Then I started a Chrome process with `google-chrome --disable-gpu --headless --remote-debugging-port=9222`.

I ran the example project just above this comment. And still, this minimal example gives me the correct number of completely blank pages.

@tatut Am I decoding the image correctly to spit it to disk? (Please see https://github.com/rgm/experiments/pull/17/files#diff-0290e8cf70657d2f8af40fef5e8dfb4ddd7de7bb6b1d8953fc94608586fb9003R11-R23). I can't think of anything else to try; I think that's the last of my app code that I'm not certain is correct. But then, I'm not getting a PDF error either when I open the output file in Mac Preview.app, so it's not malformed.

tatut commented 3 years ago

Might be, please try

(require '[clojure.java.io :as io])
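;; Decode straight to a byte array; do not wrap the result in (String. ...)
(def pdf-data (.decode (java.util.Base64/getDecoder) ^String (:data resp)))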
(with-open [out (io/output-stream "test.pdf")]
  (.write out pdf-data))

instead of creating a string and spitting it. The above works for me.

rgm commented 3 years ago

🎉

Yep, that was it 🤦. Treating it as binary data (which is what I think is the gist of your suggestion) has me able to successfully render a file. See https://github.com/rgm/experiments/pull/17/commits/ad20991478080616f21cf1f81e995303ca1fb758
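
A minimal sketch of why the string round trip breaks the file, assuming the decoded bytes pass through a UTF-8 `String` on the way to disk (the original `(String. ...)` call uses the platform default charset, which is often UTF-8). The four bytes below are a made-up stand-in for binary PDF stream data, not real PDF content. Byte sequences that are not valid UTF-8 get replaced during decoding, so the bytes written back out no longer match the original. A plausible reading of the symptom is that the mostly-ASCII PDF structure survives, which would explain the correct page count, while the binary page content does not.

```clojure
(import '(java.util Arrays Base64)
        '(java.nio.charset StandardCharsets))

;; Stand-in for binary PDF content: four bytes that are not valid UTF-8.
(def original
  (byte-array [(unchecked-byte 0x89) (unchecked-byte 0xFF)
               (unchecked-byte 0xC3) (unchecked-byte 0x28)]))

;; What arrives over the wire in :data is base64 text.
(def encoded (.encodeToString (Base64/getEncoder) ^bytes original))

;; Decoding to a byte array recovers the original bytes exactly.
(def decoded (.decode (Base64/getDecoder) ^String encoded))

;; Going through a String and back (roughly what String. plus spit does)
;; substitutes replacement characters for the invalid sequences.
(def via-string
  (.getBytes (String. ^bytes decoded StandardCharsets/UTF_8)
             StandardCharsets/UTF_8))

(println "bytes intact after base64 decode:"
         (Arrays/equals ^bytes decoded ^bytes original))
(println "bytes intact after String round trip:"
         (Arrays/equals ^bytes via-string ^bytes original))
```

Keeping the data as a byte array from the base64 decode through to the output stream avoids the charset step entirely.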

Thanks for all your help.

Is there a cookbook or somewhere in the docs that I can write this up for you as a PR?

tatut commented 3 years ago

No cookbook as such, but that's a good idea. Alternatively, the print-to-PDF functionality could be added as a function to the higher-level automation ns.
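
As a rough sketch of what such a helper might look like: the function below takes the response map returned by `page/print-to-pdf` and writes its base64 `:data` to disk as raw bytes. The name `write-pdf-response!` and the `fake-response` map are hypothetical and used only for illustration; they are not part of clj-chrome-devtools.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util Base64))

(defn write-pdf-response!
  "Hypothetical helper, not part of the library. Decodes the base64 :data
  of a print-to-pdf response map and writes the raw bytes to path.
  Returns the number of bytes written."
  [response path]
  (let [pdf-bytes (.decode (Base64/getDecoder) ^String (:data response))]
    ;; Write through a binary output stream so no charset conversion occurs.
    (with-open [out (io/output-stream path)]
      (.write out ^bytes pdf-bytes))
    (alength ^bytes pdf-bytes)))

;; Stand-in response so the sketch runs without a browser.
(def fake-response
  {:data (.encodeToString (Base64/getEncoder)
                          (.getBytes "%PDF-1.4 stand-in" "UTF-8"))})

;; With a live connection c, as in the thread, the call would be:
;; (write-pdf-response! (page/print-to-pdf c {:page-ranges "1-2"}) "test.pdf")
```

Taking the response map rather than the connection keeps the helper independent of how the page was loaded, so it works the same after `page/navigate` or the automation `to` function.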