kohheepeace / mr-pdf

Generate PDF for document website.
https://www.npmjs.com/package/mr-pdf
MIT License

Error trying to generate PDF with Docusaurus v2 #66

Closed. Zenahr closed this issue 1 year ago.

Zenahr commented 1 year ago

Context

I'm trying to export my docs from Docusaurus v2.2.0 by first running npm run build, followed by npm run serve.

My PDF generation command is:

npx mr-pdf --initialDocURLs="http://localhost:3000/docs/introduction/" --paginationSelector=".docs-prevnext > a.docs-next" --excludeSelectors=".fixedHeaderContainer,footer.nav-footer,#docsNav,nav.onPageNav,a.edit-page-link,div.docs-prevnext" --cssStyle=".navPusher {padding-top: 0;}" --contentSelector="article"

The initialDocURL is correct. I tested that by running puppeteer with a custom script like so:

```js
import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/docs/introduction/');
  // wait 5 seconds
  await page.waitForTimeout(5000);

  await browser.close();
})();
```

The error I get when running the PDF generation command shown above:

Retrieving html from http://localhost:3000/docs/introduction/

Success
Error: Protocol error (Page.navigate): Cannot navigate to invalid URL
    at C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\Connection.js:183:56
    at new Promise (<anonymous>)
    at CDPSession.send (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\Connection.js:182:12)
    at navigate (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\FrameManager.js:118:39)
    at FrameManager.navigateFrame (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\FrameManager.js:95:7)    
    at Frame.goto (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\FrameManager.js:406:37)
    at Frame.<anonymous> (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\helper.js:112:23)
    at Page.goto (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\Page.js:672:49)
    at Page.<anonymous> (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\helper.js:112:23)
    at Object.generatePDF (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\mr-pdf\lib\utils.js:70:35)
  -- ASYNC --
    at Frame.<anonymous> (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\helper.js:111:15)
    at Page.goto (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\Page.js:672:49)
    at Page.<anonymous> (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\helper.js:112:23)
    at Object.generatePDF (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\mr-pdf\lib\utils.js:70:35)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

I'm guessing the error is due to a change in the CSS classes on the navigation items in Docusaurus v2?

Solution (update the pagination selector)

Edit: Yep, funny how rubber duck debugging works while writing an issue.

To solve this issue, all I had to do was update the pagination selector parameter to --paginationSelector="a.pagination-nav__link--next".
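A quick way to confirm which pagination selector a given site actually uses is to probe a page with each candidate. The sketch below is only an illustration: `findMatchingSelector` is a hypothetical helper, not part of mr-pdf, and it assumes `page` exposes a Puppeteer-style `$` method that resolves to `null` when nothing matches.

```javascript
// Hypothetical helper, not part of mr-pdf.
// Returns the first selector in `candidates` that matches an element on the
// page, or null if none match. `page` only needs a Puppeteer-style `$` method
// (resolves to an element handle, or to null when nothing matches).
async function findMatchingSelector(page, candidates) {
  for (const selector of candidates) {
    const handle = await page.$(selector);
    if (handle !== null) return selector;
  }
  return null;
}
```

With a real Puppeteer page opened on the first docs page, passing `['.docs-prevnext > a.docs-next', 'a.pagination-nav__link--next']` as `candidates` would show which of the two selectors from this issue is present.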

Zenahr commented 1 year ago

Note that my suggestion above does get the generator through all documentation pages, but it still crashes at the end. I don't know why yet.

Zenahr commented 1 year ago

To reproduce this:

  1. run npx create-docusaurus@latest my-website classic (as of writing, the latest Docusaurus version is v2.2.0)
  2. cd into the new folder
  3. npm install
  4. npm run build
  5. npm run serve

While the docs are being served, run this command in a separate terminal:

npx mr-pdf --initialDocURLs="http://localhost:3000/docs/intro" --paginationSelector="a.pagination-nav__link--next" --excludeSelectors=".fixedHeaderContainer,footer.nav-footer,#docsNav,nav.onPageNav,a.edit-page-link,div.docs-prevnext" --cssStyle=".navPusher {padding-top: 0;}" --contentSelector="article"

The stacktrace I get:

Retrieving html from http://localhost:3000/docs/intro

Success

Retrieving html from http://localhost:3000/docs/category/tutorial---basics

Success

Retrieving html from http://localhost:3000/docs/tutorial-basics/create-a-page

Success

Retrieving html from http://localhost:3000/docs/tutorial-basics/create-a-document

Success

Retrieving html from http://localhost:3000/docs/tutorial-basics/create-a-blog-post

Success

Retrieving html from http://localhost:3000/docs/tutorial-basics/markdown-features

Success

Retrieving html from http://localhost:3000/docs/tutorial-basics/deploy-your-site

Success

Retrieving html from http://localhost:3000/docs/tutorial-basics/congratulations

Success

Retrieving html from http://localhost:3000/docs/category/tutorial---extras

Success

Retrieving html from http://localhost:3000/docs/tutorial-extras/manage-docs-versions

Success

Retrieving html from http://localhost:3000/docs/tutorial-extras/translate-your-site

Success
Error: Protocol error (Page.navigate): Cannot navigate to invalid URL
    at C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\Connection.js:183:56
    at new Promise (<anonymous>)
    at CDPSession.send (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\Connection.js:182:12)
    at navigate (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\FrameManager.js:118:39)
    at FrameManager.navigateFrame (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\FrameManager.js:95:7)
    at Frame.goto (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\FrameManager.js:406:37)
    at Frame.<anonymous> (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\helper.js:112:23)
    at Page.goto (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\Page.js:672:49)
    at Page.<anonymous> (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\helper.js:112:23)
    at Object.generatePDF (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\mr-pdf\lib\utils.js:70:35)
  -- ASYNC --
    at Frame.<anonymous> (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\helper.js:111:15)
    at Page.goto (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\Page.js:672:49)
    at Page.<anonymous> (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\puppeteer\lib\helper.js:112:23)
    at Object.generatePDF (C:\Users\Zenahr\AppData\Local\npm-cache\_npx\f49524b68d136ed3\node_modules\mr-pdf\lib\utils.js:70:35)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Navigating the documentation appears to work fine until the final page has been reached. Depending on your logic, this could be a simple fix. One potential remedy, IMO, would be to check whether an element matching the specified paginationSelector is present on the page; if not, assume the end of the docs has been reached and proceed with the rest of the function.
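The check suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not mr-pdf's actual code: `toNavigableUrl` and `collectDocUrls` are hypothetical names, and `getNextHref` is an injected function standing in for whatever reads the "next" link from the page.

```javascript
// Hypothetical helpers; this is not mr-pdf's implementation.

// Returns a normalized http(s) URL, or null when href is missing or invalid.
function toNavigableUrl(href) {
  if (typeof href !== 'string' || href.trim() === '') return null;
  try {
    const url = new URL(href);
    return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : null;
  } catch {
    return null; // new URL() throws on input it cannot parse
  }
}

// Walks the docs by following the "next" link until none is found.
// getNextHref(url) is an async function that returns the href of the element
// matching the pagination selector on that page, or null if there is none.
async function collectDocUrls(startUrl, getNextHref) {
  const visited = [];
  const seen = new Set(); // guards against pagination loops
  let current = toNavigableUrl(startUrl);
  while (current !== null && !seen.has(current)) {
    seen.add(current);
    visited.push(current);
    current = toNavigableUrl(await getNextHref(current));
  }
  return visited;
}
```

With Puppeteer, `getNextHref` could navigate with `page.goto(url)` and then use `page.evaluate` to return the `href` of `document.querySelector(selector)`, or `null` when no element matches. The loop then ends cleanly on the last page instead of passing an empty value to `page.goto`.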

Zenahr commented 1 year ago

The reason for the crash was that it failed to retrieve the cover image. I fixed it for now by just not specifying a cover image. Everything else works perfectly fine.