kohheepeace / docusaurus-pdf

Generate PDF for docusaurus
https://drive.google.com/file/d/19P3qSwLLUHYigrxH3QXIMXmRpTFi4pKB/view

Generate table of contents #23

Open snowch opened 4 years ago

snowch commented 4 years ago

Does this generate the table of contents? It appears not, but this would be really good to have.

CharlieDigital commented 4 years ago

I am also interested in this functionality, but having reviewed the code and the documentation for Puppeteer's page.pdf(), that API is quite limited.

https://pptr.dev/#?product=Puppeteer&version=v5.3.1&show=api-pagepdfoptions

One thing I noticed, for example, is that it does not recognize and convert links. The documentation for page.pdf() does not indicate an option to do so, and I do not see a way to do this with pdf-lib, which is too low level. In other words, inter-document links are not currently converted (quite frankly, I am not sure whether this is because of how Docusaurus builds the URLs or because of pdf-lib). It may be possible to address this by cleaning the HTML hrefs before feeding the page into page.pdf().
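As a rough illustration of the "clean the hrefs first" idea, here is a minimal sketch in TypeScript. It is not code from docusaurus-pdf: `rewriteInternalLinks`, `pathToAnchorId`, and the anchor naming scheme are all hypothetical, and it assumes the combined HTML wraps each page in an element whose `id` follows the same scheme. It works on the HTML string only, so it would run before the content is handed to page.pdf().

```typescript
// Hypothetical helpers, for illustration only (not part of docusaurus-pdf).
// Assumption: every page merged into the single HTML document is wrapped in
// an element whose id equals pathToAnchorId(<that page's URL path>).

function pathToAnchorId(path: string): string {
  // "/docs/intro/" -> "docs-intro"
  const slug = path
    .replace(/^\/+|\/+$/g, "")       // trim leading and trailing slashes
    .replace(/[^A-Za-z0-9]+/g, "-"); // collapse everything else to dashes
  return slug === "" ? "index" : slug;
}

function rewriteInternalLinks(html: string, siteOrigin: string): string {
  // Only double-quoted href attributes are handled in this sketch.
  return html.replace(/href="([^"]*)"/g, (match: string, href: string) => {
    let path: string;
    if (href.indexOf(siteOrigin + "/") === 0) {
      // Absolute link into the same site.
      path = href.slice(siteOrigin.length);
    } else if (href.charAt(0) === "/" && href.charAt(1) !== "/") {
      // Root-relative link.
      path = href;
    } else {
      // External, relative, or already a fragment: leave untouched.
      return match;
    }
    // The query string and the section fragment are dropped, so the link
    // lands on the start of the target page rather than on a sub-heading.
    const cleanPath = path.split(/[?#]/)[0];
    return `href="#${pathToAnchorId(cleanPath)}"`;
  });
}
```

Whether the resulting fragment links end up clickable in the generated PDF depends on the renderer, so this would need to be checked against the actual output.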

The DocFx project can also generate PDFs. It uses wkhtmltopdf, which seems to be a higher-level abstraction: it can convert links to internal links, and it supports generating a table of contents as well as an outline out of the box.

The documentation is here: https://wkhtmltopdf.org/usage/wkhtmltopdf.txt

Table Of Contents:
  A table of contents can be added to the document by adding a toc object to the
  command line. For example:

  wkhtmltopdf toc https://qt-project.org/doc/qt-4.8/qstring.html qstring.pdf

  The table of contents is generated based on the H tags in the input documents.
  First a XML document is generated, then it is converted to HTML using XSLT.

  The generated XML document can be viewed by dumping it to a file using the
  --dump-outline switch. For example:

  wkhtmltopdf --dump-outline toc.xml https://qt-project.org/doc/qt-4.8/qstring.html qstring.pdf

  The XSLT document can be specified using the --xsl-style-sheet switch. For
  example:

  wkhtmltopdf toc --xsl-style-sheet my.xsl https://qt-project.org/doc/qt-4.8/qstring.html qstring.pdf

  The --dump-default-toc-xsl switch can be used to dump the default XSLT style
  sheet to stdout. This is a good start for writing your own style sheet

  wkhtmltopdf --dump-default-toc-xsl

  The XML document is in the namespace "http://wkhtmltopdf.org/outline" it has a
  root node called "outline" which contains a number of "item" nodes. An item
  can contain any number of item. These are the outline subsections to the
  section the item represents. A item node has the following attributes:

 * "title" the name of the section.
 * "page" the page number the section occurs on.
 * "link" a URL that links to the section.
 * "backLink" the name of the anchor the section will link back to.

  The remaining TOC options only affect the default style sheet so they will not
  work when specifying a custom style sheet.
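To make the outline format described above more concrete, the following TypeScript sketch reads such an XML document and prints an indented list of titles and page numbers. It is an illustration under assumptions: the function names are made up, the parser is a small regex-based reader rather than a real XML parser (it assumes double-quoted attributes with no unescaped `>` inside them), and it relies only on the `item` element and the four attributes listed in the excerpt.

```typescript
// Illustrative reader for the outline XML described above.
// Assumptions: attributes are double-quoted and contain no raw ">".

interface OutlineItem {
  title: string;
  page: number;
  link: string;
  backLink: string;
  children: OutlineItem[];
}

function decodeEntities(s: string): string {
  // "&amp;" is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

function parseAttributes(raw: string): { [name: string]: string } {
  const attrs: { [name: string]: string } = {};
  const re = /([A-Za-z_:][\w:.-]*)\s*=\s*"([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(raw)) !== null) {
    attrs[m[1]] = decodeEntities(m[2]);
  }
  return attrs;
}

function parseOutline(xml: string): OutlineItem[] {
  const roots: OutlineItem[] = [];
  const stack: OutlineItem[] = []; // currently open <item> elements
  // Matches <item ...>, <item .../> and </item>; other tags are ignored.
  const tagRe = /<(\/?)item\b([^>]*?)(\/?)>/g;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(xml)) !== null) {
    if (m[1] === "/") {
      stack.pop(); // closing tag ends the current section
      continue;
    }
    const a = parseAttributes(m[2]);
    const item: OutlineItem = {
      title: a["title"] || "",
      page: parseInt(a["page"] || "0", 10),
      link: a["link"] || "",
      backLink: a["backLink"] || "",
      children: [],
    };
    const parent = stack[stack.length - 1];
    (parent ? parent.children : roots).push(item);
    if (m[3] !== "/") stack.push(item); // not self-closing: may have children
  }
  return roots;
}

function renderToc(items: OutlineItem[], depth: number = 0): string[] {
  const indent = new Array(depth + 1).join("  ");
  let lines: string[] = [];
  for (const item of items) {
    lines.push(`${indent}${item.title} ... ${item.page}`);
    lines = lines.concat(renderToc(item.children, depth + 1));
  }
  return lines;
}
```

A file produced with the --dump-outline switch could be read from disk and passed to `parseOutline`; for anything beyond a quick experiment, a proper XML parser would be the safer choice.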

To me, it seems that adding the ability to generate a ToC and an outline (left navigation) would amount to a substantial rewrite of this library.

Does it make sense to do so? Or perhaps fork the project or start a new project?

kohheepeace commented 4 years ago

I found this repo, which seems to support generating a TOC (though I haven't checked it in depth):

https://github.com/simologos/papersaurus

Puppeteer does not yet support the generation of TOCs. See this feature request and this Chromium bug. Therefore this package generates a PDF, then parses it again to update the page numbers in the TOC. Therefore the pdfFooterParser...

This approach looks decent to me.
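The two-pass idea quoted above (render once, read the result back, then fill in the page numbers) can be sketched roughly as follows in TypeScript. This is not papersaurus code: `resolveTocPages` and its parameters are hypothetical, and extracting the text of each page from the first-pass PDF is assumed to happen elsewhere. The sketch only shows the lookup step that maps each heading to the page where it first appears.

```typescript
// Illustrative second-pass lookup, not taken from papersaurus.
// Assumption: pageTexts[i] holds the plain text of page i + 1 of the
// first-pass PDF, and headings are listed in document order.

interface TocEntry {
  title: string;
  page: number | null; // 1-based page number, or null if not found
}

function resolveTocPages(
  headings: string[],
  pageTexts: string[],
  firstContentPage: number = 0 // zero-based index; lets the caller skip TOC pages
): TocEntry[] {
  const entries: TocEntry[] = [];
  let searchFrom = firstContentPage;
  for (const title of headings) {
    let found: number | null = null;
    for (let i = searchFrom; i < pageTexts.length; i++) {
      if (pageTexts[i].indexOf(title) !== -1) {
        found = i;
        break;
      }
    }
    // Headings appear in order, so later ones cannot be on earlier pages.
    if (found !== null) searchFrom = found;
    entries.push({ title, page: found === null ? null : found + 1 });
  }
  return entries;
}
```

The `firstContentPage` parameter exists because the TOC pages themselves contain every heading title, so a search starting at page one would match the TOC instead of the content. Plain substring matching is fragile (a heading can also occur in body text), which is presumably why a real implementation needs a more robust marker.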

bojl commented 3 years ago

Working on adding TOC support in this PR. It is still a draft for now, but feel free to leave comments. I reworked how the PDF is generated, so thoughts on that would be appreciated.