useblocks / sphinx-simplepdf

A simple PDF builder for Sphinx documentations
https://sphinx-simplepdf.readthedocs.io
MIT License

Error due to missing anchor #40

Open kreuzberger opened 1 year ago

kreuzberger commented 1 year ago

During the build with Sphinx-SimplePDF, I get an error on every project: ERROR: No anchor # for internal URI reference

This seems to come from the table-of-contents anchor, which points to an empty internal reference (href="#"). Excerpt:

  <div aria-label="main navigation" class="sphinxsidebar" role="navigation">
   <div class="sphinxsidebarwrapper">
    <div>
     <h3>
      <a href="#">
       Table of Contents

danwos commented 1 year ago

Yes, not having a valid anchor seems to be a problem. For some projects I get dozens of these errors.

I'm pretty sure Sphinx-SimplePDF is not the source of this problem, as the HTML is rendered by Sphinx. So it's either Sphinx or the WeasyPrint library we use that has problems with such anchors.

But maybe we can do some HTML post-processing and search for such invalid anchors before WeasyPrint tries to create a PDF out of it.
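A minimal sketch of the detection half of such post-processing, using only the standard-library html.parser. It assumes the offending links are exactly href="#", as in the sidebar excerpt above; the names EmptyAnchorFinder and find_empty_anchors are hypothetical and not part of sphinx-simplepdf. It only reports positions and does not rewrite the HTML.

```python
from __future__ import annotations

from html.parser import HTMLParser


class EmptyAnchorFinder(HTMLParser):
    """Hypothetical helper: records where every <a href="#"> starts."""

    def __init__(self) -> None:
        super().__init__()
        self.positions: list[tuple[int, int]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only an href of exactly "#" counts.
        if tag == "a" and dict(attrs).get("href") == "#":
            # getpos() gives (1-based line, 0-based column) of the current tag.
            self.positions.append(self.getpos())


def find_empty_anchors(html: str) -> list[tuple[int, int]]:
    finder = EmptyAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.positions


if __name__ == "__main__":
    sample = """<div class="sphinxsidebarwrapper">
 <h3>
  <a href="#">Table of Contents</a>
 </h3>
 <a href="#section-1">Section 1</a>
</div>"""
    print(find_empty_anchors(sample))
```

Running it against the generated singlehtml file would show how many of these anchors exist before deciding whether to strip them or fix the theme template.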

kreuzberger commented 1 year ago

The warnings and errors come from WeasyPrint: both the many CSS warnings and the error. I can reproduce them by calling weasyprint from the command line on the generated singlehtml input file.

The question is how to handle them. Possible solutions:

I think the first solution would be ok?

kreuzberger commented 1 year ago

Preprocessing the HTML to satisfy WeasyPrint is not an option for me in this scenario.

danwos commented 1 year ago

capture stderr and show it only if an error occurs (e.g. with check=True)

I'm not sure this is really the best option. So far I haven't found the time to analyze all the problems WeasyPrint complains about. Maybe most of them can be fixed by updating our theme and removing some unsupported CSS styles. I would like to try this first...
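For comparison, the quoted "capture stderr and show it only if an error occurs" approach could look roughly like this. It is a sketch, not the builder's code: run_quietly is a hypothetical name, and args stands for whatever command list the builder already passes to subprocess (e.g. ["weasyprint", "index.html", "out.pdf"]). TimeoutExpired is deliberately not caught, so the existing retry loop can still handle it.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Sequence


def run_quietly(args: Sequence[str], timeout: float | None = None) -> str:
    """Run a command and hide its output unless it exits with an error.

    On success, the captured stdout and stderr are returned (not printed).
    On a non-zero exit code, the captured output is written to stderr and
    the CalledProcessError is re-raised.
    """
    try:
        result = subprocess.run(
            args,
            check=True,           # raise CalledProcessError on a non-zero exit code
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,
            timeout=timeout,
        )
    except subprocess.CalledProcessError as err:
        # Only now surface what the tool printed, so nothing is lost on failure.
        sys.stderr.write(err.stdout or "")
        sys.stderr.write(err.stderr or "")
        raise
    return result.stdout + result.stderr
```

The drawback raised in this thread still applies: on success, every warning is swallowed, including ones that may matter.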

kreuzberger commented 1 year ago

Most of the warnings seem to come from the sphinx-needs styles. These are valid CSS (e.g. text-shadow), but since WeasyPrint does not support them, it reports them as "WARNING: Ignored". Silencing all output also seems unhelpful, even dangerous.

So what about filtering with regular expressions? I've attached a possible patch. With it we could apply a filter list to reduce the output (please ignore that the patch direction (-/+) is reversed; a complete PR would follow if required).

diff --color -ru a/builders/simplepdf.py b/builders/simplepdf.py
--- a/builders/simplepdf.py 2023-03-27 14:13:12.406283366 +0200
+++ b/builders/simplepdf.py 2023-03-21 14:08:02.612199132 +0100
@@ -1,5 +1,4 @@
 import os
-import re
 from typing import Any, Dict
 import subprocess
 import weasyprint
@@ -122,29 +121,19 @@

         timeout = self.config['simplepdf_weasyprint_timeout']

-
-        filter_list = self.config['simplepdf_weasyprint_filter']
-        filter_pattern = '(?:% s)' % '|'.join(filter_list) if 0 < len(filter_list) else None
-
         if self.config['simplepdf_use_weasyprint_api']:
-
+            
             doc = weasyprint.HTML(index_path)

             doc.write_pdf(
                 target=os.path.join(self.app.outdir, f'{file_name}'),
             )
-
+        
         else:
             retries = self.config['simplepdf_weasyprint_retries']
             for n in range(1 + retries):
                 try:
-                    wp_out = subprocess.check_output(args, timeout=timeout, text=True, stderr=subprocess.STDOUT)
-
-                    for line in wp_out.splitlines():
-                        if filter_pattern is not None and re.match(filter_pattern, line):
-                            pass
-                        else:
-                            print(line)
+                    subprocess.check_output(args, timeout=timeout, text=True)
                     break
                 except subprocess.TimeoutExpired:
                     logger.warning(f"TimeoutExpired in weasyprint, retrying")
@@ -177,7 +166,6 @@
     app.add_config_value("simplepdf_theme", "simplepdf_theme", "html", types=[str])
     app.add_config_value("simplepdf_theme_options", {}, "html", types=[dict])
     app.add_config_value("simplepdf_sidebars", {'**': ["localtoc.html"]}, "html", types=[dict])
-    app.add_config_value("simplepdf_weasyprint_filter", [], "html", types=[list])
     app.add_builder(SimplePdfBuilder)

     return {
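The filtering logic from the patch, pulled out into a standalone sketch so it can be tried without a Sphinx build. The option name simplepdf_weasyprint_filter comes from the patch; the helper names compile_filter and filter_lines are hypothetical. Like the patch, it uses re.match, so each pattern has to match from the beginning of an output line.

```python
from __future__ import annotations

import re
from typing import Iterable


def compile_filter(patterns: list[str]) -> re.Pattern[str] | None:
    """Combine the configured patterns into one alternation, or None if empty."""
    if not patterns:
        return None
    # Wrap each pattern in a non-capturing group so "|" joins whole patterns.
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def filter_lines(lines: Iterable[str], pattern: re.Pattern[str] | None) -> list[str]:
    """Drop every line the filter matches at its start (re.match, as in the patch)."""
    if pattern is None:
        return list(lines)
    return [line for line in lines if not pattern.match(line)]


if __name__ == "__main__":
    # Hypothetical value for the simplepdf_weasyprint_filter option.
    simplepdf_weasyprint_filter = [
        r"WARNING: Expected a media type",
        r"WARNING: Invalid media type",
    ]
    output = (
        "WARNING: Expected a media type, got screen/**/and/**/(max-width: 767px)\n"
        "ERROR: No anchor # for internal URI reference\n"
    )
    pattern = compile_filter(simplepdf_weasyprint_filter)
    for line in filter_lines(output.splitlines(), pattern):
        print(line)  # only the ERROR line survives the filter
```

Keeping the default as an empty list, as the patch does, means nothing is filtered unless a project opts in.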
kreuzberger commented 1 year ago

Almost all errors, except the missing-anchor error, come from the imported datatables.min.css:

INFO: Step 2 - Fetching and parsing CSS - file:///home/kreuzberger/src/s5/sx_spdf/sx/build/release/src/module/demod/iqdemod/doc/rtr/simplepdf/_static/sphinx-needs/libs/html/datatables.min.css
WARNING: Error: Expected <ident> for declaration name, got literal. at 13:645.
WARNING: Error: Expected <ident> for declaration name, got literal. at 13:8256.
WARNING: Error: Expected <ident> for declaration name, got literal. at 13:8848.
WARNING: Error: Expected <ident> for declaration name, got literal. at 13:12508.
WARNING: Expected a media type, got screen/**/and/**/(max-width: 767px)
WARNING: Invalid media type " screen and (max-width: 767px)" the whole @media rule was ignored at 13:13732.
WARNING: Expected a media type, got screen/**/and/**/(max-width: 640px)
WARNING: Invalid media type " screen and (max-width: 640px)" the whole @media rule was ignored at 13:13937.
WARNING: Expected a media type, got screen/**/and/**/(max-width: 640px)
WARNING: Invalid media type " screen and (max-width: 640px)" the whole @media rule was ignored at 16:8502.
WARNING: Expected a media type, got screen/**/and/**/(max-width: 767px)
WARNING: Invalid media type " screen and (max-width: 767px)" the whole @media rule was ignored at 28:3845.

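It seems WeasyPrint complains about "*cursor:hand" and similar properties. I assume this is because of the leading asterisk on the property name (the old Internet Explorer "star hack"), not a wildcard selector.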
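To check that assumption against datatables.min.css, a rough regex scan for star-prefixed declaration names could help. This is a heuristic sketch, not a CSS parser: it only looks for "*name:" right after "{" or ";", and find_star_hacks is a hypothetical helper. It reports 1-based line and column; whether that lines up exactly with positions such as 13:645 in the WeasyPrint log is an assumption.

```python
from __future__ import annotations

import re

# "*name:" directly after "{" or ";" (optional whitespace in between).
STAR_HACK = re.compile(r"(?<=[{;])\s*(\*[-A-Za-z]+)\s*:")


def find_star_hacks(css: str) -> list[tuple[int, int, str]]:
    """Return (line, column, name) for each star-prefixed declaration, 1-based."""
    results = []
    for match in STAR_HACK.finditer(css):
        start = match.start(1)  # position of the "*"
        line = css.count("\n", 0, start) + 1
        column = start - (css.rfind("\n", 0, start) + 1) + 1
        results.append((line, column, match.group(1)))
    return results


if __name__ == "__main__":
    css = ".a{color:red;*cursor:hand}\n.b{ *zoom:1;}"
    for line, column, name in find_star_hacks(css):
        print(f"{line}:{column} {name}")
```

If the hits match the reported positions, removing these IE-only declarations from the bundled CSS would be one way to follow the "fix the theme first" route.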