squidfunk / mkdocs-material

Documentation that simply works
https://squidfunk.github.io/mkdocs-material/
MIT License

Blog readtime includes inline SVG text content #7367

Closed: sisp closed this issue 1 month ago

sisp commented 1 month ago

Context

I'm inlining SVG images to be able to use Material for MkDocs' CSS variables for more consistent image styles with the rest of the page.


Slightly off topic, but for completeness: unlike in the reproduction below (which I kept minimal), I actually wrap the inline SVG in

<div>
  <template shadowrootmode="open">
    <svg ...>
       ...
    </svg>
  </template>
</div>

to keep the CSS definitions from leaking into the document and potentially conflicting with other inline SVGs' CSS definitions.

Bug description

The blog readtime plugin extracts all text data from the generated HTML page content, including non-prose text such as CSS definitions in inline SVG images. This inflates the read time estimate, especially when an SVG carries many CSS definitions (i.e., a lot of "text").

A fix might involve skipping some tags such as <svg>, <style>, and <script> when gathering text data. I'd be happy to contribute a fix if you agree with the bug report and when we've converged on a solution proposal.
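
To illustrate the proposed fix, here is a minimal sketch of a parser that ignores text nested inside skipped tags. It is not the plugin's actual parser: the class name `SkippingTextParser`, the `SKIP_TAGS` set, and the sample HTML are hypothetical, and only Python's standard `html.parser.HTMLParser` is assumed.

```python
from html.parser import HTMLParser


class SkippingTextParser(HTMLParser):
    """Collect text data, ignoring anything nested inside skipped tags."""

    # Hypothetical skip list, taken from the tags named in this report
    SKIP_TAGS = {"svg", "style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []        # collected text chunks
        self._skip_depth = 0  # > 0 while inside at least one skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside of all skipped tags
        if self._skip_depth == 0:
            self.text.append(data)


# Hypothetical sample page content, loosely modeled on the reproduction
html = (
    "<h1>Readtime includes inline SVG text</h1>\n"
    "<svg><style>.fill-red { fill: red; }</style>"
    '<rect class="fill-red" width="10" height="10"/></svg>\n'
    "<p>Red SVG rectangle</p>"
)

parser = SkippingTextParser()
parser.feed(html)
parser.close()
print(parser.text)
```

A depth counter is used instead of a boolean flag so that a skipped tag nested inside another skipped tag (here `<style>` inside `<svg>`) does not re-enable text collection when the inner tag closes.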

Reproduction

9.5.29-blog-readtime-inline-svg.zip

I've edited venv/lib/python3.12/site-packages/material/plugins/blog/readtime/__init__.py to demonstrate the current behavior of the HTML parser:

     # Extract words from text and compute readtime in seconds
+    print("DEBUG: ", parser.text)
     words = len(re.split(r"\W+", "".join(parser.text)))
     seconds = ceil(words / words_per_minute * 60)

Steps to reproduce

  1. Unzip the reproduction.
  2. Run mkdocs serve.
  3. Observe the line

    DEBUG:  ['Readtime includes inline SVG text', '\n', '\n  ', '\n    ', '\n      ', '\n        <![CDATA[\n        .fill-red {\n          fill: red;\n        }\n        ]]>\n      ', '\n    ', '\n    ', '\n  ', '\n  ', 'Red SVG rectangle', '\n']

    in the terminal which shows the list of text data extracted by the HTML parser of the readtime plugin.
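
To make the effect concrete, the following sketch applies the word-count expression from the diff above to the chunks in the DEBUG line, once with and once without the CSS chunk. The helper names are hypothetical, and the value of 265 words per minute is an assumption for illustration; the configured value in a real project may differ.

```python
import re
from math import ceil

# Text chunks copied from the DEBUG line above
chunks = [
    'Readtime includes inline SVG text', '\n', '\n  ', '\n    ', '\n      ',
    '\n        <![CDATA[\n        .fill-red {\n          fill: red;\n        }\n        ]]>\n      ',
    '\n    ', '\n    ', '\n  ', '\n  ', 'Red SVG rectangle', '\n',
]


def count_words(chunks):
    # Same expression as in the readtime plugin snippet shown in the diff
    return len(re.split(r"\W+", "".join(chunks)))


def readtime_seconds(chunks, words_per_minute=265):
    # words_per_minute is an assumed value for this illustration
    return ceil(count_words(chunks) / words_per_minute * 60)


# Drop the chunk that holds the CSS definitions from the inline SVG
without_css = [chunk for chunk in chunks if "CDATA" not in chunk]

extra = count_words(chunks) - count_words(without_css)
print("extra tokens contributed by the CSS chunk:", extra)
print("seconds with CSS:", readtime_seconds(chunks))
print("seconds without CSS:", readtime_seconds(without_css))
```

Even this tiny stylesheet adds several tokens (`CDATA`, `fill`, `red`, `fill`, `red`) to the count, so a page with many inline SVG styles would skew the estimate far more.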

Browser

No response

squidfunk commented 1 month ago

Thanks for reporting.

> A fix might involve skipping some tags such as <svg>, <style>, and <script>

Yes, happy to accept a PR here. We need to build some logic to skip adding content while inside specific tags:

https://github.com/squidfunk/mkdocs-material/blob/4f8081c268d31bf74d546e600cadd0cff7dc89e8/src/plugins/blog/readtime/parser.py#L28-L45

The search plugin already does that here:

https://github.com/squidfunk/mkdocs-material/blob/4f8081c268d31bf74d546e600cadd0cff7dc89e8/src/plugins/search/plugin.py#L383-L388

https://github.com/squidfunk/mkdocs-material/blob/4f8081c268d31bf74d546e600cadd0cff7dc89e8/src/plugins/search/plugin.py#L409-L414

https://github.com/squidfunk/mkdocs-material/blob/4f8081c268d31bf74d546e600cadd0cff7dc89e8/src/plugins/search/plugin.py#L513-L516

sisp commented 1 month ago

Thanks for the very helpful pointers to the search plugin! I've submitted a PR.

squidfunk commented 1 month ago

Keeping open until released.

squidfunk commented 1 month ago

Released as part of 9.5.30.