mitya57 / python-markdown-math

Math extension for Python-Markdown
https://pypi.org/project/python-markdown-math/
BSD 3-Clause "New" or "Revised" License

Inline Tex inside image annotation #22

Closed. Evidlo closed 5 years ago

Evidlo commented 5 years ago

I have some TeX that I'd like to put inside an image annotation, like so:

![Hello world $3x + 2$](foo.jpg)

I dug down a bit and it seems that handle_match_inline is correctly substituting the script tag, but this is getting stripped off in a later stage.

def handle_match_inline(m):
    node = etree.Element('script')
    node.set('type', self._get_content_type())
    node.text = AtomicString(m.group(3))
    result = _wrap_node(node, ''.join(m.group(2, 3, 4)), 'span')
    print(etree.tostring(result))
    return result

In [22]: md.convert('![$3x + 2$](a.jpg)')
b'<script type="math/tex">3x + 2</script>'
Out[22]: '<p><img alt="3x + 2" src="a.jpg" /></p>'

mitya57 commented 5 years ago

How do you expect it to work? The alt attribute is a plain string, and obviously you cannot have HTML markup in it.

Evidlo commented 5 years ago

I should have given more detail. I wrote a TreeProcessor extension (markdown-captions) that puts the Markdown image text inside a <figcaption>, where HTML markup is valid:

In [15]: md = markdown.Markdown(
    ...:     extensions=['mdx_math', 'markdown_captions'],
    ...:     extension_configs={
    ...:         'mdx_math': {
    ...:             'enable_dollar_delimiter': True
    ...:         }
    ...:     }
    ...: )

In [16]: md.convert('![$3x + 2$](a.jpg)')
Out[16]: '<p><figure><img src="a.jpg" /><figcaption>3x + 2</figcaption></figure></p>'

So my question is why/where is the <script> tag getting stripped and how might I fix this?

mitya57 commented 5 years ago

Thanks, it makes more sense now.

The ImageInlineProcessor calls unescape() here: https://github.com/Python-Markdown/markdown/blob/3.1.1/markdown/inlinepatterns.py#L614

And unescape() removes all markup and leaves only text content: https://github.com/Python-Markdown/markdown/blob/3.1.1/markdown/inlinepatterns.py#L240
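
To illustrate the effect, here is a minimal standard-library sketch of what "leaves only text content" means for a stashed element. It is not the Python-Markdown code itself; the helper name `text_only` is hypothetical, and the assumption is that the stashed node is reduced to the concatenation of its text, roughly as `itertext()` does.

```python
import xml.etree.ElementTree as etree


def text_only(node):
    """Join the text of an element and all its descendants, dropping the tags.

    Hypothetical helper: an approximation of how markup is lost when an
    element is flattened to a plain string for the alt attribute.
    """
    return ''.join(node.itertext())


# Build the same kind of node that handle_match_inline produces.
script = etree.Element('script')
script.set('type', 'math/tex')
script.text = '3x + 2'

print(etree.tostring(script))  # serialized element, tag and attribute intact
print(text_only(script))       # flattened string, tag and attribute gone
```

Once the element has been flattened like this, a later Treeprocessor only sees the plain string, so there is no `<script>` tag left for it to move into the `<figcaption>`.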

Evidlo commented 5 years ago

Thanks. I switched my extension to a LinkInlineProcessor, set its priority just above the ImageInlineProcessor, and it works now.
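
For anyone landing here later, a rough sketch of that approach follows. It is not the actual markdown-captions source: the names `CaptionInlineProcessor`, `CaptionExtension`, and the registry key `'caption'` are made up for illustration, and the priority `151` assumes the built-in image pattern is registered at 150, as in Python-Markdown 3.1.

```python
import xml.etree.ElementTree as etree

from markdown import Markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import IMAGE_LINK_RE, LinkInlineProcessor


class CaptionInlineProcessor(LinkInlineProcessor):
    """Hypothetical processor: render ![text](src) as <figure> with a <figcaption>."""

    def handleMatch(self, m, data):
        # Parse the bracketed text and the (src "title") part with the
        # helpers inherited from LinkInlineProcessor.
        text, index, handled = self.getText(data, m.end(0))
        if not handled:
            return None, None, None
        src, title, index, handled = self.getLink(data, index)
        if not handled:
            return None, None, None

        figure = etree.Element('figure')
        img = etree.SubElement(figure, 'img')
        img.set('src', src)
        if title is not None:
            img.set('title', title)
        # Unlike the built-in image processor, do not call unescape() here,
        # so inline markup in the caption text is kept rather than flattened.
        caption = etree.SubElement(figure, 'figcaption')
        caption.text = text
        return figure, m.start(0), index


class CaptionExtension(Extension):
    """Hypothetical extension that registers the processor above."""

    def extendMarkdown(self, md):
        # 151 is assumed to sit just above the built-in 'image_link' (150),
        # so this processor sees the image syntax first.
        md.inlinePatterns.register(
            CaptionInlineProcessor(IMAGE_LINK_RE, md), 'caption', 151)


html = Markdown(extensions=[CaptionExtension()]).convert('![*hi*](a.jpg)')
print(html)
```

The point of the design is that the caption text is stored as element text rather than as an attribute, so inline markup nested inside the brackets survives.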