trentm / python-markdown2

markdown2: A fast and complete implementation of Markdown in Python

Link Conversion Fails with Preceding or Following Square Brackets #552

Status: Closed (syntaxsurge closed this issue 10 months ago)

syntaxsurge commented 10 months ago

Issue Description

I've encountered an issue with the markdown2 library: links are not converted to HTML when there are square brackets immediately before or after the link syntax. The link is left as raw Markdown in the output, which breaks both the expected formatting and the link itself.

Steps to Reproduce

  1. Use the markdown2 library to convert a Markdown string in which a link has bracketed text (such as [before] and [after]) on the lines directly before and after it.
  2. Observe that the link is not correctly converted into an HTML anchor tag.

Sample Python Code to Reproduce the Issue

The following Python code snippet can be used to reproduce the issue with the markdown2 library:

import markdown2

# Sample Markdown string with a link surrounded by square brackets
markdown_text = '''
[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3)
[after]
'''

# Convert Markdown to HTML
html_output = markdown2.markdown(markdown_text, extras=['tables', 'footnotes', 'markdown-in-html', 'cuddled-lists'])

# Print the HTML output
print("HTML Output:")
print(html_output)

Expected Result

The link in the Markdown string should be converted into an HTML anchor tag, producing an output similar to:

<p>[before]
<a href="https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3">Triggers of Alarm Systems</a>
[after]</p>

Actual Result

The actual output keeps the Markdown link syntax without converting it into an HTML anchor tag:

<p>[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&amp;list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&amp;index=3)
[after]</p>

This code snippet should help in replicating the issue for troubleshooting and resolving the problem.

Additional Context

This issue seems to be specific to cases where square brackets immediately surround the Markdown link syntax. Removing the surrounding square brackets results in correct HTML conversion.
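
A small check like the one below can tell the two outcomes apart when trying different inputs. It is only a sketch: classify_link_output and both regular expressions are hypothetical helpers written for this issue, not part of markdown2, and the patterns only cover simple inline links like the one in the sample. The URLs are shortened for readability.

```python
import re

# Hypothetical helpers for this issue; not part of markdown2.
# Matches a rendered anchor tag such as <a href="...">text</a>.
ANCHOR_RE = re.compile(r'<a\s[^>]*href="[^"]*"[^>]*>.*?</a>', re.DOTALL)
# Matches leftover inline-link syntax such as [text](url).
RAW_LINK_RE = re.compile(r'\[[^\]\n]+\]\([^)\s]+\)')


def classify_link_output(html: str) -> str:
    """Return 'unconverted' if raw [text](url) syntax survives in the HTML,
    'converted' if an anchor tag is present, and 'no-link' otherwise."""
    if RAW_LINK_RE.search(html):
        return "unconverted"
    if ANCHOR_RE.search(html):
        return "converted"
    return "no-link"


# Shortened versions of the expected and actual results shown above.
expected = (
    '<p>[before]\n'
    '<a href="https://www.youtube.com/watch?v=aJOTlE1K90k">Triggers of Alarm Systems</a>\n'
    '[after]</p>'
)
actual = (
    '<p>[before]\n'
    '[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k)\n'
    '[after]</p>'
)

print(classify_link_output(expected))  # converted
print(classify_link_output(actual))    # unconverted
```

With markdown2 installed, the same helper can be applied to the result of markdown2.markdown(markdown_text) from the reproduction snippet.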

syntaxsurge commented 10 months ago

Additional Context: Challenges with 'markdown-in-html' Extra and Dynamic Content

I attempted to use the markdown-in-html extra provided by markdown2 to address this issue. However, this approach is not ideal for dynamic content due to the need for multiple processing steps. This method leads to two significant problems:

1. Unintended Conversions with Dynamic Content

When working with dynamic content, processing all HTML tags twice can lead to unexpected conversions. Specifically, there are instances where text that is meant to be included literally (as part of the text, not as Markdown) gets incorrectly converted into HTML. This unintended conversion distorts the intended output and complicates the handling of dynamic content.

Example of Unintended Conversion Issue

Consider the following Markdown input and its conversion process:

Markdown Input:

[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3)
[after]

First Conversion to HTML:

<p>[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&amp;list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&amp;index=3)
[after]</p>

Adding the markdown="1" Attribute for Markdown Processing:

<p markdown="1">[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&amp;list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&amp;index=3)
[after]</p>
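
The step between the two passes might look roughly like the sketch below. The comment does not show how the attribute was actually added, so add_markdown_attribute is a hypothetical helper, the example.com URL is a placeholder, and only bare <p> tags (no existing attributes) are handled.

```python
# Hypothetical helper; the original comment does not show how the
# markdown="1" attribute was added between the two conversion passes.
def add_markdown_attribute(html: str) -> str:
    """Rewrite every bare <p> tag as <p markdown="1">."""
    return html.replace("<p>", '<p markdown="1">')


# Placeholder first-pass output with the link left unconverted.
first_pass = "<p>[before]\n[Link text](https://example.com/)\n[after]</p>"
marked = add_markdown_attribute(first_pass)
print(marked)

# Second pass, assuming markdown2 is installed:
#   import markdown2
#   html = markdown2.markdown(marked, extras=["markdown-in-html"])
```

Because every paragraph is marked and re-parsed, any literal text inside it that happens to look like Markdown is also converted on the second pass, which is the unintended-conversion risk described above.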

2. Inconsistent HTML Structure with Nested Paragraph Tags

The use of markdown-in-html results in an inconsistent HTML structure. Specifically, when a link is converted within a paragraph (<p> tag), a new paragraph tag is generated around the link. This creates nested paragraph tags, which is invalid HTML (a <p> element cannot contain another <p>) and can lead to display and styling issues.

Example of Nested Paragraph Tag Issue

After adding the markdown="1" attribute and running the conversion again, the output is:

Processed HTML Output:

<p>[before]

<p><a href="https://www.youtube.com/watch?v=aJOTlE1K90k&amp;list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&amp;index=3">Triggers of Alarm Systems</a></p>

[after]</p>

Here, the link is wrapped in its own paragraph tag, creating a nested structure within the original paragraph tag. The link is no longer inline, which disrupts the intended flow and structure of the content.

Crozzers commented 10 months ago

"When working with dynamic content, processing all HTML tags twice can lead to unexpected conversions."

Were you referring to the & -> &amp; conversions in the snippet? I'm not 100% sure what the issue is.

"The use of markdown-in-html results in an inconsistent HTML structure."

I believe the way this works internally is we take the text inside the tags and run the snippet through the markdown parser, which includes forming paragraphs. Not sure how we would avoid the nested paragraphs issue. We could turn off paragraph forming when markdown="1" is attached to a <p> tag? Or maybe add a postprocess that "flattens" a level of paragraphs? Is it possible to wrap the content in <div> instead?
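
As a rough illustration of the "flatten a level of paragraphs" idea, a postprocess could drop any <p> or </p> tag that appears inside an already open paragraph. This is a sketch under stated assumptions, not markdown2 code: flatten_nested_paragraphs is a hypothetical function, it uses a regular expression rather than a real HTML parser, and the example.com URL is a placeholder.

```python
import re

# Matches <p>, <p attr="...">, and </p>, but not tags such as <pre>.
P_TAG_RE = re.compile(r"<(/?)p(?:\s[^>]*)?>", re.IGNORECASE)


def flatten_nested_paragraphs(html: str) -> str:
    """Keep only the outermost <p>...</p> pair and drop nested <p> tags.

    Hypothetical postprocess; the text between the tags is left untouched.
    """
    depth = 0

    def keep_or_drop(match: re.Match) -> str:
        nonlocal depth
        if match.group(1):       # closing tag </p>
            if depth == 0:       # stray closing tag: leave it alone
                return match.group(0)
            depth -= 1
            return match.group(0) if depth == 0 else ""
        depth += 1               # opening tag <p ...>
        return match.group(0) if depth == 1 else ""

    return P_TAG_RE.sub(keep_or_drop, html)


nested = (
    "<p>[before]\n\n"
    '<p><a href="https://example.com/">Link text</a></p>\n\n'
    "[after]</p>"
)
flattened = flatten_nested_paragraphs(nested)
print(flattened)
```

The blank lines that surrounded the inner paragraph are left in place, so a real implementation would probably also want to normalize whitespace.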