subject-f / cubarimoe

GNU Affero General Public License v3.0

Markdown parser updates #18

Closed BrutuZ closed 1 year ago

BrutuZ commented 1 year ago

Bare URLs are parsed after the Markdown link format, which has the added benefit of not capturing the Markdown link's closing bracket at the end of the URL.

BrutuZ commented 1 year ago

It shouldn't break in pure Python; I tested it in the console to make sure it matched as expected and had no issues. Maybe escape the single quote too rather than remove it? It is a valid URL character after all, and not that unusual, since it's often used to enclose parameter values.

funkyhippo commented 1 year ago

Break may have been the wrong word. What's your expected input?

>>> import re
>>> def _parse_links(input_str: str) -> str:
...     input_str = re.sub(
...         r"\[([\w\W]+?)\]\(([\w\W]+?)\)",
...         r'<a href="\2">\1</a>',
...         input_str,
...         flags=re.MULTILINE,
...     )
...     return re.sub(
...         r"(?<!href=\")(https?:\/\/[-a-zA-Z0-9._~:/?#@!$&()*+,;=%]+')",
...         r'<a href="\1">\1</a>',
...         input_str,
...         flags=re.MULTILINE,
...     )
...
>>> test_str = """
... https://example.com
... [example](https://example.com)
... https://example.com'
... """
>>> print(_parse_links(test_str))

https://example.com
<a href="https://example.com">example</a>
<a href="https://example.com'">https://example.com'</a>

Did you intend to only support links that were suffixed with '? It's outside the character set.

BrutuZ commented 1 year ago

> Did you intend to only support links that were suffixed with '? It's outside the character set.

That... was an oversight between converting the expression enclosure from single to double-quotes in the browser editor 😅
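
For reference, a minimal sketch of what the bare-URL pattern might look like with the single quote moved inside the character class, which appears to be the intent described above. The name `linkify_bare_urls` is hypothetical; the character set is copied from the snippet earlier in the thread, and the actual fix in the PR may differ.

```python
import re

# Assumption: the single quote was meant to sit inside the brackets,
# so it becomes an allowed URL character rather than a required suffix.
BARE_URL = re.compile(
    r"(?<!href=\")(https?://[-a-zA-Z0-9._~:/?#@!$&'()*+,;=%]+)"
)


def linkify_bare_urls(text: str) -> str:
    """Wrap bare http(s) URLs in anchor tags (hypothetical helper).

    The negative lookbehind skips URLs that already follow href=",
    i.e. those produced by an earlier Markdown-link pass.
    """
    return BARE_URL.sub(r'<a href="\1">\1</a>', text)
```

One trade-off of this version: a quote that merely follows a URL in running text is captured as part of the link.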

BrutuZ commented 1 year ago

Added some other minor changes, doesn't seem like I broke anything 😅

BrutuZ commented 1 year ago

Anything else holding up this and #20? I have the changes from both PRs (and a few more) running on a forked instance with no issues so far.

funkyhippo commented 1 year ago

No blockers at a glance -- sorry for the slow review, I've just been incredibly busy with irl.

I'll block out some time this weekend to take a closer look and/or merge.

BrutuZ commented 1 year ago

Should have addressed all notes 🤞

BrutuZ commented 1 year ago

I couldn't figure out an elegant way to filter trailing spaces as well. While another \w at the end might handle it, it would also make it impossible to format single characters, like one letter or one digit. Turns out a space with the {0} quantifier doesn't work as a negative 🙁
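
One possible approach, sketched under assumptions: a `{0}` quantifier matches zero occurrences and therefore always succeeds, so it cannot assert that a space is absent. A zero-width lookaround can, and it consumes no characters, so single-character spans still match. The `*` delimiter, the `<i>` tag, and the name `format_italics` are assumptions for illustration, not taken from the PR.

```python
import re

# (?!\s)  : the character right after the opening * must not be whitespace
# (?<!\s) : the character right before the closing * must not be whitespace
# Both checks are zero-width, so a one-character body such as *A* still matches.
ITALIC = re.compile(r"\*(?!\s)(.+?)(?<!\s)\*")


def format_italics(text: str) -> str:
    """Convert *text* to <i>text</i>, rejecting padded spans (hypothetical helper)."""
    return ITALIC.sub(r"<i>\1</i>", text)
```

With this pattern, `*A*` and `*1*` are formatted, while `*A *` and `* A*` are left untouched.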