mkdocstrings / autorefs

Automatically link across pages in MkDocs.
https://mkdocstrings.github.io/autorefs/
ISC License

feature: Keep data-attributes of spans #41

Closed pawamoy closed 6 months ago

pawamoy commented 6 months ago

Is your feature request related to a problem? Please describe.

I'd like to take advantage of mkdocs-material's instant previews by adding data-preview to the autorefs spans generated by mkdocstrings-python. Unfortunately, autorefs matches spans with a strict regex that only accepts spans with exactly one data-autorefs-* attribute.

Describe the solution you'd like

I'd like autorefs to allow other data-* attributes to appear in its spans, and to carry them over to the anchors when it transforms the spans.

Describe alternatives you've considered

/

Additional context

https://github.com/squidfunk/mkdocs-material/issues/6704

pawamoy commented 6 months ago

Quick solution:

import re

AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
    r'("?)(?P<identifier>[^"<>]*)\2(?P<attrs> [^>]*)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

>>> from mkdocs_autorefs.references import AUTO_REF_RE
>>> AUTO_REF_RE.search('<span data-autorefs-identifier="hey" data-preview data-preview="0" data-preview>hello</span>').groupdict()
{'kind': 'autorefs-identifier', 'identifier': 'hey', 'attrs': ' data-preview data-preview="0" data-preview', 'title': 'hello'}

The data-autorefs attribute must still appear first, and the regex now also captures everything after this first attribute and before the closing >. This captured group can then be reinjected as-is in the anchors.
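
As a rough illustration of the reinjection step, here is a minimal sketch that substitutes each matched span with an anchor and pastes the captured attrs group in unchanged. The URL_MAP dictionary and the fix_ref function are hypothetical names made up for this example; the real url lookup in autorefs is not shown in this thread.

```python
import re

AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
    r'("?)(?P<identifier>[^"<>]*)\2(?P<attrs> [^>]*)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

# Hypothetical identifier -> URL mapping, for illustration only.
URL_MAP = {"hey": "api/#hey"}


def fix_ref(match):
    """Turn one matched span into an anchor, keeping extra attributes as-is."""
    identifier = match.group("identifier")
    title = match.group("title")
    # The group is None when the span has no extra attributes;
    # when present it already starts with a space.
    attrs = match.group("attrs") or ""
    url = URL_MAP.get(identifier)
    if url is None:
        # Unresolved reference: this sketch simply leaves the span untouched.
        return match.group(0)
    return f'<a href="{url}"{attrs}>{title}</a>'


page = '<p><span data-autorefs-identifier="hey" data-preview>hello</span></p>'
result = AUTO_REF_RE.sub(fix_ref, page)
print(result)
```

Because the attributes are copied verbatim, nothing needs to parse them: whatever mkdocstrings-python writes after the first attribute ends up on the anchor.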

pawamoy commented 6 months ago

A bigger change, which might make future changes and features more robust, would be to use a custom tag to delimit auto-references, something like <autoref ...>...</autoref>. With this, it becomes easy to match autorefs as plain strings, and perhaps parse their attributes with a custom HTML parser:

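import re
from html.parser import HTMLParser

AUTO_REF_RE = re.compile(r"<autoref (?P<attrs>.*?)>(?P<title>.*?)</autoref>")

class AttrsParser(HTMLParser):
    """Collect the attributes of every start tag fed to the parser."""

    def __init__(self):
        # HTMLParser.__init__ accepts keyword arguments only, so `self` must not be passed again.
        super().__init__()
        self.attrs = []

    def parse(self, html):
        # Forget attributes from a previous call, then parse the new snippet.
        self.attrs.clear()
        self.feed(html)
        return self.attrs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        self.attrs.extend(attrs)

# for each match, build f"<a {match.group('attrs')}></a>" and pass it to the parser
AttrsParser().parse('<a data-preview data-identifier="pathlib.Path" data-other="0">some title</a>')
# [('data-preview', None), ('data-identifier', 'pathlib.Path'), ('data-other', '0')]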

I don't expect much impact on performance, since we'd only parse the attributes of the auto-references and nothing else.

This change would also let us keep complex HTML inside the autoref tag (see https://github.com/mkdocstrings/autorefs/pull/40).

oprypin commented 6 months ago

The regex approach sounds good to me. I only had the urge to tweak some (even pre-existing) parts of that regex :sweat_smile:

AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-(?:identifier|optional|optional-hover))="
    r'("?)(?P<identifier>[^"<>]+)\2(?P<attrs> [^<>]+)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)
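
For readers comparing the two patterns, the sketch below shows where they differ in practice. QUICK_RE and TWEAKED_RE are names chosen only for this comparison; both patterns are copied from the comments in this thread. The tweaked version requires a non-empty identifier and rejects a < inside the extra attributes, while ordinary spans match identically.

```python
import re

# Pattern from the "quick solution" comment.
QUICK_RE = re.compile(
    r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
    r'("?)(?P<identifier>[^"<>]*)\2(?P<attrs> [^>]*)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

# Tweaked pattern: factored-out prefix, `+` instead of `*`, and no `<` in attrs.
TWEAKED_RE = re.compile(
    r"<span data-(?P<kind>autorefs-(?:identifier|optional|optional-hover))="
    r'("?)(?P<identifier>[^"<>]+)\2(?P<attrs> [^<>]+)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

normal = '<span data-autorefs-optional-hover="pkg.mod" data-preview>title</span>'
empty_identifier = '<span data-autorefs-identifier="">x</span>'
angle_in_attrs = '<span data-autorefs-identifier="a" data-x="<">t</span>'

# Ordinary spans produce the same named groups with both patterns.
same_on_normal = QUICK_RE.search(normal).groupdict() == TWEAKED_RE.search(normal).groupdict()

# An empty identifier is accepted by the quick pattern only.
quick_accepts_empty = QUICK_RE.search(empty_identifier) is not None
tweaked_accepts_empty = TWEAKED_RE.search(empty_identifier) is not None

# A `<` inside the extra attributes is accepted by the quick pattern only.
quick_accepts_angle = QUICK_RE.search(angle_in_attrs) is not None
tweaked_accepts_angle = TWEAKED_RE.search(angle_in_attrs) is not None

print(same_on_normal, quick_accepts_empty, tweaked_accepts_empty,
      quick_accepts_angle, tweaked_accepts_angle)
```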

To get you the rest of the way there, you can still use this AttrsParser in a slightly creative way

pawamoy commented 6 months ago

> To get you the rest of the way there, you can still use this AttrsParser in a slightly creative way

This wouldn't be needed, since we already get the kind and identifier from the regex. The rest (attrs) can just be copy-pasted into the anchor :slightly_smiling_face: