Closed pawamoy closed 6 months ago
Quick solution:
AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
    r'("?)(?P<identifier>[^"<>]*)\2(?P<attrs> [^>]*)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)
>>> from mkdocs_autorefs.references import AUTO_REF_RE
>>> AUTO_REF_RE.search('<span data-autorefs-identifier="hey" data-preview data-preview="0" data-preview>hello</span>').groupdict()
{'kind': 'autorefs-identifier', 'identifier': 'hey', 'attrs': ' data-preview data-preview="0" data-preview', 'title': 'hello'}
The data-autorefs attribute must still appear first, and the regex now also captures everything after this first attribute and before the closing >. This captured group can then be reinjected as-is in the anchors.
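A minimal sketch of that reinjection, assuming a hypothetical `url_map` dictionary from identifiers to URLs and a hypothetical `fix_ref` callback (the plugin's real URL resolution is not shown in this thread and likely differs):

```python
import re

# Regex from the quick solution, reproduced so the sketch is self-contained.
AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
    r'("?)(?P<identifier>[^"<>]*)\2(?P<attrs> [^>]*)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

# Hypothetical identifier -> URL mapping, for illustration only.
url_map = {"hey": "reference/#hey"}


def fix_ref(match: re.Match) -> str:
    """Turn one matched span into an anchor, keeping the extra attributes."""
    attrs = match["attrs"] or ""  # None when the span has no extra attributes
    url = url_map.get(match["identifier"])
    if url is None:
        # Unknown identifier: leave the span untouched in this sketch.
        return match.group(0)
    # The captured attributes already start with a space, so append them as-is.
    return f'<a href="{url}"{attrs}>{match["title"]}</a>'


html = '<p><span data-autorefs-identifier="hey" data-preview>hello</span></p>'
result = AUTO_REF_RE.sub(fix_ref, html)
```

Because the `attrs` group keeps its leading space, it can be concatenated directly after the `href` attribute without any parsing.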
A bigger change, which might make future changes and features more robust, would be to use a custom tag to delimit auto-references, something like <autoref ...>...</autoref>. With this, it becomes easy to match autorefs as plain strings, and perhaps parse their attributes with a custom HTML parser:
import re
from html.parser import HTMLParser

AUTO_REF_RE = re.compile(r"<autoref (?P<attrs>.*?)>(?P<title>.*?)</autoref>")


class AttrsParser(HTMLParser):
    """Collect the attributes of every start tag fed to the parser."""

    def __init__(self):
        super().__init__()  # HTMLParser.__init__ takes no positional arguments
        self.attrs = []

    def parse(self, html):
        self.attrs.clear()
        self.feed(html)
        return self.attrs

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


# for each match, build f"<a {match.group('attrs')}></a>" and pass it to the parser
AttrsParser().parse('<a data-preview data-identifier="pathlib.Path" data-other="0">some title</a>')
# [('data-preview', None), ('data-identifier', 'pathlib.Path'), ('data-other', '0')]
I don't expect much impact on performance, since we'd only parse the auto-references' attributes and nothing else.
This change would also let us keep complex HTML inside the autoref tag (see https://github.com/mkdocstrings/autorefs/pull/40).
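To make the custom-tag idea concrete, here is a rough sketch of the whole round trip: match each `<autoref>` element with a regex, then hand only its attribute string to `HTMLParser`. The `<autoref>` tag, the `AUTOREF_RE` name, and the `parse_autorefs` helper are all hypothetical and only illustrate the proposal.

```python
import re
from html.parser import HTMLParser

# Proposed (hypothetical) custom tag; DOTALL lets titles span several lines.
AUTOREF_RE = re.compile(
    r"<autoref (?P<attrs>.*?)>(?P<title>.*?)</autoref>",
    flags=re.DOTALL,
)


class AttrsParser(HTMLParser):
    """Collect the attributes of every start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def parse(self, html):
        self.attrs.clear()
        self.reset()  # drop any state left over from a previous call
        self.feed(html)
        return list(self.attrs)

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


_parser = AttrsParser()


def parse_autorefs(html):
    """Yield (attributes, title) for each <autoref> element found in html."""
    for match in AUTOREF_RE.finditer(html):
        # Wrap the raw attribute string in a dummy tag so HTMLParser can read it.
        attrs = dict(_parser.parse(f"<a {match['attrs']}>"))
        yield attrs, match["title"]


source = '<p><autoref identifier="pathlib.Path" data-preview>a <code>Path</code></autoref></p>'
found = list(parse_autorefs(source))
```

Only the short attribute string goes through the parser, and the title keeps its inner HTML untouched. One limitation of this sketch: an attribute value containing `>` would cut the regex match short.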
The regex approach sounds good to me. I only had the urge to tweak some (even pre-existing) parts of that regex :sweat_smile:
AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-(?:identifier|optional|optional-hover))="
    r'("?)(?P<identifier>[^"<>]+)\2(?P<attrs> [^<>]+)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)
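A quick check of how this tweaked pattern behaves on a few made-up spans (the inputs are illustrative, not taken from real mkdocstrings output). Switching from `*` to `+` means a span with an empty identifier no longer matches at all:

```python
import re

# Tweaked regex from the comment above, reproduced so the check is self-contained.
AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-(?:identifier|optional|optional-hover))="
    r'("?)(?P<identifier>[^"<>]+)\2(?P<attrs> [^<>]+)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

# Extra attributes after the data-autorefs-* attribute are captured verbatim.
with_attrs = AUTO_REF_RE.search(
    '<span data-autorefs-optional-hover="hey" data-preview>hello</span>'
)

# An unquoted identifier without extra attributes still matches; attrs is None.
no_attrs = AUTO_REF_RE.search("<span data-autorefs-identifier=hey>hello</span>")

# An empty identifier is rejected because of the `+` quantifier.
empty_id = AUTO_REF_RE.search('<span data-autorefs-identifier="">hello</span>')
```

Here `with_attrs["attrs"]` is `' data-preview'`, `no_attrs["attrs"]` is `None`, and `empty_id` is `None`.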
To get you the rest of the way there, you can still use this AttrsParser in a slightly creative way
This wouldn't be needed as we already got the kind and identifier from the regex. The rest (attrs) can just be copy-pasted into the anchor :slightly_smiling_face:
Is your feature request related to a problem? Please describe.
I'd like to take advantage of mkdocs-material's instant previews, by adding data-preview to autorefs spans generated by mkdocstrings-python. Unfortunately, autorefs matches spans with a regex, and this regex is strict and only matches spans with exactly one data-autorefs-* attribute.
Describe the solution you'd like
I'd like autorefs to allow other data- attributes to appear in its spans, and to carry them over to the anchors when it transforms the spans.
Describe alternatives you've considered
/
Additional context
https://github.com/squidfunk/mkdocs-material/issues/6704