Closed Crozzers closed 8 months ago
- Registering extras is a bit of a mess at the moment
- All subclasses of Extra are auto-detected and registered during _setup_extras. I intend to change so extras must explicitly register themselves
Changed in 5478960 so that extras must manually registered by calling the Extra.register
function.
During Markdown._setup_extras
it will now iterate over all registered extras and instantiate any that are found in self.extras
using the parameters from self.extras
I have also dropped Python 3.5 support since it was throwing errors in the test suite due to some type hints. Python 3.5 was EOL 2 years and 9 months ago so this shouldn't be too much of an issue
Somehow this slipped through the cracks for me. I'll get it into my queue
Thanks for the ping.
Markdown2 is just one of three supported markdown engines. Each is configured a little differently with different capabilities. The things I would use this for (recognizing autolinked phrases like issue1) etc are done by changing the text seen by markdown2 so it includes the proper markdown syntax for the links.
So I try to do all customization via markdown syntax. I do enable options (e.g. fenced blocks, line breaks on newline) that are supported on all the engines.
That said, I do have custom code for markdown and mistune that handle setting the rel attributes on links or try to make embedded html safer.
Not sure if the customization methods of other engines may be useful feedback. But you can see what other contributers have done at:
https://github.com/roundup-tracker/roundup/blob/master/roundup/cgi/templating.py#L122-L235
Thanks for the heads-up! This looks like a sensible refactor, no complaints really. :)
From an API standpoint I'm not sure if having a dedicated test
method is particularly useful (extras could just do this check as the first part of run
if they want to improve performance?`), but overall this all looks good.
Apologies for the slow reply, Thanks all for the feedback!
@rouilj I took a look at the examples you linked, and tried my hand at implementing a few of them. All worked out pretty well, although a bit more manual work was needed since this library lacks facilities like the TreeProcessor
base class. Instead of iterating over a list of elements, I parsed the output text for <a href...
and replaced the hashed links. Not nearly as elegant but definitely achievable.
The developer experience could be made even better by perhaps implementing a LinkPostProcessor
base class that finds all <a>
tags, parses the HREF and REL attrs and passes it to an internal sub
method that can edit the text for final output.
class LinkPostProcessor(Extra, ABC):
order = Stage.after(Stage.LINKS)
def test(self, text):
return True
@abstractmethod
def sub(self, match):
...
def run(self, text):
return re.sub(r'(<a.*?)(?P<href>href="md5-[a-z0-9]{32}")(.*>)', self.sub, text)
class MyExtra(LinkPostProcessor):
name = 'my-extra'
def sub(self, text):
return f'[REMOVED A LINK WITH WITH HREF {match.group("href"}]'
I could see such convenience classes being quite useful for quickly creating extras. Would likely implement as and when needed, probably to simplify existing extras that function in similar ways.
It's not a direct replacement for TreeProcessor
and the like but hopefully that sort of idea will do.
@mhils The separated test
method was something that I saw in Markdown's extension API docs and replicated here. Personally I think separating this functionality from the run
function is cleaner and the explicit method is a good reminder to think about performance.
That said, if one didn't care for performance, including a def text(*_): return True
method for each extra would be tedious so perhaps having the base implementation always return True
with the option for overriding it would be a good compromise?
Thanks @Crozzers! 🍰
Personally I think separating this functionality from the
run
function is cleaner and the explicit method is a good reminder to think about performance.
This sounds reasonable to me - I think an explicit reminder may be a good idea. :) In that case it should probably not have a default implementation, otherwise that defeats that purpose. 😄
We'd have our first real use case for custom extras over in https://github.com/mitmproxy/pdoc/issues/639, so while no hurries please I'm looking forward to this landing at some point. 😃
@nicholasserra would it be alright to revisit this PR? With all the recent changes to different extras it's getting a bit cumbersome to keep porting them back to this branch. There are also a couple of issues that would benefit from this:
*.md
links to .html
links would be easier with custom extra infrastructureMany thanks
Sure thing i'll get this on my priority list thank you
This is looking pretty fancy to me. I like it. Anythings better than the free for all we have right now. Still wrapping my head around the numerical aspect of the stages though. What's that getting us that something like an ordered list or dict isn't? I need to read through the code a bit longer to properly grasp what's happening.
But overall I think this should be great and we'll land it as-is or with minor tweaks. Just give me a little more time to think on it.
Still wrapping my head around the numerical aspect of the stages though. What's that getting us that something like an ordered list or dict isn't?
An ordered list would be simpler and easier and would dodge this flaw in the number system:
Calling
Stage.before
10 times on the same stage will result in extras overflowing into the next stage
The only issue is: if an extra is tied to a particular stage and that stage gets skipped for some reason, do we still want to run that extra?
With the numbering system, we can say that anything where 100 <= order <= 200
is tied to HASH_HTML
and we can skip them. With an ordered list, it would look like this:
[Stage.PREPROCESS, FencedCodeBlocks, Stage.HASH_HTML, FencedCodeBlocks, Stage.LINK_DEFS]
FencedCodeBlocks
is registered as before LINK_DEFS
and after PREPROCESS
, but with an ordered list we can't tell and therefore can't skip it.
We could instead have ordered lists for each extra. For example:
# I've manually filled out the lists here but this would be filled by calling `Stage.before/after`
class Stage:
# tuple containing the `before` and `after` lists respectively. Could be done as a namedtuple
PREPROCESS = ([], [FencedCodeBlocks])
HASH_HTML = ([], [])
LINK_DEFS = ([FencedCodeBlocks], [])
This solves the issue of numbers overflowing whilst still allowing us to tie extras to particular stages. The only downside is if an extra sets its order as Stage.before(FencedCodeBlocks)
, it's a bit more tedious to insert it in all the right lists, but this only happens at startup so isn't a huge concern.
I'll play around with this and see what happens
@nicholasserra, I've refactored the ordering system for extras. I've moved most of the ordering logic into the Extra
class and made Stage
into a proper enum. The big change is that the numbering system is now gone.
The type of the Extra.order
attribute is now a tuple of two iterables. This represents "before" and "after". EG:
class MyExtra(Extra):
# run before preprocessing, after HASH_HTML and after the FencedCodeBlocks extra is run
order = [Stage.PREPROCESS], [Stage.HASH_HTML, FencedCodeBlocks]
When you call MyExtra.register
, it will calculate when this extra should be triggered and store this info in Extra._exec_order
. When all the extras are registered, it looks something like this:
{<Stage.PREPROCESS: 1>: (
[], # before
[<class 'markdown2.Mermaid'>, # after
<class 'markdown2.Wavedrom'>,
<class 'markdown2.FencedCodeBlocks'>]),
<Stage.HASH_HTML: 2>: (
[], # before
[<class 'markdown2.MarkdownInHTML'>]), # after
[...]
<Stage.ITALIC_AND_BOLD: 14>: (
[<class 'markdown2.Underline'>, # before
<class 'markdown2.Strike'>,
<class 'markdown2.MiddleWordEm'>,
<class 'markdown2.CodeFriendly'>],
[<class 'markdown2.Breaks'>, # after
<class 'markdown2.MiddleWordEm'>,
<class 'markdown2.CodeFriendly'>,
<class 'markdown2.TelegramSpoiler'>])}
As mentioned previously, the Stage
class is now an IntEnum
containing the different stages. This was done so that extras could check the current stage of execution, and whether they were being called before/after a particular stage. This is useful for extras that need to be called before AND after a stage.
Calling Extra.deregister
now also removes it from Extra._exec_order
, allowing users to potentially change the order of an extra on the fly.
Finally, in the changelog I've put these changes under the most recent header, which is 2.4.13
, but I suspect this change may warrant a bump to 2.5.0
@Crozzers This is all looking amazing. I think i'm ready to merge this in. Thank you for all the work on this!
I see you have some other open PRs. How would you like to proceed here? Merge this in now, and then fix up the others (if needed), or should I merge those others in and this PR can be rebased? Your call.
I think i'll do a release of 2.4.13 before this lands. Then we can move these changes into 2.5.0 as you recommended. I'll just edit the changelog manually when it comes time.
None of the other PRs are particularly high priority and #568 requires some further investigation so I'd say merge this one first and I can update the others if needed. They're small patches anyway so shouldn't need much doing to them.
Thanks for taking the time to review this PR, it ended up as a bit of a monster
Ok then i'm gonna do a release of the current changes, then merge this, then manually bump us to 2.5.0 after
This PR closes #382 by adding support for custom extras. @JeffAbrahamson, @Einenlum and @kjaymiller, you've all expressed interest in this feature, so I'd love to hear any thoughts from you all.
How it works
Extras
Extras must inherit from the new
Extra
class. See theStrike
extra for a basic example of the implementation. Each extra must define the following properties:name
extras=[]
param, likefenced-code-blocks
.order
run()
test()
fenced-code-blocks
extra might check'```' in text
Extra options
The new extras support passing in extra parameters, like with existing extras. These are passed to the initialiser of the extra via the
extras
dict in theMarkdown
class (see_setup_extras
)Stages
Converting markdown to HTML happens in distinct stages, such as preprocessing, hashing HTML, processing lists, etc. These distinct stages are given numeric IDs in the
Stage
class which denote the order they run in.The
Markdown
class methods that correspond to each stage get decorated with@Stage.mark
. For example:Now, when
preprocess
is called, a wrapper function will find all registered extras that are in the same band as this stage and execute them.Order of execution
As mentioned previously, each stage has its own numeric ID, each 100 apart from the last. Calling
Stage.before/after(Stage.WHATEVER)
will return that stage's ID minus/plus 5. Conescutive calls increment an internal counter and will continue subtracting/adding 5. For example:This means that multiple extras can be registered for the same stage and will be executed in the order in which they were registered.
Known flaws
Stage.before
10 times on the same stage will result in extras overflowing into the next stageStage.after([some stage])
will actually meanStage.before([next stage])
Stage.after([stage], step=1)
to increase the number of calls that can be made before this happensExtra
are auto-detected and registered during_setup_extras
. I intend to change so extras must explicitly register themselvesfootnotes
extra and a few others are tightly integrated and I haven't been able to untangle them as of yetMarkdown
class so there are still shims in place