miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License
818 stars 118 forks source link

Include custom HTML attributes #170

Open legenderrys opened 1 year ago

legenderrys commented 1 year ago

This is similar to this PR#134

I would like to propose an alternative solution that will not alter existing render methods.

Html attribute block Proposed Spec

Line containing the following string ${html_attr_name:value, another_html_attr:value} will describe the html attributes for the element proceeding it.

Contents within the ${...} string will be a comma separated list of key/value pairs. The > character will separate parent attributes from child attributes.

Example Html attribute block INPUT

${id:my-value, class:some-class}
# Mistletoe is Awesome

${id:my-list, class:foo >class:bar-items}
- Item One
- Item Two
- Item Three\

OUTPUT

<h1 id="my-value" class="some-class">Mistletoe is Awesome</h1>
<ul id="my-list" class="foo">
    <li class="bar-items">Item One</li>
    <li class="bar-items">Item Two</li>
    <li class="bar-items">Item Three</li>
</ul>
anderskaplan commented 1 year ago

Hi, a few comments from someone who's been messing a bit with this code base recently.

I think the approach with a new block token is a good one, and it fits nicely in the markdown syntax. It definitely has an advantage over the other proposed method to add attributes, in that you don't need to write a custom renderer to use it.

Here are a few suggestions:

  1. The way this is added into block_tokenizer.make_tokens() feels a bit hacky. Why not treat it like any other block token and handle it in HTMLRenderer.render_html_attributes() instead? That's more in line with the extensible design of mistletoe.
  2. Since this is optional and non-standard functionality, I'd suggest to not add it to HTMLrenderer but rather to create a new renderer class for it in the contrib folder, derived from HTMLRenderer of course. The new block token class could go there as well.
  3. The format for the ${...} strings should be defined more clearly. What about spaces, newlines, escaping, ...? Not saying it needs to be more complicated than necessary -- just that it should be well defined.
  4. Markdown can get complicated and I'm not sure if the proposed design will work well with lists nested inside lists, for example. It would be nice with some test cases to demonstrate how it handles more difficult cases. For example, if you set the id attribute on a list that has a nested list inside it, you don't want the same id on the inner list because that would be invalid html.

Hope this can be helpful to make it a good addition to mistletoe!

legenderrys commented 1 year ago

Hi, a few comments from someone who's been messing a bit with this code base recently.

Thank you for the response and suggestions.

  1. The format for the ${...} strings should be defined more clearly. What about spaces, newlines, escaping, ...? Not saying it needs to be more complicated than necessary -- just that it should be well defined.

Using the {...} characters was causing conflicts with other renderers namely the test_XWiki20Renderer.py macros test failing tests. Although your suggestion to create a new HTMLrenderer may help fix those issues here.

Per the block syntax. Lets get inline working first before multi-line. I'd imagine we could repurpose the code block functionality

  1. Markdown can get complicated and I'm not sure if the proposed design will work well with lists nested inside lists, for example. It would be nice with some test cases to demonstrate how it handles more difficult cases. For example, if you set the id attribute on a list that has a nested list inside it, you don't want the same id on the inner list because that would be invalid html.

Great point, id attributes should always be unique to pass accessibility standards. We could add a numeric suffix onto the nest element id each time props are passed down from parents. I could also experiment with making the apply_props recursive for child tokens.

legenderrys commented 1 year ago

I've updated the branch to include adding unique id attribute values for deeply nested lists.

Below is an example currently in place INPUT

${id:todos, tabindex:100 > class:list-item}
- Item One
- Item Two
    - item two.one
    - item two.two
        - apples
        - oranges
- Item Three

OUTPUT

<ul id="todos" tabindex="100">
    <li class="list-item" tabindex="1">Item One</li>
    <li class="list-item" tabindex="1">Item Two
        <ul id="todos-2" tabindex="1">
            <li class="list-item" tabindex="1">item two.one</li>
            <li class="list-item" tabindex="1">item two.two
                <ul id="todos-2-3" tabindex="1">
                    <li class="list-item" tabindex="1">apples</li>
                    <li class="list-item" tabindex="1">oranges</li>
                </ul>
            </li>
        </ul>
    </li>
    <li class="list-item" tabindex="1">Item Three</li>
</ul>
anderskaplan commented 1 year ago

You could also consider to let the attributes apply only to the immediately following block token. Then it would be possible to add another set to the inner list, like so:

${id:todos}
- Item One
- Item Two
    - item two.one
    - item two.two
        ${id:shopping-list}
        - apples
        - oranges
- Item Three
legenderrys commented 1 year ago

You could also consider to let the attributes apply only to the immediately following block token. Then it would be possible to add another set to the inner list, like so:

Hmm, I had that idea in the beginning, however the current tokenization processing was a struggle and we need a system to pass parent props to children.

Or perhaps an Emmet style syntax at the root level may function better.

🐊 ${id:todos > id:shopping-list > class:shopping-list-items}

legenderrys commented 1 year ago

@anderskaplan There is still a few things I need to update in this PR and I'd like to know what is the criteria for getting code merged into master? Thanks for your help and feedback.

things todo:

  1. Implement attributes for all other html elements
  2. Add test cases for each html element
  3. Update Readme? ( not sure If the readme should include this feature )
anderskaplan commented 1 year ago

Ok, that's a question for @pbodnar. I can only help with some guidance on how to make the code fit in with the overall design of the codebase.

legenderrys commented 1 year ago

@hyperking, thank you for your contribution (and thanks to @anderskaplan for giving some useful feedback :)). I have gone through it quickly and gave you some feedback. There are no exactly described criteria. Generally, the code should be production-ready (or "fully baked" ;)) before being merged into master.

🤓 Awesome! With the holidays underway I may have some in between time to address any loose ends with this PR.

It also helps if commits and their messages are already as close to the final state as possible. So rebasing (squashing etc.) and force-pushing is usually OK to achieve that, even though this may lead to losing links from conversations to the code...

🤓 Yes, I'll make an effort to clean up the PR as close to upstream/master as possible without any conflicts. I've made a test run locally and git didn't seem to complain.

  1. Selecting the syntax for attributes Provided we would want to use just {...} instead of ${...} and provided a test is failing, just fixing that test could be a solution?

🤓 I agree, the original { ... } syntax would be ideal. However it does conflict with other aforementioned renders. I did make this renderer configurable so that the end user could specify an alternative character literals.

See configure statement here

  1. HTMLAttributes, or ExtraAttributes? Any thoughts on making this mechanism, or token, general? I.e. the attributes maybe don't have to be necessarily just for HTML?

🤓 HTMLAttributes are the common term used in web development for describing attributes on a given DOM node. per this feature, It's currently a general token. The token currently only supports basic markdown that renders HTML DOM nodes and Users are required to specify the HTMLAttributesRenderer vs the HTMLRenderer

Also, HTMLAttributes are being added into block_token.__all__ by this PR now, so they would be already available to all the other renderers, right?

🤓 Yes, It's also dependent on if the qualifying token string is present in the markdown ( outside of any code blocks )

And a related question: shouldn't we do this feature / parsing optional (thinking about performance and backwards compatibility here)?

I'd love to see how this performs, but generally this feature would come at a cost, but Its optional though 🤷‍♂️