miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

Question: Inconsistencies in the block tokens #163

Closed: anderskaplan closed this issue 1 year ago

anderskaplan commented 1 year ago

There are some inconsistencies among the block tokens that should perhaps be fixed before moving to version 1.0:

  1. Trailing newlines are sometimes preserved and sometimes not. CodeFence and BlockCode preserve them; Paragraph and HTMLBlock do not.
  2. CodeFence and BlockCode keep their content in a single RawText child node, whereas HTMLBlock keeps it in the content property. In fact, HTMLBlock is the only block token with a content property; that property is otherwise typical of span tokens.

So what to do about it?

My suggestion would be to remove the trailing newlines from all block tokens. The other consistent option, keeping them for all block tokens, would add a trailing LineBreak to every Paragraph, and that would just be a pain. Of course, there is also the option of leaving things as they are.
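To make the first option concrete, here is a minimal sketch of stripping trailing newlines from a block token's text. The `RawText` and `Block` classes are stand-ins defined only for this example, and `strip_trailing_newlines` is a hypothetical helper; none of this is mistletoe's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class RawText:
    """Stand-in for a raw text node (assumption, not mistletoe's class)."""
    content: str


@dataclass
class Block:
    """Stand-in for a block token that keeps its text in child nodes."""
    children: list = field(default_factory=list)


def strip_trailing_newlines(block):
    """Hypothetical helper: drop trailing newlines from the last RawText child."""
    if block.children and isinstance(block.children[-1], RawText):
        last = block.children[-1]
        last.content = last.content.rstrip("\n")
    return block


# A code block whose content still carries the trailing newline.
code = Block([RawText("print('hi')\n")])
strip_trailing_newlines(code)
```

Only newlines at the very end are removed, so blank lines inside a code block would be left alone.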

I would also suggest placing the HTMLBlock content in a single RawText node, so that it is consistent with the other block tokens. Its content property could be kept as well, so as not to break the API: it could be turned into a property getter and marked as deprecated.
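A rough sketch of what the deprecated getter could look like, using only the standard `warnings` module. `HTMLBlockSketch` and its `RawText` are hypothetical stand-ins written for this example, not the library's classes, and the constructor signature is an assumption.

```python
import warnings


class RawText:
    """Stand-in for a raw text node (assumption, not mistletoe's class)."""

    def __init__(self, content):
        self.content = content


class HTMLBlockSketch:
    """Hypothetical block token that stores its text in one RawText child."""

    def __init__(self, lines):
        # Join the source lines and drop trailing newlines, as proposed above.
        self.children = [RawText("".join(lines).rstrip("\n"))]

    @property
    def content(self):
        # Kept only for backwards compatibility; warns on every access.
        warnings.warn(
            "content is deprecated; read children[0].content instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.children[0].content


block = HTMLBlockSketch(["<div>\n", "hello\n", "</div>\n"])
```

Existing callers that read `block.content` would keep working and see a `DeprecationWarning`, while new code would go through `block.children[0].content` like it does for the other block tokens.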

Thoughts?

pbodnar commented 1 year ago

The suggestions seem reasonable, I will try to review them soon.

I guess there is an implicit expectation that the suggested changes would change the output from the renderers, possibly making it more consistent. Is that right?

anderskaplan commented 1 year ago

The proposal will mainly affect the AST. I think the output from the renderers would change very little, if at all. But the change would make it easier to make their output more consistent too, if desired.