miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License
841 stars 119 forks source link

Make tables interrupt paragraphs (#166) #187

Closed anderskaplan closed 1 year ago

anderskaplan commented 1 year ago

This PR introduces a more modular way to detect when a block token interrupts a paragraph. It also makes Table tokens interrupt paragraphs, using the new mechanism.

Adding it as a draft PR for now, due to the dependency on #186 .

In addition to the "obvious" use in Paragraph.read(), checks for paragraph interruptions are also done in Quote.read() and ListItem.read() to determine whether a line is a lazy continuation line or not.

Using the new mechanism to check for continuation lines in Quote and ListItem does change the parsing behavior, and I believe it's for the better. The old implementation of these checks did not include HTMLBlock, for example. There are no test cases in the CommonMark test suite to prove it, but I believe the new implementation is closer to the definition of "Paragraph continuation text" in the CommonMark spec.

Up for discussion: as it says in the issue report (#166), making Table interrupt paragraphs is a change of behavior, and maybe it should be a setting. If so, I'd propose to add the setting to the Table class, rather than to make it a renderer parameter, because this is a parsing option and it has nothing to do with the rendering.

coveralls commented 1 year ago

Coverage Status

coverage: 94.106% (+0.06%) from 94.044% when pulling 8ad3ec312ee10ba82b6c446e8ff29152fa7c55d3 on anderskaplan:interrupting-paragaphs into d7f06d05d35bd849ca54865fdc6133132e6cab5f on miyuchina:master.

pbodnar commented 1 year ago

@anderskaplan, while rebasing, I think you would also want to restructure the commits a bit - namely the very last commit currently (6ed847e07fc9203f2b1f872dcbfba51fb0db4b16) content doesn't seem to quite match what is in the commit message?

anderskaplan commented 1 year ago

@anderskaplan, while rebasing, I think you would also want to restructure the commits a bit - namely the very last commit currently (6ed847e) content doesn't seem to quite match what is in the commit message?

I've clarified the commit message a little, but overall I think it was correct -- it's the override of the check_interrupts_paragraph method that does the trick, once the groundwork has been done.

pbodnar commented 1 year ago

Up for discussion: as it says in the issue report (#166), making Table interrupt paragraphs is a change of behavior, and maybe it should be a setting. If so, I'd propose to add the setting to the Table class, rather than to make it a renderer parameter, because this is a parsing option and it has nothing to do with the rendering.

Yes, I would make this a parsing option, i.e. to keep it separate from the options we pass to the various renderers now. I haven't thought about the exact form yet, so if you've already got an idea, that would be great... Maybe introducing a global "parsing options" variable would be the only way for now, without the necessity to do a major refactoring?

anderskaplan commented 1 year ago

Maybe just what about squashing the 2 commits which follow after e2829a4? Or changing their commit message a bit. So that the relation among them (doing de-facto the same for Paragraphs, but also for Quotes and List items) is more visible?

Sure, I think even the 4 commits after [e2829a4] could be squashed into one.

anderskaplan commented 1 year ago

Yes, I would make this a parsing option, i.e. to keep it separate from the options we pass to the various renderers now. I haven't thought about the exact form yet, so if you've already got an idea, that would be great... Maybe introducing a global "parsing options" variable would be the only way for now, without the necessity to do a major refactoring?

I'll think about it and come up with a suggestion :)

anderskaplan commented 1 year ago

I've squashed the commits and added a suggestion for the parsing option:

The option is added as an attribute on the Table class. This limits its scope nicely and works well with mistletoe's modular approach to parsing, where renderers may use a selection of block tokens. If a parsing option should be introduced in the future which affects all block tokens or even all tokens, then it could be added to those base classes.

pbodnar commented 1 year ago

@anderskaplan, thanks a lot, good job. 👍

So we newly allow tables to interrupt paragraphs by default - I will address this in the release notes, mentioning the option block_token.Table.interrupt_paragraph there.