miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License
791 stars 110 forks source link

feat: record indentation on list item to make Markdown renderer be able to render list markers followed by 1~4 spaces. #197

Closed allets closed 7 months ago

allets commented 8 months ago

With only ListItem.prepend, it is not able to keep original indentation of the 1st line in a list item (#196), so I add the indentation to ListItem as its member. The indentation is the indentation from the margin. I also add it to repr_attributes because prepend is there too.

After recording indentation, Markdown renderer can restore the indentation from the list marker and from the margin. I weigh the trade-off between thoroughly retaining the format and consistency of formatting feature. Therefore, Markdown renderer does not restore the indentation from the margin.

input markdown:

indentation from the margin
 -   The list item is indented by 1 spaces.

indentation from the list marker
-   The list marker is followed by 3 spaces.

implementation details of the 2 examples:

Markdown renderer output:

indentation from the margin
-   The list item is indented by 1 spaces.

indentation from the list marker
-   The list marker is followed by 3 spaces.
coveralls commented 8 months ago

Coverage Status

coverage: 94.118% (+0.005%) from 94.113% when pulling 9933f5ccf957e3c2323c119990f2165aa9a23a33 on allets:feature/keep_original_indentation_of_the_1st_line_in_a_list_item into 5f4c550e57a0c20d8b1f9afb1e68592b32afd466 on miyuchina:master.

allets commented 8 months ago

@pbodnar, thank you for taking the time to give me such detailed feedback!

Also, as usual, you guys don't make it quite easy for me to find out what and why you actually did. 😅 This time, if I get it right, it is not quite about:

record indentation on list item to make Markdown renderer be able to render list markers followed by 1~4 spaces

... but rather about (hopefully you will be able to shorten this in the summary line of your commit message :)):

The old code recorded every list item content's indentation (in token.prepend), but the Markdown renderer always output exactly just 1 space after every list item marker (token.leader). With this change, the original content spacing after the marker is preserved. Also newly recording "margin indentation" before every list item marker - as "margin indentation" is not preserved (for now?), we use this number just for correctly calculating the resulting content spacing.

Is it okay if I change the commit message to the following?

feat: keep the original content spacing after the list marker (#196)

A list item such as ` -   list item` consists of margin indentation
 (A), a list marker (B), 1-4 spaces (C) and contents.
The old code recorded the width from A to C as `ListItem.prepend` and
 B as `ListItem.leader`, 
but the Markdown renderer always skipped margin indentation and output 
exactly just 1 space after every list marker, such as `- list item`.

After this change records the width of A as `ListItem.indentation`,
the Markdown renderer can thoroughly restore A and C.
But the formatting features of other tokens skip margin indentation (A),
for consistency of all formatting features,
the Markdown renderer only restores C, such as `-   list item`.
pbodnar commented 8 months ago

@pbodnar, thank you for taking the time to give me such detailed feedback!

You're welcome. :)

Is it okay if I change the commit message to the following?

feat: keep the original content spacing after the list marker (#196)

...

OK, I like that. Maybe just check this paragraph wording:

After this change records the width of A as `ListItem.indentation`,
the Markdown renderer can thoroughly restore A and C.

Thank you.

anderskaplan commented 8 months ago

Hi, sorry for being late to the party! It's really fun to see the mardown renderer getting used and extended!

The code changes as such look good to me, and the intended behavior seems to be covered with test cases.

I'm not sure if we want the test case testing the margin though. I mean, if we could have the margin preserved as well, then it would be even better. And these code changes don't affect the margin. So I'd recommend to remove that particular test.

I can imagine that there will be more improvements like this, to make the markdown output come closer to the input. And at the same time it can be useful to normalize the output for some use cases. So I'd suggest to make this change optional by means of a "normalize" flag. Normalizing should probably be disabled by default, similar to how the default behavior is to not reflow the lines.

pbodnar commented 8 months ago

Hi, sorry for being late to the party! It's really fun to see the mardown renderer getting used and extended!

Yeah, the markdown renderer seems to find its users quickly. 👍

The code changes as such look good to me, and the intended behavior seems to be covered with test cases.

I'm not sure if we want the test case testing the margin though. I mean, if we could have the margin preserved as well, then it would be even better. And these code changes don't affect the margin. So I'd recommend to remove that particular test.

Good point, if we implement the margin preservation later on, we could also add an appropriate unit test later on. Yet, I can also imagine leaving the test (namely test_list_item_margin_indentation_not_preserved) there, because it somewhat documents the current state of affairs. ;)

I can imagine that there will be more improvements like this, to make the markdown output come closer to the input. And at the same time it can be useful to normalize the output for some use cases. So I'd suggest to make this change optional by means of a "normalize" flag. Normalizing should probably be disabled by default, similar to how the default behavior is to not reflow the lines.

OK, so I take this as a confirmation of my comment here. :) I.e. something like normalize_space_after_list_leader = False?

allets commented 8 months ago

@pbodnar, thank for your review.

OK, I like that. Maybe just check this paragraph wording:

After this change records the width of A as `ListItem.indentation`,
the Markdown renderer can thoroughly restore A and C.

Do you mean using too many code names (A. C) to shorten the description makes it inconvenient to read?

Is it okay if I change that to the following?

After this change records the length of margin indentation (A) as
 `ListItem.indentation`, the Markdown renderer can restore margin
 indentation (A) and the spaces after the list marker (C).
But the formatting features of other tokens skip margin indentation,
for consistency of all formatting features,
the Markdown renderer ONLY restores the spaces after the list marker,
 such as `-   list item`.
pbodnar commented 7 months ago

OK, I like that. Maybe just check this paragraph wording:

After this change records the width of A as `ListItem.indentation`,
the Markdown renderer can thoroughly restore A and C.

Do you mean using too many code names (A. C) to shorten the description makes it inconvenient to read?

...

I guess I should have been less lazy and suggest a small correction right away, here it goes... :)

After this change, the width of A is recorded as `ListItem.indentation`,
so that the Markdown renderer can correctly restore both A and C.
But the formatting ...

Provided we won't introduce the boolean switch discussed above within this PR, all that remains is changing the commit message. If you want, I can do this myself, while squash-merging the commits to the master branch...

allets commented 7 months ago

@pbodnar, all done.

I'm sorry for mentioning it without any Git commits, because I'm not ready yet. You can implement your solution first. If I have a complete solution I will open a pull request.

Thanks for this great project! Thanks for all you guys' contributions to the project.

pbodnar commented 7 months ago

@allets, thank you too!

pbodnar commented 7 months ago

Just for completeness, I've prepared the following remarks regarding this change to the planned Release notes:

COMPATIBILITY REMARKS:

I hope this will be sufficient and we don't need to keep 100% compatibility here (which we could by moving the new indentation parameter to the very end of the constructor, but wouldn't that look "ugly"?).