markdown-it-rust / markdown-it

markdown-it js library rewritten in rust
Other
79 stars 9 forks source link

Typographer (Replacing ©, ®, ™, ... etc) #4

Closed black-puppydog closed 1 year ago

black-puppydog commented 1 year ago

This is a straight-up re-implementation of lib/rules_core/replacements.js from the JS library.

There are still a few corner cases I'm not sure about, but this is progressed enough that I'd like feedback if anyone is willing to give. 🙂

I'm especially unsure about the linkify.rs modification. I needed this because the JS implementation checks for auto-linkified links, and doesn't touch the text in those. I tried multiple ways to do this.

The "easiest" would have been to be able to walk up the tree of nodes, to determine whether one of my parents is Linkified. We don't have that though, right?

The next attempt was to modify walk_mut so that I could pass a second closure that would be called when exiting the node. That way I could keep track of the linkification status, similar to how the JS lib does it. However, I never managed to get this past the borrow checker, because I was using a single linkification_level: i32 which I needed to mut-pass to both the entry and exit closure. It might well be that there would be a trivial fix for this (I'm still quite new to Rust and my stdlib-foo is not great) but in any case that would mean a bunch of boiler-plate to insert into all four walk[_post][_mut] methods and everywhere they're used.

So my solution is to attach an empty LinkifyMarker struct to the nodes when they get auto-linkified. I then just check if the text node I'm dealing with has that marker, and ignore it if it's the case.

Side note: the Error I'm encountering in #3 is trivial to fix with this branch: just add the typographer plugin to the parser during test, and it seems work!

To do

black-puppydog commented 1 year ago

Okay after adding another test (fixture) and cross-checking that one with the JS implementation as well, I'm now reasonably confident that this works as intended. :)

rlidwka commented 1 year ago

The "easiest" would have been to be able to walk up the tree of nodes, to determine whether one of my parents is Linkified. We don't have that though, right?

In theory, custom walk function will do the job (i.e. not using walk_mut, but writing your own). Not ideal, I know.

So my solution is to attach an empty LinkifyMarker struct to the nodes when they get auto-linkified. I then just check if the text node I'm dealing with has that marker, and ignore it if it's the case.

Actually, as far as I remember, text nodes that shouldn't be replaced should have TextSpecial type instead of Text (which is not done in JS for historical reasons). I'll look into fixing that.

Decide on where/whether this should be included in any default values in the crate

People usually use replacements together with smartquotes. Since smartquotes aren't implemented, I don't think this feature is complete, thus don't enable it by default yet.

rlidwka commented 1 year ago

I'm especially unsure about the linkify.rs modification. I needed this because the JS implementation checks for auto-linkified links, and doesn't touch the text in those. I tried multiple ways to do this.

I believe this is the best way to fix it: https://github.com/rlidwka/markdown-it.rs/commit/547505e8b722fadbb5e49c1ece647dd2481eb3ed

codecov-commenter commented 1 year ago

Codecov Report

Base: 95.13% // Head: 95.32% // Increases project coverage by +0.19% :tada:

Coverage data is based on head (547505e) compared to base (c9b087a). Patch coverage: 100.00% of modified lines in pull request are covered.

:exclamation: Current head 547505e differs from pull request most recent head 7fa1847. Consider uploading reports for the commit 7fa1847 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #4 +/- ## ========================================== + Coverage 95.13% 95.32% +0.19% ========================================== Files 58 54 -4 Lines 2302 2226 -76 ========================================== - Hits 2190 2122 -68 + Misses 112 104 -8 ``` | [Impacted Files](https://codecov.io/gh/rlidwka/markdown-it.rs/pull/4?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin) | Coverage Δ | | |---|---|---| | [src/plugins/cmark/inline/autolink.rs](https://codecov.io/gh/rlidwka/markdown-it.rs/pull/4/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin#diff-c3JjL3BsdWdpbnMvY21hcmsvaW5saW5lL2F1dG9saW5rLnJz) | `100.00% <100.00%> (ø)` | | | [src/plugins/extra/linkify.rs](https://codecov.io/gh/rlidwka/markdown-it.rs/pull/4/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin#diff-c3JjL3BsdWdpbnMvZXh0cmEvbGlua2lmeS5ycw==) | `96.72% <100.00%> (+0.11%)` | :arrow_up: | | [examples/ferris/main.rs](https://codecov.io/gh/rlidwka/markdown-it.rs/pull/4/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin#diff-ZXhhbXBsZXMvZmVycmlzL21haW4ucnM=) | | | | [examples/ferris/inline\_rule.rs](https://codecov.io/gh/rlidwka/markdown-it.rs/pull/4/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin#diff-ZXhhbXBsZXMvZmVycmlzL2lubGluZV9ydWxlLnJz) | | | | [examples/ferris/block\_rule.rs](https://codecov.io/gh/rlidwka/markdown-it.rs/pull/4/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin#diff-ZXhhbXBsZXMvZmVycmlzL2Jsb2NrX3J1bGUucnM=) | | | | [examples/ferris/core\_rule.rs](https://codecov.io/gh/rlidwka/markdown-it.rs/pull/4/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin#diff-ZXhhbXBsZXMvZmVycmlzL2NvcmVfcnVsZS5ycw==) | | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alex+Kocharin)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

black-puppydog commented 1 year ago

Thanks very much for the feedback, I'll look into it once I find a few quiet minutes. :)

People usually use replacements together with smartquotes. Since smartquotes aren't implemented, I don't think this feature is complete

Indeed, that's a good point. I'm already looking into porting the smartquotes next, although the algorithm used in JS is annoying to implement in rust due to borrow checking. But I'm sure I'll get there :D

black-puppydog commented 1 year ago

This became quite a comprehensive rebase so that lazy_static doesn't get used anywhere in the commit history. Thanks to your commit, the whole linkify modification seems to have become unnecessary. Yay for simplicity! 🎉

The linebreak question is a separate issue, if any. I also added a bit of documentation, especially also noting that this is not enabled with extras by default.

black-puppydog commented 1 year ago

Just found time to add one last thing here; the way I had used once_cell was the "simple but verbose" way. I just added a commit that takes advantage of once_cell::Lazy which simplifies the code a bit.

black-puppydog commented 1 year ago

Following the comments in #5 I went ahead and made the changes you indicated. Thanks again for the feedback, this is a great learning experience for me! I appreciate that it takes a lot of time to do these reviews.

unreachable instead of panicCow and if let instead of to_string and while ✅ closure instead of impl Replacer