rugk / unicodify

✍️ A browser add-on (Firefox, Chrome, Thunderbird) that allows you to autocorrect common text sequences and convert text characters to a look like a special font.
https://addons.mozilla.org/firefox/addon/unicodify-text-transformer/?utm_source=github.com&utm_medium=git&utm_content=repo-about-description&utm_campaign=github
Other
11 stars 2 forks source link

Version numbers should not be replaced as #40

Closed rugk closed 3 years ago

rugk commented 3 years ago

Bug description

As written in https://github.com/rugk/unicodify/discussions/34 version numbers are incorrectly replaced as mathematical symbols, e.g. ⅓.

Steps to reproduce

Write v1.2.3, because you mention a version number.

Actual behavior

It get's converted to v1⅕.3

Expected behavior

Converting version numbers to brackets is not useful at all. :upside_down_face:

Reproducibility on test website

image

System

Operating system and version: Fedora 34 Browser and version: Firefox 88.0.1 Add-on version: 0.1

Possible solution

from @tdulcet in https://github.com/rugk/unicodify/discussions/34#discussioncomment-692606:

Version numbers are another problem. No, I don't want them to be replaced…

This is probably easily fixed by updating the regular expression that matches numbers: https://github.com/rugk/unicodify/blob/8123e09d899cdaa7c31a85299218393ac4a631be/src/content_scripts/autocorrect.js#L306-L307

rugk commented 3 years ago

BTW updated the RegEx and added some unit tests there, which cover this now: https://regex101.com/r/7jUaSP/3

tdulcet commented 3 years ago

BTW updated the RegEx and added some unit tests there, which cover this now: https://regex101.com/r/7jUaSP/3

I do not see any changes at the link.

For your v1.2.3 example, it can be fixed by not doing the autocorrection if the next character is a period. However, if you switch it to v1.3.2, this would require modifying the regular expression to check that the previous character is not a period. I would need to see your unit tests before suggesting any specific changes to the code.

rugk commented 3 years ago

I just added these “tests” on RegEx101 – click the link and choose “unit tests” below “function” at the left sidebar. It’s the lazy way… :wink:

Also feel free to modify them.

rugk commented 3 years ago

However, if you switch it to v1.3.2, this would require modifying the regular expression to check that the previous character is not a period

Indeed, of course you’re correct.

Yeah, added https://regex101.com/r/7jUaSP/4

tdulcet commented 3 years ago

I just added these “tests” on RegEx101 – click the link and choose “unit tests” below “function” at the left sidebar. It’s the lazy way… 😉

Oh, thanks, I did not see that.

For the v1.2.3 case, a simple && insert !== "." check on this line should suffice: https://github.com/rugk/unicodify/blob/1223be3b86b8941bbb7a095552ef5eda3e5b85ca/src/content_scripts/autocorrect.js#L316

For the v1.3.2 case, we will need to modify the regular expression. Most people would probably use a negative lookbehind assertion like this:

/(?<!\.)\d+(\.\d+)?$/

Those can have a performance cost, so we could do something like this:

/(?:^|[^.])(\d+(\.\d+)?)$/

Although that would of course mess up the capture groups, so I would recommend the first option. I updated your link: https://regex101.com/r/7jUaSP/5 and can create a PR if you want.

rugk commented 3 years ago

LGTM (all changes), feel free to create a PR. :smile:

&& insert !== "." check on this line should suffice:

Just please comment it and link back to the issue here, so it’s understandable why you do this in the source code.

Another suggestion for the RegEx: To make it more readable, we could also use named capture groups like this: https://regex101.com/r/7jUaSP/9 (?<noversionnumber>(?<!\.))\d+(?<fractionpart>\.\d+)?$/gms

I also enabled the s option, because it well… is only about single lines.

tdulcet commented 3 years ago

LGTM (all changes), feel free to create a PR. 😄

OK, will do shortly!

Another suggestion for the RegEx: To make it more readable, we could also use named capture groups like this:

The negative lookbehind is just an assertion, so it does not actually match anything and thus that capture group will always be empty. You would need to use my second suggested regular expression for that, although we of course do not need to capture whatever is before the number.

I also enabled the s option, because it well… is only about single lines.

This does not apply, as it currently does not have any (unescaped) dots, although the number should not be on mutiple lines anyway (see here).