retextjs / retext-smartypants

plugin to implement SmartyPants
https://retextjs.github.io/retext-smartypants
MIT License

Incomplete ellipses transforms #9

Closed teddybradford closed 10 months ago

teddybradford commented 1 year ago

Affected packages and versions

retext-smartypants@5.2.0

Link to runnable example

No response

Steps to reproduce

I think the ellipses transforms are incomplete.

  1. A period after an ellipsis seems to be treated as part of the ellipsis.
  2. Spaced periods don't always get parsed if there's no space around the group.

Here are some examples of what happens currently:

foo.... bar -> foo… bar
foo. . .bar -> foo. . .bar
foo. . .. bar -> foo… bar
foo. . . . bar -> foo… . bar
foo . . .bar -> foo . . .bar
foo. . . .bar -> foo… .bar

Expected behavior

I would expect those examples to return these values instead:

foo.... bar -> foo…. bar
foo. . .bar -> foo…bar
foo. . .. bar -> foo…. bar
foo. . . . bar -> foo…. bar
foo . . .bar -> foo …bar
foo. . . .bar -> foo. …bar

Affected runtime and version

node@18.15.0

Affected package manager and version

No response

Affected OS and version

No response

Build and bundle tools

No response

wooorm commented 1 year ago

a) Why would more than 3 dots not turn into an ellipsis? (1, 3, 4, 6)

b) I’m not sure what 2 and 5 are supposed to be. I don’t think I’ve seen people write such characters in English, or in other languages that I am aware of, flush in the middle of words or sticking to the next word. retext is about natural language, not programming code: it cuts text up into sentences. I don’t understand how sentences work here, and retext probably doesn’t either.

teddybradford commented 1 year ago

a) Some style guides use a four-dot ellipsis (ellipsis + period) when crossing sentences:

The MLA now indicates that a three-dot, spaced ellipsis . . . should be used for removing material from within one sentence within a quote. When crossing sentences (when the omitted text contains a period, so that omitting the end of a sentence counts), a four-dot, spaced (except for before the first dot) ellipsis . . . . should be used.

https://en.wikipedia.org/wiki/Ellipsis#American_English

b) These cases can probably be ignored then. But as far as I can tell, many style guides don't explicitly say that the spaced ellipses must be surrounded by spaces (i.e., foo . . . bar vs. foo. . .bar)

wooorm commented 1 year ago

Thanks for the link!

To me, your argument for a) explains why some people use 4 dots, not why it should turn into …., as I don’t see an example of that output. I do see one case of it, in the French example.

I think what’s complex about trying to follow these style guides is that they’re all different: they each have different rules, and the rules also differ for authors and for “typesetters”.

But this project doesn’t follow one specific style guide. And if we did, we’d break with the rest, right?

The reason four or more periods are turned into an ellipsis is that some people don’t stop at 3. For example: “Wait..... what’s wrong with that?”
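
The rationale above can be sketched with a plain regular expression. This is an illustration only, not the plugin's actual implementation; `collapseDots` is a hypothetical helper name.

```javascript
// Hypothetical helper, not part of retext-smartypants: collapse any run of
// three or more consecutive dots into a single ellipsis character.
function collapseDots(text) {
  return text.replace(/\.{3,}/g, '…')
}

console.log(collapseDots('Wait..... what’s wrong with that?'))
// prints: Wait… what’s wrong with that?
```

Under this rule a fourth dot is absorbed into the ellipsis, which is why `foo.... bar` becomes `foo… bar` rather than `foo…. bar`.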

teddybradford commented 1 year ago

The more I read and think about it, the more complex I realize this is. It's difficult (impossible?) to parse for all these cases, and intuit, with accuracy, what the intended behavior should be—especially if a text uses varying types of ellipses.

With that in mind, would you consider adding an option to this package that toggles converting triple-spaced dots (but keeps consecutive-dot conversions for ellipses)? This would give more flexibility, making it easier to apply custom, context-specific regexp replacements after running text through this plugin.
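
As a rough illustration of the kind of context-specific replacement meant here, the sketch below handles spaced dots separately from the plugin. It assumes spaced dots have been left untouched, and it assumes one particular style rule (four spaced dots become an ellipsis plus a period); `spacedEllipses` is a hypothetical name, not something the package provides.

```javascript
// Hypothetical post-processing step, run on text in which spaced dots were
// left untouched. The four-dot rule below is an assumption for this example.
function spacedEllipses(text) {
  return text
    // Four dots separated by single spaces: ellipsis followed by a period.
    .replace(/(?:\. ){3}\./g, '….')
    // Three dots separated by single spaces: a plain ellipsis.
    .replace(/(?:\. ){2}\./g, '…')
}

console.log(spacedEllipses('foo. . . . bar')) // prints: foo…. bar
console.log(spacedEllipses('foo . . . bar')) // prints: foo … bar
```

The four-dot pattern runs first so that the three-dot pattern cannot consume part of a four-dot group.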

wooorm commented 1 year ago

the more complex I realize this is

Yep, same.

would you consider adding an option to this package that toggles converting triple-spaced dots

Yes, I am open to such a feature. I am not interested in writing it myself though: are you? I’d also wonder how it would look exactly, so that can be some back-and-forth to figure out!

teddybradford commented 1 year ago

Maybe something like updating options.ellipses to work like options.dashes:

Create smart ellipses (boolean or 'unspaced', 'spaced', default: true).

Converts triple dot characters (with or without spaces) into a single unicode ellipsis character.

wooorm commented 1 year ago

Maybe! Probably good, but might depend a bit on how the feature you come up with actually works! Previously we also discussed triple vs more-than-triple, was that also planned in this PR/option?

teddybradford commented 1 year ago

I was thinking of keeping it simple and only adding options for enabling/disabling formatting of spaced vs. unspaced ellipses (keeping the ellipses length regex as-is).
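
A minimal sketch of how such an option could behave, written as a standalone function rather than as the plugin's code. The mode values mirror the proposal above; the function name `smartEllipses` and both regular expressions are assumptions made for illustration.

```javascript
// Hypothetical standalone function illustrating the proposed option values
// (true, false, 'spaced', 'unspaced'); not the plugin's implementation.
function smartEllipses(text, mode = true) {
  if (mode === false) return text

  let result = text

  if (mode === true || mode === 'spaced') {
    // Three or more dots separated by single spaces, such as ". . ."
    result = result.replace(/\.(?: \.){2,}/g, '…')
  }

  if (mode === true || mode === 'unspaced') {
    // Three or more consecutive dots, such as "..."
    result = result.replace(/\.{3,}/g, '…')
  }

  return result
}

console.log(smartEllipses('a... b . . . c', 'unspaced')) // prints: a… b . . . c
console.log(smartEllipses('a... b . . . c', 'spaced')) // prints: a... b … c
console.log(smartEllipses('a... b . . . c')) // prints: a… b … c
```

With `'unspaced'`, spaced groups pass through unchanged, so a later custom replacement can treat them however a given style guide requires.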

github-actions[bot] commented 10 months ago

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

wooorm commented 10 months ago

released! https://github.com/retextjs/retext-smartypants/releases/tag/6.1.0