gustavoguichard / string-ts

Strongly typed string functions
MIT License
1.19k stars 17 forks source link

fix: words should drop apostrophes #123

Closed jly36963 closed 9 months ago

jly36963 commented 9 months ago

Words and Apostophes

Words and related casing utilities have weird behavior for words containing '. I updated the logic to be more like lodash's (removing apostrophes before splitting words)

Relates to, but is not a fix for: https://github.com/gustavoguichard/string-ts/issues/122

Words

// Old behavior
type test = Expect<Equal<Words<"don'tGo">, ["don", "'", "t", "Go"]>>;
// Updated behavior
type test = Expect<Equal<Words<"don'tGo">, ["dont", "Go"]>>;

Casing

// Old behavior
type test = Expect<Equal<SnakeCase<"don'tGo">, "don_'_t_go">>;
// Updated behavior
type test = Expect<Equal<SnakeCase<"don'tGo">, "dont_go">>;
p9f commented 9 months ago

I updated the logic to be more like lodash's

I would like to challenge this. Lodash approach is more complex than just "removing apostrophes before splitting words" because lodash behaviour will be different for where's (treated as one word) and ma'am (treated as two words) (you can see the code here, there is more variations of this logic.).

So with the test sentence "Where's the leak ma'am?", we have:

I think I would be in favour of removing all special characters, but still count them as a word separator - i.e. go for: ["Where", "s", "the", "leak", "ma", "am"].


also - do we agree this is a breaking change that should wait for a v2? if yes, we might want to think about this very carefully, because we don't a new major versions to often, so we might want to think about other breaking changes (like lodash remove all special characters like %, &, ... and we currently don't).

jly36963 commented 9 months ago

I would like to challenge this. Lodash approach is more complex than just "removing apostrophes before splitting words".

_.camelCase("Where's the leak ma'am")
// "wheresTheLeakMaam" 

_.snakeCase("Where's the leak ma'am") 
// "wheres_the_leak_maam" 

I guess they do the apostrophe removal at the casing level, before it is passed into words.


We could move the removal of apostrophes to the casing function like they do here. This might be nice for a couple of reasons:


Can you think of scenarios where words should be separated based on an apostrophe?


Either way, SnakeCase<"ma'am"> // ma_'_am is definitely the wrong answer

gustavoguichard commented 9 months ago

Yes, I agree with @p9f that we should think carefully about (any) change in the words method. I like @jly36963 proposal from his latest comment.

If we just "fixed" machine readable casing functions, would that be a major version?

I'm also a little afraid of going down the "perfect grammar" route bc it can lead to a much more complex code. Wdyt?

jly36963 commented 9 months ago

Based on this issue, it might be worth:

gustavoguichard commented 9 months ago

It does makes sense. I tend to adopt the same positioning of J Dalton there: limiting the scope to avoid creating a lil monster, but with that said if no one can come up with a con for that we can add that behavior. AFAICT in both English and Portuguese it will work.. not sure about other languages

p9f commented 9 months ago

@jly36963 ballon d'or -- "golden" should be one word, right?

fwiw in french I think "ballon", "d", "or" would be better since d is the contraction of de, a different word from or (kinda ball [made] o[f] gold). I don't think that really matter though - surely French is out of scope.


I don't see any downside in your latest solution right now, even though I think we should take our time to consider all the implications. It seems the best of both world.

What do you think of, in the same release, changing the logic for special character so that they don't create artificial word? i.e. converting hello, world into hello, world instead of the current hello, ,, world (and this for all character except letters, digits and single quote.

jly36963 commented 9 months ago

I'm not too sure what to do with the different special characters scenarios. It's probably beyond the scope of this PR.

Scenarios:


I've been thinking of a future solution, maybe something like:

I could put this in a github discussion

gustavoguichard commented 9 months ago

I'm not too sure what to do with the different special characters scenarios. It's probably beyond the scope of this PR. Keep in mind that supporting international characters is out of the scope of the library at all.

jly36963 commented 9 months ago

@gustavoguichard I'm okay either way.

I think a majority of people would expect the updated behavior (eg: wheres over where_'_s), but there's always a chance that someone is relying on the current behavior.

gustavoguichard commented 9 months ago

@jly36963 I'd do a major change ;)