Closed jly36963 closed 9 months ago
I updated the logic to be more like lodash's
I would like to challenge this. Lodash approach is more complex than just "removing apostrophes before splitting words" because lodash behaviour will be different for where's
(treated as one word) and ma'am
(treated as two words) (you can see the code here, there is more variations of this logic.).
So with the test sentence "Where's the leak ma'am?", we have:
["Where's", "the", "leak", "ma", "am"]
["Where", "'", "s", "the", "leak", "ma", "'", "am"]
["Where's", "the", "leak", "ma'am"]
as you can see these are three different values.I think I would be in favour of removing all special characters, but still count them as a word separator - i.e. go for:
["Where", "s", "the", "leak", "ma", "am"]
.
also - do we agree this is a breaking change that should wait for a v2?
if yes, we might want to think about this very carefully, because we don't a new major versions to often, so we might want to think about other breaking changes (like lodash remove all special characters like %
, &
, ... and we currently don't).
I would like to challenge this. Lodash approach is more complex than just "removing apostrophes before splitting words".
_.camelCase("Where's the leak ma'am")
// "wheresTheLeakMaam"
_.snakeCase("Where's the leak ma'am")
// "wheres_the_leak_maam"
I guess they do the apostrophe removal at the casing level, before it is passed into words.
We could move the removal of apostrophes to the casing function like they do here. This might be nice for a couple of reasons:
Words
and the casing functions would align with lodashCan you think of scenarios where words should be separated based on an apostrophe?
ma'am
should definitely be one wordballon d'or
-- "golden" should be one word, right?Either way, SnakeCase<"ma'am"> // ma_'_am
is definitely the wrong answer
Yes, I agree with @p9f that we should think carefully about (any) change in the words method. I like @jly36963 proposal from his latest comment.
If we just "fixed" machine readable casing functions, would that be a major version?
I'm also a little afraid of going down the "perfect grammar" route bc it can lead to a much more complex code. Wdyt?
Based on this issue, it might be worth:
'
at the words level
["Where's", "the", "leak", "ma'am"]
'
isn't treated as a new word'
at the (machine-readable) casing level, before the input is passed into Words
wheres_the_leak_maam
It does makes sense. I tend to adopt the same positioning of J Dalton there: limiting the scope to avoid creating a lil monster, but with that said if no one can come up with a con for that we can add that behavior. AFAICT in both English and Portuguese it will work.. not sure about other languages
@jly36963 ballon d'or -- "golden" should be one word, right?
fwiw in french I think "ballon", "d", "or"
would be better since d
is the contraction of de
, a different word from or
(kinda ball [made] o[f] gold
).
I don't think that really matter though - surely French is out of scope.
I don't see any downside in your latest solution right now, even though I think we should take our time to consider all the implications. It seems the best of both world.
What do you think of, in the same release, changing the logic for special character so that they don't create artificial word? i.e. converting hello, world
into hello
, world
instead of the current hello
, ,
, world
(and this for all character except letters, digits and single quote.
I'm not too sure what to do with the different special characters scenarios. It's probably beyond the scope of this PR.
Scenarios:
abc,def
-> abc
def
best-f$%^ing-birria
-> best
f$%^ing
birria
cão
-> cão
abc123%^&
-> (same)
I've been thinking of a future solution, maybe something like:
!@#$%^&
...)I could put this in a github discussion
I'm not too sure what to do with the different special characters scenarios. It's probably beyond the scope of this PR. Keep in mind that supporting international characters is out of the scope of the library at all.
@gustavoguichard I'm okay either way.
I think a majority of people would expect the updated behavior (eg: wheres
over where_'_s
), but there's always a chance that someone is relying on the current behavior.
@jly36963 I'd do a major change ;)
Words and Apostophes
Words
and related casing utilities have weird behavior for words containing'
. I updated the logic to be more like lodash's (removing apostrophes before splitting words)Relates to, but is not a fix for: https://github.com/gustavoguichard/string-ts/issues/122
Words
Casing