lidatong / dataclasses-json

Easily serialize Data Classes to and from JSON
MIT License
1.34k stars 150 forks source link

Fix snakecase converter for strings with multi-digit numbers #519

Open jesp1999 opened 4 months ago

jesp1999 commented 4 months ago

Resolves https://github.com/lidatong/dataclasses-json/issues/518

Change to the snake_case letter converter to give more consistent behavior for strings containing multiple numeric characters in sequence. For instance:

>>print(old_snakecase('Alice'), new_snakecase('Alice'))
alice alice

>>print(old_snakecase('Alice123'), new_snakecase('Alice123'))
alice_1_2_3 alice123

>>print(old_snakecase('AliceBob123'), new_snakecase('AliceBob123'))
alice_bob_1_2_3 alice_bob123

Edit 1

@USSX-Hares: Updated description formatting.

USSX-Hares commented 4 months ago

Wait, there are no tests for this?

@jesp1999, would you mind writing units for the stringcase.py? This will prevent issues like this in the future.

george-zubrienko commented 4 months ago

I agree with @USSX-Hares. Quick search reveals there is no standard for snake case, so this subject is rather tricky to discuss. Also considering this is not backwards-compat, we'll have to bump major for this change to be released. I can suggest adding a new case converter instead - much smoother and can be used w/o breaking stuff for other people.

jesp1999 commented 4 months ago

Thanks for following up with all of the useful feedback, apologies if I'm violating any sort of convention by submitting a code change PR without a discussion first -- I'm new to open source.

I'll implement some tests soon to rigidly define the behavior of the casing in the scenarios you present, as well as any more I can think of.

Continuing the discussion, @george-zubrienko do you or anyone else have any suggestions for an appropriate name for this modified snake casing style based on the current conventions in the codebase?

USSX-Hares commented 4 months ago

I agree with @USSX-Hares. Quick search reveals there is no standard for snake case, so this subject is rather tricky to discuss. Also considering this is not backwards-compat, we'll have to bump major for this change to be released. I can suggest adding a new case converter instead - much smoother and can be used w/o breaking stuff for other people.

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version. Also, I don't know any person who wants alice_1_2_3 naming (especially when converting from alice123) and thus I believe this will not be a breaking change (yeah, I know the story behind jQuery v4.0.0, but, let's be honest, we are both not jQuery and not v3.x).

george-zubrienko commented 4 months ago

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version.

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

WDYT? @USSX-Hares @jesp1999

george-zubrienko commented 4 months ago

Thanks for following up with all of the useful feedback, apologies if I'm violating any sort of convention by submitting a code change PR without a discussion first -- I'm new to open source.

No need to apologize! It is always good to open a PR, because that shows your ideas best. Rest is technical stuff :)

USSX-Hares commented 4 months ago

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

Well, I just said my opinion on the topic -- however, I don't go against your proposal.

The only thing I am concired is the name of converter you have suggested, it's a bit confusing. It took me some time that SNAKE_V1 is not the old syntax. Usually, when something has v1 in its name when there are alternatives, v1 is the oldest. How about smth like SNAKE_FUTURE, SNAKE_V2, or SNAKE_NEW? Any other variants and/or suggestions are welcomed.

jesp1999 commented 4 months ago

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version.

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

WDYT? @USSX-Hares @jesp1999

My primary concern with this approach is that this will force those who are watching library updates to either

I feel like the first option is an undesirable one for most consumers, including myself, as it leaves a lingering threat of deprecation over my head whether I choose to comply with the change now or later. I'm still trying to think if I have a better solution, though...