okunishinishi / python-stringcase

String case converter for python.
https://pypi.python.org/pypi/stringcase
MIT License

snakecase("HTTPResponse") produces "h_t_t_p_response" #4

Open lepsch opened 7 years ago

lepsch commented 7 years ago

I think an acronym should be converted to a single "word", e.g. HTTPResponse should be converted to http_response.

kenodegard commented 7 years ago

While that may make perfect sense to us humans, this is a very hard problem to solve without including a large lookup dictionary of all the acronyms that should be treated differently from the general rules.

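Without a lookup dictionary, there is absolutely no difference between "AAAAAaaaaaaa" and "HTTPResponse" as far as the conversion rules are concerned: both are just a run of uppercase letters followed by a run of lowercase letters.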
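To illustrate the point, here is a minimal sketch of a purely rule-based converter that puts an underscore before every capital letter. The name `naive_snakecase` is hypothetical and this is not stringcase's actual source; it only reproduces the behaviour reported in the issue title.

```python
import re


def naive_snakecase(text):
    """Hypothetical rule-based converter: '_' before every capital letter."""
    # Every uppercase letter becomes "_" plus its lowercase form;
    # the leading underscore produced by an initial capital is stripped.
    converted = re.sub(r"[A-Z]", lambda m: "_" + m.group(0).lower(), text)
    return converted.lstrip("_")


print(naive_snakecase("HTTPResponse"))  # h_t_t_p_response
print(naive_snakecase("AAAAAaaaaaaa"))
```

Both inputs go through exactly the same rule, so such a function has no way of knowing that the first one starts with an acronym.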

pykong commented 7 years ago

@njalerikson @okunishinishi

> is a very hard problem to solve without including a large acronym lookup dictionary for all the different acronyms

It should not be difficult to provide such a lookup dictionary by simply adding a sensible collection of common acronyms (embedded in code) to the library. Such a collection should not be too difficult to find. Still, the better solution would be to let users pass their own list of acronyms.

The actual implementation is more intricate. It can certainly be done via regex and groups. Maybe stringcase could be restructured as a class with the current functions as (class) methods. A list of strings could then be passed to the initializer on instantiation.
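A rough sketch of what such a class could look like, assuming user-supplied acronyms are tried before the generic word patterns. `CaseConverter`, `words` and the `acronyms` parameter are hypothetical names for illustration only; none of this exists in stringcase.

```python
import re


class CaseConverter:
    """Hypothetical sketch of a converter that knows a list of acronyms."""

    def __init__(self, acronyms=()):
        # Try longer acronyms first so that "HTTPS" wins over "HTTP".
        ordered = sorted(acronyms, key=len, reverse=True)
        alternatives = [re.escape(a) for a in ordered]
        # Fallbacks: capitalized word, lowercase run, digit run, lone capital.
        alternatives += [r"[A-Z][a-z]+", r"[a-z]+", r"[0-9]+", r"[A-Z]"]
        self._word = re.compile("|".join(alternatives))

    def words(self, text):
        # Separators such as "_" or "-" match no alternative and are skipped.
        return self._word.findall(text)

    def snakecase(self, text):
        return "_".join(w.lower() for w in self.words(text))


converter = CaseConverter(acronyms=["HTTP", "FTP"])
print(converter.snakecase("HTTPResponse"))        # http_response
print(CaseConverter().snakecase("HTTPResponse"))  # h_t_t_p_response
```

Without an acronym list the sketch falls back to splitting every capital, so the existing behaviour would stay the default.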

A non-regex approach for finding acronyms in the context of string case conversions can be found here: https://github.com/jdc0589/CaseConversion/blob/master/case_parse.py

Essentially, this method returns a list of words in PascalCase. These words can then be combined to give the various cases, which should be easy to implement. The method optionally takes a list of strings as predefined acronyms (e.g. ["HTTP", "FTP"]); if no such list is given, it uses a fallback method. This fallback method does not use regex, unlike the approach in the comment below. In case @okunishinishi wants to extend stringcase, it would be better to replace that fallback method with a pure regex approach.
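The "combine the words" step can be sketched as follows, assuming a word list such as ["HTTP", "Response"] has already been produced by some parser. The function names are hypothetical and are not taken from the linked module.

```python
def to_snake(words):
    # ["HTTP", "Response"] -> "http_response"
    return "_".join(w.lower() for w in words)


def to_kebab(words):
    # ["HTTP", "Response"] -> "http-response"
    return "-".join(w.lower() for w in words)


def to_pascal(words):
    # str.capitalize() lowercases everything after the first letter,
    # so the acronym "HTTP" becomes "Http" here.
    return "".join(w.capitalize() for w in words)


def to_camel(words):
    # Same as PascalCase, except the first word is fully lowercased.
    if not words:
        return ""
    return words[0].lower() + to_pascal(words[1:])


words = ["HTTP", "Response"]
print(to_snake(words))   # http_response
print(to_camel(words))   # httpResponse
```

Keeping the parsing and the joining separate means acronym handling only has to be solved once, in the parser.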

pykong commented 7 years ago

Here is a pure regex approach that does not split runs of uppercase letters into single characters in the way described above. It also does not rely on a lookup dict. It would be trivial to implement.

https://stackoverflow.com/questions/1175208/elegant-python-function-to-convert-camelcase-to-snake-case
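For reference, a widely used two-pass regex for this looks roughly like the sketch below. The function name `to_snake_case` is an assumption for illustration, and this is not code from stringcase.

```python
import re


def to_snake_case(name):
    """Two-pass regex conversion that keeps runs of capitals together."""
    # Pass 1: split before a capitalized word, e.g. "HTTPResponse" -> "HTTP_Response".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Pass 2: split between a lowercase letter or digit and a capital,
    # e.g. "getHTTP" -> "get_HTTP", then lowercase the whole string.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


print(to_snake_case("HTTPResponse"))         # http_response
print(to_snake_case("getHTTPResponseCode"))  # get_http_response_code
```

Note that this is still a heuristic: it assumes the last capital in an uppercase run starts the next word, so it cannot split two adjacent acronyms from each other.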

pykong commented 6 years ago

This new package offers acronym detection: https://github.com/AlejandroFrias/case-conversion