Closed igorburago closed 4 years ago
If we choose this PR over #59, we might want to rename
create_wordlist_filter()
to create_wordlist_filter_from_file()
(or something similar), and allow the user to import it to
simplify creating the corresponding callback function.
But we can even keep that function private, of course, unless we
expect a great number of users who are calling the titlecase()
function directly—rather than using the command line tool—to load
their abbreviations lists from files. In that case, we would save the
user some time and effort in reimplementing reading the list from file
and creating a callback.
Here I keep this function importable, but not added to __all__
.
As a side note, I am not sure it is a great idea to have that
logger.debug()
call in the titlecase()
. When the titlecase()
is called frequently, it might have a small performance cost.
However tiny it is, it still might be undesirable for such a small
library function like titlecase()
.
I would remove that logging call or, at least, would protect it
with if __debug__:
.
I am also not sure that we really need any logging in
create_wordlist_filter
, but I’m fine with keeping it there
(assuming there are reasons for that).
I don't believe that this library targets high-performance use cases. Moreover, one old programmer's wisdom: Only optimize if you need to optimize! As long as there is no urge, it might be a waste of time and it might clutter the code. I could be persuaded by proper benchmarks though. In addition, have you checked the logging library whether they maybe already have something similar in place? I guess they did optimize their code.
To be honest, for my use case I only care about the command line interface so the internals do not bother me so much as long as nothing breaks, there are tests, and everything is nicely documented.
Regarding logging:
If Python is started with the -O
option, the code protected by
if __debug__:
is optimized out when the code is loaded into
the interpreter and, therefore, has no performance cost during
interpretation later on.
The logger’s debug()
method in the logging
module does check
whether the debug level of logging is enabled, and does not log the
given message otherwise. But one still pays the (however small)
performance cost of the function call and the log-level check.
It is unfortunate that no benchmarks were made when this logging code was added. It is adding code that may clutter, not removing.
I do not have time to do benchmarks at the moment, so, as I said previously, I would be all right with keeping the logging code for now, as long as we merge this PR (which makes no modifications to the logging code).
(I brought up logging as a side node, just out of curiosity.)
As to the subject of this PR, your interest is not affected, then, since:
The command-line interface remains unchanged with the command
having exactly the same default behavior of reading the ~/.titlecase.txt
abbreviations file as before.
Those who are using the package as a library instead (by calling
the titlecase()
function directly), still have the ability to protect
abbreviations loaded from a file by specifying an appropriate callback.
The create_wordlist_filter_from_file()
function provides an easy
way for the user to make a callback with that same original functionality,
so that calls like
titlecase(str, wordlist_filter=file_path)
simply change to
titlecase(str, callback=create_wordlist_filter_from_file(file_path))
The test for the wordlist-based filtering in the tests.py
file
was modified to use the new approach (just like shown above).
The tests are all passed.
Although I have touched on the advantages of this PR in #58, let me reiterate on them here a bit more elaborately.
Separation of concerns. The titlecase()
function is kept to its
main purpose of title-casing a given string, and is not burdened by
any particular abbreviation-list mechanism being embedded into it.
Simplification of the API without loss of functionality. The
titlecase()
function still has (the original) mechanism for
supporting any exceptions in capitalization via the callback
parameter. That is, it uses a single interface for all kinds of custom
words, regardless of wether they were loaded from a file or populated
any other way.
Abbreviation file parsing is still available through a callback.
The create_wordlist_filter_from_file()
function is kept around
to support the same use cases (with the primary instance being
the command implementation in cmd()
). The only change is that
the call to create_wordlist_filter_from_file()
now happens
outside of titlecase()
, instead.
Flexibility in composing multiple capitalization-exception sources.
When all custom capitalization code is shaped into those callback
functions, the user can easily compose them in any order, while
currently callback
is always called before wordlist_filter
.
That sounds great to me!
This makes a lot of sense to me. I was probably a little quick to move on pushing out the prior iteration of this, it had just sat for a while so I felt bad.
Strictly speaking, I think this will also require a 2.0 release as it changes the interface; not that I think a ton of folks have pulled the 1.0 version yet, but I would like to try to be a good citizen of the semver universe.
Changing the major version number due to the change of the API makes total sense.
However, unless you have to, could you please postpone making a new release? I have another couple of pull requests in the works for cleaning up the regular-expression constants, for making it easier for the user to provide the list of custom small words, and, possibly, for doing some minor refactoring.
No urgency to release here. Excited to see the new PRs!
If the
titlecase()
function is called directly (say, from a library), the caller can still use a custom list of abbreviations by passing an appropriate function ascallback
.Fixes #58.