Open robinwhittleton opened 11 months ago
Personally, I think these would all be great additions!
Regarding `À La Carte`, should the `la` (literally, "the") also be lowercase, so that it reads `à la Carte`?
Should that also be extended to lowercase `au` (`à` + `le`) and `aux` (`à` + `les`)? (These are the masculine and plural forms of `à la`, which is feminine.)
Regarding "Lowercase `de`, `von`, `van`, `le`, `du`": should this list be extended to `des` (`de` + `les`; `du` is `de` + `le`; `de la` is written out; all meaning "of the")? And also `les` and `la` (the plural and feminine forms of `le`, meaning "the")? (I realize `la` is sometimes used as a music note, so its inclusion may cause more false positives than is helpful.)
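To make the question concrete, here is a minimal sketch of what an extended particle rule could look like. It is an independent illustration written for this thread, not code from Standard Ebooks or python-titlecase; the names `PARTICLES` and `lowercase_particles` and the `include_la` flag are hypothetical. It assumes the input has already been title-cased and only lowercases a particle that has whitespace on both sides, so the first and last words are never touched.

```python
import re

# Hypothetical particle list for illustration only. `la` is opt-in because it
# can also be a music note, as mentioned above.
PARTICLES = ("de", "du", "des", "von", "van", "le", "les")


def lowercase_particles(title, include_la=False):
    """Lowercase name particles in an already title-cased string."""
    words = PARTICLES + (("la",) if include_la else ())
    # (?<=\s) requires whitespace before the particle, so the first word of
    # the title never matches; (?=\s) requires whitespace after it, so longer
    # words such as "Delta" and the final word never match either.
    pattern = re.compile(
        r"(?<=\s)(" + "|".join(words) + r")(?=\s)", re.IGNORECASE
    )
    return pattern.sub(lambda m: m.group(1).lower(), title)


print(lowercase_particles("Charles De Gaulle"))     # Charles de Gaulle
print(lowercase_particles("Le Chant Des Oiseaux"))  # Le Chant des Oiseaux
print(lowercase_particles("Sonata in La Minor", include_la=True))  # Sonata in la Minor
```

The last call shows the false positive in question: with `la` enabled, the note name is lowercased too, which is why the sketch keeps it behind a flag.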
These all look reasonable to me, and happy to take a PR (or possibly better several PRs as these look like a large number of rules?).
My biggest ask would be to make sure that each new rule adds a test case or two which demonstrates (and validates) when it is and is not supposed to trigger, and that it's operating correctly. It should be easy to add roughly one phrase per rule to `tests.py`.
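As a rough sketch of what "triggers / does not trigger" test data could look like, the example below pairs each input with its expected output for one rule. Everything in it is an assumption for illustration: `lowercase_the_after_vs` is a hypothetical stand-in rule, and `CASES` and `run_cases` are not the actual layout of `tests.py`.

```python
import re


def lowercase_the_after_vs(title):
    """Hypothetical stand-in rule: lowercase `The` directly after `vs.`."""
    # \b keeps words that merely end in "vs." (such as "Revs.") from matching.
    return re.sub(r"\b([Vv]s\.\s+)The\b", r"\1the", title)


# (input, expected) pairs: one where the rule should fire, two where it should not.
CASES = (
    ("The People vs. The State", "The People vs. the State"),  # triggers
    ("The State and The People", "The State and The People"),  # no `vs.`
    ("Revs. The Engine", "Revs. The Engine"),                  # `vs.` inside a word
)


def run_cases(rule, cases):
    """Return (input, expected, actual) for every case that did not match."""
    return [
        (given, expected, rule(given))
        for given, expected in cases
        if rule(given) != expected
    ]


print(run_cases(lowercase_the_after_vs, CASES))  # []
```

Keeping a negative case next to each positive one makes it obvious when a later rule starts matching more than intended.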
So, something I forgot was that Standard Ebooks’ tooling is GPL3 which isn’t compatible with MIT. That makes the way forwards a little difficult and comes down to a couple of options.
Sorry about that, it honestly didn’t cross my mind until I sat down to actually implement it.
:/, that's unfortunate. I definitely can't do any kind of license change on this end in good conscience. I'm just a steward really of a project which has had many owners over the years.
I guess the best path forward is to pull over any changes you can, and then leave this open with the link removed as you suggested. It's a great todo list for anyone looking for some simple OSS contributions at least.
OK, I’ll try to get around to this at some point over the next week.
I checked through blame to see who’d contributed which rules, and it turns out that all but two were written by a contributor who would (reasonably, of course) prefer their code to remain GPL-3 rather than MIT. The other two were written by me, but are not useful in the more general context.
So I think I’ve done as much as I can here. I know the original code so I don’t want to attempt a black-box reimplementation as MIT. If anyone else who hasn’t read the GPL3 code wants to take this list as the starting point for python-titlecase improvements then go for it, but otherwise let’s close this issue.
Thanks for the time anyway, and sorry that I hadn’t been more careful about licensing when I proposed this.
The original plan was to lift the regexes directly, but I’d forgotten that Standard Ebooks is a GPL3 codebase, and this project is MIT. Obviously we can’t copy everything directly over, so the new plan is that I’ll copy over my original contributions, and anything that anyone else agrees should be contributed.
At Standard Ebooks we use python-titlecase to format a bunch of stuff throughout our productions (thanks!) but we also have some additional rules and changes to meet our specific needs. These start at [redacted]; the comments as a list give you a good overview:
- `MIX` (which is much more likely to be an English word than a Roman numeral) or `DI` which may be an Italian word
- `and`, `or` even if preceded by punctuation
- `and`, if it's not the very first word, and not preceded by an em-dash
- `the`, if preceded by a dash (like `Puss-in-Boots` or `Jack-in-the-Box`)
- `th’`, sometimes used poetically
- `o’`
- `to-night` (which might appear in poetry)
- `from`, `with`, as long as they're not the first word and not preceded by a parenthesis
- Capitalise the first word after an opening quote or italicisation that signifies a work
  - this relies on SE specific markup
- `the` if preceded by `vs.`
- `de`, `von`, `van`, `le`, `du` as in `Charles de Gaulle`, `Werner von Braun`, etc., and if not the first word and not preceded by an “
- `Or,`, since it is probably a subtitle
- `:`, except `or,`, which indicates a kind of subtitle
- `O'Keefe` or `L'Affaire`. But only if there's at least 3 letters after, to prevent catching things like `I'm` or `E're`
- `Mc`
- `by`
- `d’`, as in `Marie d’Elle`
- `l’` as in `l’Affaire`, but not if it's the first letter
- `A-` as in `A-Breaking`
- `À` (as in `À La Carte`) unless it's the first word
- `mm` (millimeters, as in `50 mm gun`) unless it's followed by a period in which case it's likely `Mm.` (Monsieurs)
- `al-` (as in the Arabic definite article) unless it’s the first word

Would any of these be things that python-titlecase is interested in? I’d be happy to upstream them as PRs.
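To give a feel for the size of one of these rules, here is a minimal sketch of the apostrophe item. It reads that item as "capitalise the letter after a single-letter prefix and an apostrophe, but only when at least three letters follow", which is an interpretation rather than something the list states outright. The function name `capitalize_after_apostrophe` is hypothetical, the regex is written from scratch for illustration, and it only handles ASCII letters.

```python
import re

# One letter at a word boundary, an apostrophe (straight or curly), then a
# lowercase letter that is followed by at least two more lowercase letters,
# i.e. at least three letters after the apostrophe in total.
APOSTROPHE_PATTERN = re.compile(r"\b([A-Za-z])(['’])([a-z])(?=[a-z]{2,})")


def capitalize_after_apostrophe(title):
    """Uppercase the letter after a one-letter prefix and an apostrophe."""
    return APOSTROPHE_PATTERN.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).upper(), title
    )


print(capitalize_after_apostrophe("O'keefe"))    # O'Keefe
print(capitalize_after_apostrophe("L’affaire"))  # L’Affaire
print(capitalize_after_apostrophe("I'm"))        # I'm
print(capitalize_after_apostrophe("E're"))       # E're
```

The three-letter lookahead is what keeps contractions such as `I'm` and `E're` untouched, and the leading `\b` keeps `We're` from matching on its second letter.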