ppannuto / python-titlecase

Python library to capitalize strings as specified by the New York Times Manual of Style
MIT License

Additional titlecasing amendments #96

Open robinwhittleton opened 11 months ago

robinwhittleton commented 11 months ago

The original plan was to lift the regexes directly, but I’d forgotten that Standard Ebooks is a GPL3 codebase and this project is MIT-licensed. Obviously we can’t copy everything over directly, so the new plan is that I’ll copy over my own original contributions, plus anything that other contributors agree should be contributed.


At Standard Ebooks we use python-titlecase to format a bunch of stuff throughout our productions (thanks!) but we also have some additional rules and changes to meet our specific needs. These start at [redacted]; the comments as a list give you a good overview:

Would any of these be things that python-titlecase is interested in? I’d be happy to upstream them as PRs.

MinchinWeb commented 11 months ago

Personally, I think these would all be great additions!

Regarding À La Carte, should the la (literally, "the") also be lowercase, giving à la Carte?

Should that be extended to lowercase au (à + le) and aux (à + les) as well? (These are the masculine and plural forms of à la, which is feminine.)

Regarding "Lowercase de, von, van, le, du": should this list be extended to des (de + les; du is de + le; de la is written out; all meaning "of the"), and also to les and la (the plural and feminine forms of le, meaning "the")? (I realize la is also the name of a music note, so its inclusion may cause more false positives than is helpful.)
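
To make the trade-off concrete, here is a minimal sketch written only from the description in this thread. It is not python-titlecase code, and the names `PARTICLES` and `lowercase_particles` are hypothetical. It assumes the input has already been title-cased and simply lowercases any listed particle that is not the first word:

```python
# Hypothetical particle list assembled from the discussion in this thread;
# it is not part of python-titlecase.
PARTICLES = {"de", "von", "van", "le", "du", "des", "les", "la", "au", "aux"}


def lowercase_particles(title: str) -> str:
    """Lowercase listed particles, except when they start the title."""
    words = title.split(" ")
    result = []
    for index, word in enumerate(words):
        # The first word keeps its capital, as in "La Bohème".
        if index > 0 and word.lower() in PARTICLES:
            result.append(word.lower())
        else:
            result.append(word)
    return " ".join(result)


print(lowercase_particles("Ludwig Van Beethoven"))   # Ludwig van Beethoven
print(lowercase_particles("Tour De France"))         # Tour de France
print(lowercase_particles("Do Re Mi Fa Sol La Ti"))  # Do Re Mi Fa Sol la Ti
```

The last call shows the false positive mentioned above: a plain word list cannot tell the article la from the music note.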

ppannuto commented 8 months ago

These all look reasonable to me, and happy to take a PR (or possibly better several PRs as these look like a large number of rules?).

My biggest ask would be to make sure that each new rule adds a test case or two which demonstrates (and validates) when it is and is not supposed to trigger, and that it's operating correctly. It should be easy to add roughly one phrase per rule to tests.py.
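
As an illustration of the phrase-per-rule idea, a sketch along these lines might work. The rule function `lowercase_van_von` is a hypothetical stand-in defined here so the example runs on its own; it is not a python-titlecase function, and the layout is not necessarily how the project's tests.py is organised:

```python
import unittest


def lowercase_van_von(title: str) -> str:
    """Hypothetical stand-in rule: lowercase van/von unless it starts the title."""
    words = title.split(" ")
    return " ".join(
        word.lower() if index > 0 and word.lower() in {"van", "von"} else word
        for index, word in enumerate(words)
    )


class LowercaseVanVonTest(unittest.TestCase):
    # One phrase per behaviour: cases where the rule should trigger,
    # followed by cases where it should leave the title alone.
    CASES = [
        ("Ludwig Van Beethoven", "Ludwig van Beethoven"),  # triggers
        ("Wernher Von Braun", "Wernher von Braun"),        # triggers
        ("Van Gogh in Arles", "Van Gogh in Arles"),        # first word: no change
        ("The Vanishing Point", "The Vanishing Point"),    # substring only: no change
    ]

    def test_cases(self):
        for source, expected in self.CASES:
            with self.subTest(source=source):
                self.assertEqual(lowercase_van_von(source), expected)
```

Saved as a module, this can be run with `python -m unittest`. Using `subTest` means each failing phrase is reported individually rather than stopping at the first mismatch.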

robinwhittleton commented 8 months ago

So, something I forgot was that Standard Ebooks’ tooling is GPL3 which isn’t compatible with MIT. That makes the way forwards a little difficult and comes down to a couple of options.

  1. I could check which of them I added and leave it at that. Potentially I could check in with other contributors to see if they’d be happy having their contributions reused in an MIT codebase. But I’ve checked with one of the bigger contributors and they’re not.
  2. Alternatively I could leave the list here, but remove the link. Then other people could do a cleanroom implementation of the functionality without reference to a GPL3 codebase.

Sorry about that, it honestly didn’t cross my mind until I sat down to actually implement it.

ppannuto commented 7 months ago

:/, that's unfortunate. I definitely can't do any kind of license change on this end in good conscience. I'm just a steward really of a project which has had many owners over the years.

I guess the best path forward is to pull over any changes you can, and then leave this open with the link removed as you suggested. It's a great todo list for anyone looking for some simple OSS contributions at least.

robinwhittleton commented 7 months ago

OK, I’ll try to get around to this at some point over the next week.

robinwhittleton commented 6 months ago

I used blame to review who’d contributed which rules, and it turns out that all but two were written by a contributor who would (reasonably, of course) prefer that their code remain GPL3 rather than MIT. The other two were written by me, but are not useful in a more general context.

So I think I’ve done as much as I can here. I know the original code so I don’t want to attempt a black-box reimplementation as MIT. If anyone else who hasn’t read the GPL3 code wants to take this list as the starting point for python-titlecase improvements then go for it, but otherwise let’s close this issue.

Thanks for the time anyway, and sorry that I hadn’t been more careful about licensing when I proposed this.