photoprism / photoprism

AI-Powered Photos App for the Decentralized Web 🌈💎✨
https://www.photoprism.app

Settings: Add options to configure title capitalization #2672

Open jelmer opened 2 years ago

jelmer commented 2 years ago

1. What is not working as documented?

Photoprism automatically capitalises the first letter in every part of a name. In Dutch, this is incorrect as prepositions are spelled with a lowercase first letter. See https://en.wikipedia.org/wiki/Van_(Dutch)#Collation_and_capitalisation

2. How can we reproduce it?

Steps to reproduce the behavior:

  1. Go to 'People'
  2. Click on any face
  3. Enter a name, e.g. "Koos van Driemond"
  4. Notice that Photoprism changes the name to "Koos Van Driemond"

3. What behavior do you expect?

For photoprism to preserve the text as it was entered - or possibly for it to correctly adjust the casing (though that is quite hard with international names).

6. Which software versions do you use?

I'm running photoprism Build 220901-f493607b0, in kubernetes using mariadb.

lastzero commented 2 years ago

Yes, we didn't implement capitalization rules for every language - especially since we don't speak Dutch. I would consider this an improvement, not a "bug", since we never said it would work in every language. Of course, you can contribute with a pull request to add these rules. You might just need to extend the existing implementation a bit.

jelmer commented 2 years ago

I don't think this is the sort of thing that can be automated - names are very different across the world. Belgians tend to uppercase "van", Americans tend to join the preposition with the surname ("Vandriemond"). It's impossible to guess what nationality somebody is based on just their name (and thus what the correct formatting is). Other online services also don't do any mangling - precisely for this reason I suspect.

Would you consider a PR that just dropped the automatic capitalization altogether?

lastzero commented 2 years ago
jelmer commented 2 years ago

This isn't a language issue - it's about preserving people's preferred spelling of their name. There are some common practices that are related to nationality, but it's specific to each person. Capitalizing titles for other things based on the locale seems perfectly reasonable to me.

lastzero commented 2 years ago

If you only want "van" in lowercase, it's done quickly! There is a list of such "small words" for exactly this purpose. At the moment, we don't have time to discuss the subject in more depth though.

lastzero commented 2 years ago

Source Code: https://github.com/photoprism/photoprism/blob/develop/pkg/txt/smallwords.go

You should be able to edit this directly on GitHub and then send a pull request!
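For readers following along, here is a minimal sketch of how a small-words list can drive title capitalization. It is not the code in `smallwords.go`: the `smallWords` map, its entries, and the `titleCase` and `upperFirst` helpers are all hypothetical and exist only to illustrate the idea discussed in this thread.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// smallWords is a hypothetical stand-in for a small-words list.
// The entries are examples mentioned in this thread, not the real list.
var smallWords = map[string]bool{
	"and":  true,
	"from": true,
	"van":  true,
	"von":  true,
}

// upperFirst upper-cases the first rune of s and leaves the rest untouched.
func upperFirst(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError {
		return s // empty or invalid input: return unchanged
	}
	return string(unicode.ToUpper(r)) + s[size:]
}

// titleCase capitalizes every whitespace-separated word, except small
// words that are not in first position, which are forced to lower case.
func titleCase(s string) string {
	words := strings.Fields(s)
	for i, w := range words {
		if i > 0 && smallWords[strings.ToLower(w)] {
			words[i] = strings.ToLower(w)
			continue
		}
		words[i] = upperFirst(w)
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(titleCase("trees and plants"))  // Trees and Plants
	fmt.Println(titleCase("koos van driemond")) // Koos van Driemond
	fmt.Println(titleCase("van driemond"))      // Van Driemond (first word is always capitalized)
	fmt.Println(titleCase("Koos Van Driemond")) // Koos van Driemond (a deliberate capital V is lost)
	fmt.Println(titleCase("Michael From"))      // Michael from (a surname mistaken for a small word)
}
```

The last two calls show why a list alone cannot satisfy every name: the same word has to be lower case for one person and upper case for another.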

jelmer commented 2 years ago

That's my point though - it's nothing to do with my preference (and thus a configuration option wouldn't help here) but specific to the person who is in the photo. I have photos of a Belgian friend who uses "van" with a capital V and of various Dutch people with surnames that include "van" with a lowercase v.

lastzero commented 2 years ago

If perfection is the only acceptable goal, this will have to wait. I think that the proposed small change would be worthwhile if it would then be "correct" in "most" cases. Even within the same language (e.g. English) there are different rules. We are not linguists.

Wilm0r commented 2 years ago

I am probably one of those friends and I can confirm that writing my name as Wilmer Van Der Gaast is pretty much offensive and not just cosmetic. It is not any less wrong (not kidding here) than writing your name as mICHAEL mAYER.

lastzero commented 2 years ago

@Wilm0r My suggestion above to add "van" to our list of short words that must not be capitalized would solve this within seconds.

jelmer commented 2 years ago

Hi Michael,

Sorry for turning this into a long discussion; I do appreciate your work on photoprism, and didn't mean for this to be a polarizing distraction. That said, I don't think I've managed to get the original point across and feel strongly about this, so I'll make one last ditch attempt and then I'll shut up. :)

Trying to automatically improve the formatting of text in general (including capitalization) seems helpful to me; I'm thinking of e.g. converting the label "trees and plants" to "Trees and Plants". Although, as you say, differences in locales can make this tricky to get right across the board - and I don't think anybody expects you to get this perfect from the get go.

I'm not suggesting photoprism should get rid of all capitalization or automatic formatting, but names should be exempt - precisely because it's impossible to programmatically determine what the correct form is (see also e.g. https://dev.to/carlymho/whats-in-a-name-validation-4b41). The formatting of a name is not down to the strict rules of linguistics, but ultimately to personal preference of the person whose name it is. The examples I listed were meant to help illustrate the fact it's impossible to get right, rather than suggest a more complicated implementation. Other online services just use whatever the user entered verbatim for exactly this reason.

I'm still keen to somehow address this for my personal photoprism, ideally in a form that can make its way into upstream. Options I can think of (in order most to least preferred):

  1. keep capitalization, but disable for labels in "People"
  2. add a configuration option to disable capitalization for labels in "People"
  3. add a configuration option to disable capitalization across the board

(I'd ideally like to avoid having to set a configuration option because it's nicer that things work out of the box)

Would any of these also be acceptable to you?

graciousgrey commented 2 years ago

@jelmer unfortunately neither option is a quick win, so we can't address one of them right now. But we can keep this issue open for a later time :)

lastzero commented 2 years ago

@jelmer I totally forgot to reply because we were super busy and then went on vacation.... sorry for that!

lastzero commented 2 years ago

For now, I've added "van" to the list of words that are normally written in lower case, just like the German "von".

systemmonkey42 commented 2 years ago

Please add the French 'le' as lowercase, as long as it is used as a joiner.

In addition I have names where the surname is "from", which I think is treated as a joiner and subsequently converted to lowercase. This results in the strange-looking 'Michael from' as the name.

I tried changing the capitalisation in the database, but it automatically put the old strangeness back.

To be fair, I think the greatest improvement would be to stop messing with names, and accept them as typed. (Or at least make it an option I can disable permanently)

Wilm0r commented 2 years ago

Yeah, and even 'le' is difficult since it's a somewhat common Asian last name AFAIK. (Indeed doing it only when used as a joiner will help, but there shall be more corner cases, etc.)

Capitalizing titles is also necessary because PhotoPrism MAY extract a lot of text from sources that generally do NOT use capitalization, such as the filenames exported by hosted photo apps when you ask them to (required by law, so they often don't provide any other useful metadata...).

That's unfortunate :( Wonder whether heuristics (do it only when the full name is in the same case) could be a nice middle ground. Preserving case where present really would be preferable over trying to guess it?

systemmonkey42 commented 2 years ago

Preserving case where present really would be preferable over trying to guess it?

Call it "Smart Case" mode... Similar to how many editors assume searching for "text" is case-insensitive, but searching for "Text" is case sensitive...
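A rough sketch of what such a "Smart Case" heuristic could look like, assuming the rule is "only rewrite input that is entirely lower case or entirely upper case, and keep mixed-case input verbatim". The function names `hasMixedCase`, `simpleTitle`, and `smartCase` are made up for this example and are not part of PhotoPrism.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasMixedCase reports whether s contains both upper- and lower-case letters.
func hasMixedCase(s string) bool {
	var upper, lower bool
	for _, r := range s {
		switch {
		case unicode.IsUpper(r):
			upper = true
		case unicode.IsLower(r):
			lower = true
		}
	}
	return upper && lower
}

// simpleTitle lower-cases s, then upper-cases the first letter of each word.
func simpleTitle(s string) string {
	words := strings.Fields(strings.ToLower(s))
	for i, w := range words {
		r := []rune(w)
		r[0] = unicode.ToUpper(r[0])
		words[i] = string(r)
	}
	return strings.Join(words, " ")
}

// smartCase keeps mixed-case input exactly as typed and only guesses
// the capitalization when the input carries no case information.
func smartCase(s string) string {
	if hasMixedCase(s) {
		return s
	}
	return simpleTitle(s)
}

func main() {
	fmt.Println(smartCase("Koos van Driemond")) // Koos van Driemond (preserved)
	fmt.Println(smartCase("Michael from"))      // Michael from (preserved)
	fmt.Println(smartCase("koos van driemond")) // Koos Van Driemond (guessed)
	fmt.Println(smartCase("KOOS VAN DRIEMOND")) // Koos Van Driemond (guessed)
}
```

With this approach a typed name survives untouched, while text that arrives in a single case (for example from filenames) still gets capitalized. The guess for single-case input can of course still be wrong.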

lastzero commented 2 years ago

It would be great if other software had such problems.... I'll take that as a compliment. Maybe a contributor has some time to experiment and come up with a solution (that ideally doesn't require special settings). I don't know when else we'll get to it. Adding "le" should be quick though. Feel free to send a pull request for the small words list in our txt package in case I don't remember tomorrow.

kvalev commented 2 years ago

How would the latest commit work with names such as "La Réunion"?

lastzero commented 2 years ago

Same as in English where small words at the beginning are capitalized as well.