trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.05k stars 2.9k forks source link

Add function to convert strings to title case. #2942

Open fredabood opened 4 years ago

fredabood commented 4 years ago

According to the Presto docs, there is not currently a function to convert a string to title case.

mosabua commented 4 years ago

The typical standard name of such a function is initcap(string) from what i can tell.

electrum commented 4 years ago

This can be done using regex_replace. There's an example in the documentation: https://prestosql.io/docs/current/functions/regexp.html


SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2]));
-- 'New York'
mosabua commented 4 years ago

Thats a good workaround. Imho it would be good to have an actual function initcap or so.

tomwagstaff-fc commented 3 years ago

This can be done using regex_replace. There's an example in the documentation: https://prestosql.io/docs/current/functions/regexp.html

SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2]));
-- 'New York'

We're stuck with Presto 0.172 so this isn't supported. And this (my best guess for an equivalent):

SELECT regexp_replace('new york', '(\w)(\w*)', UPPER('$1') || LOWER('$2'))

just yields the original string in lower case. Is there any way to get Title Case in this version of Presto?

ViniciusARZ commented 1 year ago

Anyone was able to use this? I tried multiple regex rules and couldn't make it work for a string column that contains single words and multiple words in the same index...

mosabua commented 1 year ago

I just tried this again to verify. The regex from @electrum works perfectly for single word values, multiple words, and full sentences. Note that I used Trino 403.

trino
trino> SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2]));
  _col0
----------
 New York
(1 row)

Query 20230130_155829_00000_wpd2m, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.80 [0 rows, 0B] [0 rows/s, 0B/s]

trino> SELECT regexp_replace('new york is the greatest', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2]));
          _col0
--------------------------
 New York Is The Greatest
(1 row)

Query 20230130_155843_00001_wpd2m, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.19 [0 rows, 0B] [0 rows/s, 0B/s]

trino> SELECT regexp_replace('new', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2]));
 _col0
-------
 New
(1 row)

Query 20230130_155854_00002_wpd2m, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.17 [0 rows, 0B] [0 rows/s, 0B/s]

However the one thing about implementing a function I would warn about is that this is NOT true title case as it would have to be used following English grammar rules. The function converts to initial capital letters on ALL words. Title case does NOT do that .. short words like "is", "on" ... and many others would need to stay lower case for true title case formatting. So if anyone implements this as a function it should be called initcap .. NOT titlecase

gabrielcardoso1 commented 2 months ago

I ran into a problem using this workaround on Portuguese words with accents, where the letter after an accent is also capitalized:

e.g.: EscritóRio, PréDio, CondomíNio

sebastiengiroux commented 2 months ago

I ran into a problem using this workaround on Portuguese words with accents, where the letter after an accent is also capitalized:

e.g.: EscritóRio, PréDio, CondomíNio

Try regexp_replace('new york', '(\p{L})(\p{L}*)', x -> upper(x[1]) || lower(x[2])) instead.

\p{L} matches any kind of letter from any language.

gabrielcardoso1 commented 2 months ago

I ran into a problem using this workaround on Portuguese words with accents, where the letter after an accent is also capitalized: e.g.: EscritóRio, PréDio, CondomíNio

Try regexp_replace('new york', '(\p{L})(\p{L}*)', x -> upper(x[1]) || lower(x[2])) instead.

\p{L} matches any kind of letter from any language.

Perfect, this method works perfectly, thank you!