Closed derekmahar closed 2 years ago
I verified that REGEXP_REPLACE
in csvq v1.14.1-pr.1 does not support substitution operator \u
:
derek@derek-TB350-BTC:~/Downloads/csvq-v1.14.1-pr.1-linux-amd64$ ./csvq "SELECT REGEXP_REPLACE('abc abc', '(\w+)', '\u\$1')"
+----------------------------------------------+
| REGEXP_REPLACE('abc abc', '(\\w+)', '\\u$1') |
+----------------------------------------------+
| \uabc \uabc |
+----------------------------------------------+
I'm sorry if you feel this is rude, but I have a question. Are the features you suggested what you actually needed in some situations? Or are you aiming for the perfect tool?
Csvq is not originally designed to do very complex processing, and there are usually other good solutions for such things. There is no need to try to do everything with one tool when there are many other great tools in the world.
By the way, you can get the results with the current features of csvq. For example, you can create a user-defined function to get the result you want.
Declare the user-defined function in a file named "capitalize.sql".
DECLARE CAPITALIZE FUNCTION (@str) AS BEGIN
DECLARE @result := '';
DECLARE @list := REGEXP_FIND_ALL(@str, '\w+');
DECLARE @i := 0;
WHILE TRUE
DO
DECLARE @item := JSON_VALUE('[' || @i || '][0]', @list);
IF @item IS NULL THEN
BREAK;
END IF;
IF 0 < LEN(@item) THEN
IF 0 < LEN(@result) THEN
@result := @result || ' ';
END IF;
@result := @result || UPPER(SUBSTRING(@item FROM 1 FOR 1));
END IF;
IF 1 < LEN(@item) THEN
@result := @result || SUBSTRING(@item FROM 2);
END IF;
@i := @i + 1;
END WHILE;
RETURN @result;
END;
And run a sql with the SOURCE command.
$ csvq "SOURCE `capitalize.sql`; SELECT CAPITALIZE('capitalize initials of all names')"
+------------------------------------------------+
| CAPITALIZE('capitalize initials of all names') |
+------------------------------------------------+
| Capitalize Initials Of All Names |
+------------------------------------------------+
Thank you for this solution. I don't think your response was rude at all.
I do have a specific purpose. I want to capitalize the initials of names in a column that contains full names in mixed case. ;-) I think this is not such a complicated problem and so I prefer to use csvq than resort to another tool or a lower level programming language like Rust, Java, or heaven forbid, Python.
I've yet to use user defined functions in csvq and I had forgotten that csvq supported them. I have read several sections of the csvq documentation many times, but this time I somehow overlooked the section on user defined functions. I sometimes confuse Miller and csvq's features and so mistakenly thought that Miller had user defined functions, but csvq did not.
By the way, for what it's worth, as far as I could tell, SQLite string functions can't do this, either, and SQLite doesn't support user defined functions except those in the form of loadable modules implemented in C.
Is JSON_VALUE
the only function in csvq which can operate on an array of strings?
You can also use the function LIST_ELEM.
$ csvq "SELECT LIST_ELEM('word1 word2', ' ', 1)"
+----------------------------------+
| LIST_ELEM('word1 word2', ' ', 1) |
+----------------------------------+
| word2 |
+----------------------------------+
I'd like to implement composite types of Array and Map, and syntax and functions to handle them, but it may take a while.
I'll be patient, then.
Thanks for your response.
If you use user-defined functions frequently, you can define them in the $HOME/.csvqrc
so that you don't need to run the SOURCE command every time. (See: https://mithrandie.github.io/csvq/reference/command.html#configurations)
The function to capitalize will be provided in the future, but regular expressions will remain as they are now.
Thank you. I was just thinking about how to load user defined functions.
I'm satisfied with the user defined CAPITALIZE
function.
I implemented user defined function CAPITALIZE_WORDS
that uses function LIST_ELEM
instead of JSON_VALUE
:
$ cat csvqrc
DECLARE CAPITALIZE_WORDS FUNCTION (@words) AS
BEGIN
DECLARE @result := CAPITALIZE(GET_WORD(@words, 1));
DECLARE @word := '';
DECLARE @word_count := 2;
WHILE (@word := GET_WORD(@words, @word_count)) IS NOT NULL
DO
@result := @result || ' ' || CAPITALIZE(@word);
@word_count := @word_count + 1;
END WHILE;
RETURN @result;
END;
DECLARE CAPITALIZE FUNCTION (@word) AS
BEGIN
IF @word = '' THEN
RETURN '';
ELSE
RETURN UPPER(SUBSTR(@word, 0, 1))
|| LOWER(IFNULL(SUBSTR(@word, 1), ''));
END IF;
END;
DECLARE GET_WORD FUNCTION (@words, @index) AS
BEGIN
RETURN LIST_ELEM(@words, ' ', @index - 1);
END;
$ csvq "SELECT '\'' || CAPITALIZE_WORDS(' abc def ') || '\'';"
+--------------------------------------------------+
| '\'' || CAPITALIZE_WORDS(' abc def ') || '\'' |
+--------------------------------------------------+
| ' Abc Def ' |
+--------------------------------------------------+
$ csvq "SELECT '\'' || CAPITALIZE_WORDS('abc def') || '\'';"
+---------------------------------------------+
| '\'' || CAPITALIZE_WORDS('abc def') || '\'' |
+---------------------------------------------+
| 'Abc Def' |
+---------------------------------------------+
$ csvq "SELECT '\'' || CAPITALIZE_WORDS('abc') || '\'';"
+-----------------------------------------+
| '\'' || CAPITALIZE_WORDS('abc') || '\'' |
+-----------------------------------------+
| 'Abc' |
+-----------------------------------------+
$ csvq "SELECT '\'' || CAPITALIZE_WORDS('') || '\'';"
+--------------------------------------+
| '\'' || CAPITALIZE_WORDS('') || '\'' |
+--------------------------------------+
| '' |
+--------------------------------------+
$ csvq "SELECT '\'' || CAPITALIZE_WORDS(NULL) || '\'';"
+----------------------------------------+
| '\'' || CAPITALIZE_WORDS(NULL) || '\'' |
+----------------------------------------+
| NULL |
+----------------------------------------+
TITLE_CASE function has been included in version 1.17.0.
Given a string that contains multiple lower case names (words), how might I capitalize only the initial letter of these names?
I think that the existing set of string functions in csvq cannot perform this operation. However, this can be done by applying a PCRE2 regular expression substitution which converts all initials to upper case.
Unfortunately, according to Regular Expressions 101, Golang does not support the PCRE2 substitution operator
\u
, so I presume that the new regular expression functionsREGEXP_REPLACE
in csvq v1.14.1-pr.1 likewise also cannot perform this operation. If this is the case, would you consider implementing a specific string function or generic set of string functions to capitalize all words in a string?