vindarel / cl-str

Modern, simple and consistent Common Lisp string manipulation library.
https://vindarel.github.io/cl-str/
MIT License


A modern and consistent Common Lisp string manipulation library

(ql:quickload "str")

Also available on Ultralisp.

Why?

The only dependency is cl-ppcre.


Install

Install with Quicklisp:

(ql:quickload :str)

Add it to your project's .asd dependencies and call functions with the str prefix. It is not recommended to :use :str in a package; it's safer to use the str prefix.
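For example, a minimal project setup could look like the sketch below. The system name "my-project", the file "main", the package and the greet function are hypothetical placeholders; only the "str" dependency and the str: prefix come from this README.

```lisp
;; my-project.asd (hypothetical project)
(asdf:defsystem "my-project"
  :depends-on ("str")            ; the cl-str system is named "str"
  :components ((:file "main")))

;; main.lisp: do not :use :str, call its functions with the str: prefix.
(defpackage :my-project
  (:use :cl))
(in-package :my-project)

(defun greet (name)
  (str:concat "Hello, " (str:trim name) "!"))
```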

Check its version:

(str:version)

To get a newer version, you need to update the Quicklisp dist (think of QL as Debian's apt rather than pip/npm/etc):

(ql:update-dist "quicklisp")

Don't have a full Common Lisp development environment yet? Get Portacle, a portable and multiplatform development environment shipping Emacs, Quicklisp, SBCL and Git. See also editor support (Vim, Lem, Atom, Eclipse,…).

Global parameters

Some parameters are common to various functions and often used: :ignore-case and :omit-nulls.

Consequently we can also manage them with global parameters:

(let ((str:*ignore-case* t))
  (str:ends-with-p "BAR" "foobar"))

is equivalent to

(str:ends-with-p "BAR" "foobar" :ignore-case t)
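This works because the keyword's default value is read from a special variable at call time, and let rebinds it dynamically. Here is a self-contained sketch of that pattern in plain Common Lisp; *my-ignore-case* and my-ends-with-p are hypothetical names, not cl-str's source.

```lisp
(defvar *my-ignore-case* nil
  "Hypothetical global default for the :ignore-case key argument.")

(defun my-ends-with-p (end s &key (ignore-case *my-ignore-case*))
  "True if S ends with END. IGNORE-CASE defaults to *MY-IGNORE-CASE*,
read at call time, so a dynamic LET binding changes the default."
  (let ((start (- (length s) (length end))))
    (and (>= start 0)
         (funcall (if ignore-case #'string-equal #'string=)
                  end s :start2 start))))

;; Both forms return T:
(let ((*my-ignore-case* t))
  (my-ends-with-p "BAR" "foobar"))
(my-ends-with-p "BAR" "foobar" :ignore-case t)
```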

Functions

Tweak whitespace

trim (s &key (char-bag *whitespaces*))

Removes all characters in char-bag (default: whitespaces) at the beginning and end of s. If supplied, char-bag has to be a sequence (e.g. string or list of characters).

(str:trim "  rst  ") ;; => "rst"
(str:trim "+-*foo-bar*-+" :char-bag "+-*") ;; => "foo-bar"
(str:trim "afood" :char-bag (str:concat "a" "d")) ;; => "foo"
(str:trim "cdoooh" :char-bag (str:concat "c" "d" "h")) => "ooo"

Also trim-left and trim-right.

Uses the built-in string-trim where whitespaces are '(#\Space #\Newline #\Backspace #\Tab #\Linefeed #\Page #\Return #\Rubout).
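Since trim builds on string-trim, the whole family can be sketched in a few lines of standard Common Lisp. The names below are hypothetical and the whitespace list is an approximation of the one above, not cl-str's exact definition.

```lisp
(defparameter *my-whitespaces*
  '(#\Space #\Newline #\Tab #\Page #\Return #\Backspace #\Rubout)
  "Hypothetical default char-bag, close to the list above.")

(defun my-trim (s &key (char-bag *my-whitespaces*))
  "Remove the characters of CHAR-BAG from both ends of S."
  (string-trim char-bag s))

(defun my-trim-left (s &key (char-bag *my-whitespaces*))
  (string-left-trim char-bag s))

(defun my-trim-right (s &key (char-bag *my-whitespaces*))
  (string-right-trim char-bag s))
```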

collapse-whitespaces (s)

Ensure there is only one space character between words. Remove newlines.

(str:collapse-whitespaces "foo  bar

  baz")
;; => "foo bar baz"
;; => T (second value)
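A dependency-free sketch of the same idea: every run of whitespace becomes a single space. The name collapse-ws and its whitespace set are assumptions for illustration; unlike the example above it returns a single value.

```lisp
(defun collapse-ws (s)
  "Replace each run of whitespace characters in S by a single space."
  (with-output-to-string (out)
    (let ((in-run nil))                 ; are we inside a whitespace run?
      (loop for c across s
            do (if (member c '(#\Space #\Tab #\Newline #\Return #\Page))
                   (unless in-run
                     (write-char #\Space out)
                     (setf in-run t))
                   (progn
                     (write-char c out)
                     (setf in-run nil)))))))
```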

To longer strings

join (separator list-of-strings)

Join strings in list list-of-strings with separator (either a string or a char) in between.

(join " " '("foo" "bar" "baz")) ;; => "foo bar baz"
(join #\Space '("foo" "bar" "baz")) ;; => "foo bar baz"
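A minimal sketch of join in standard Common Lisp (hypothetical my-join). princ prints a string or a character without quotes, which is how one function can accept both kinds of separator.

```lisp
(defun my-join (separator strings)
  "Concatenate STRINGS, putting SEPARATOR (a string or a character) in between."
  (with-output-to-string (out)
    (loop for (s . rest) on strings
          do (write-string s out)
             (when rest                ; no separator after the last element
               (princ separator out)))))
```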

concat (&rest strings)

Join strings into one.

(concat "f" "o" "o") ;; => "foo"

Simple call of the built-in concatenate.

UIOP also provides uiop:strcat.

ensure (s &key wrapped-in prefix suffix) NEW in March, 2023

The "ensure-" functions return a string that has the specified prefix or suffix, adding it if necessary.

The str:ensure function looks at its key parameters in this order: :wrapped-in first, then :prefix and/or :suffix.

Example:

(str:ensure "abc" :wrapped-in "/")  ;; => "/abc/"
(str:ensure "/abc" :prefix "/")  ;; => "/abc"  => no change, still one "/"
(str:ensure "/abc" :suffix "/")  ;; => "/abc/" => added a "/" suffix.

These functions accept strings and characters:

(str:ensure "/abc" :prefix #\/)

warn: if both :wrapped-in and :prefix (and/or :suffix) are supplied together, :wrapped-in takes precedence and :prefix (and/or :suffix) is ignored.

ensure-prefix, ensure-suffix (start/end s) NEW in March, 2023

Ensure that s starts with start/end (or ends with start/end, respectively).

Return a new string with its prefix (or suffix) added, if necessary.

Example:

(str:ensure-prefix "/" "abc/") ;; => "/abc/" (a prefix was added)
;; and
(str:ensure-prefix "/" "/abc/") ;; => "/abc/" (does nothing)

ensure-wrapped-in (start/end s)

Ensure that s both starts and ends with start/end.

Return a new string with the necessary added bits, if required.

It simply calls str:ensure-suffix followed by str:ensure-prefix.

See also str:wrapped-in-p and uiop:string-enclosed-p (prefix s suffix).

(str:ensure-wrapped-in "/" "abc") ;; => "/abc/"  (added both a prefix and a suffix)
(str:ensure-wrapped-in "/" "/abc/") ;; => "/abc/" (does nothing)
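To make the "ensure" semantics concrete, here is a self-contained sketch in standard Common Lisp. The my- names are hypothetical, not the library's source; the composition in my-ensure-wrapped-in follows the description above (suffix first, then prefix).

```lisp
(defun my-ensure-prefix (start s)
  "Return S if it already starts with START, else START prepended to S."
  (let ((start (string start)))          ; START may also be a character
    (if (and (<= (length start) (length s))
             (string= start s :end2 (length start)))
        s
        (concatenate 'string start s))))

(defun my-ensure-suffix (end s)
  "Return S if it already ends with END, else S with END appended."
  (let* ((end (string end))              ; END may also be a character
         (pos (- (length s) (length end))))
    (if (and (>= pos 0)
             (string= end s :start2 pos))
        s
        (concatenate 'string s end))))

(defun my-ensure-wrapped-in (start/end s)
  "Ensure the suffix first, then the prefix."
  (my-ensure-prefix start/end (my-ensure-suffix start/end s)))
```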

insert (string/char index s)

Insert the given string (or character) at position index in s and return a new string.

If index is out of bounds, just return s.

(str:insert "l" 2 "helo") ;; => "hello"

(str:insert "o" 99 "hell") ;; => "hell"
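A sketch of insert with subseq and concatenate (hypothetical my-insert). Treating an index equal to the length of s as valid, i.e. appending, is an assumption of this sketch.

```lisp
(defun my-insert (string/char index s)
  "Insert STRING/CHAR at INDEX in S. Return S unchanged when INDEX is out of bounds."
  (if (<= 0 index (length s))
      (concatenate 'string
                   (subseq s 0 index)
                   (string string/char)   ; accept a character too
                   (subseq s index))
      s))
```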

repeat (count s)

Make a string of s repeated count times.

(repeat 3 "foo") ;; => "foofoofoo"
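One way to build such a string is to write s count times to a string output stream; a minimal sketch (hypothetical my-repeat):

```lisp
(defun my-repeat (count s)
  "Return S repeated COUNT times."
  (with-output-to-string (out)
    (dotimes (i count)
      (declare (ignorable i))
      (write-string s out))))
```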

add-prefix, add-suffix (items s)

Respectively prepend or append s to each string in the list items.
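A sketch of both functions with mapcar, assuming items is a list of strings and s a string, in the argument order of the signature above (hypothetical my- names):

```lisp
(defun my-add-prefix (items s)
  "Prepend S to each string of ITEMS."
  (mapcar (lambda (item) (concatenate 'string s item)) items))

(defun my-add-suffix (items s)
  "Append S to each string of ITEMS."
  (mapcar (lambda (item) (concatenate 'string item s)) items))
```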

pad (len s &key (pad-side :right) (pad-char #\Space)), pad-left, pad-right, pad-center (new in 0.16, 2019/12)

Fill s with characters until it is of the given length. By default, add spaces on the right:

(str:pad 10 "foo")
"foo       "
(str:pad 10 "foo" :pad-side :center :pad-char "+")
"+++foo++++"

If the given length is smaller than the length of s, return s.

Filling with spaces can easily be done with format:

(format nil "~va" 10 "foo") ;; => "foo       "
(format nil "~v@a" 10 "foo") ;; => "       foo" (with @)
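For arbitrary pad characters, format is not enough; here is a sketch with make-string (hypothetical my-pad). Putting the extra character on the right when centering matches the "+++foo++++" example above.

```lisp
(defun my-pad (len s &key (pad-side :right) (pad-char #\Space))
  "Pad S with PAD-CHAR up to LEN characters; return S if it is already long enough."
  (let ((missing (- len (length s)))
        (c (character pad-char)))        ; accepts #\+ or "+"
    (if (<= missing 0)
        s
        (flet ((filler (n) (make-string n :initial-element c)))
          (ecase pad-side
            (:right (concatenate 'string s (filler missing)))
            (:left (concatenate 'string (filler missing) s))
            (:center
             ;; The extra character, if any, goes to the right.
             (let ((left (floor missing 2)))
               (concatenate 'string (filler left) s (filler (- missing left))))))))))
```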

To shorter strings

substring (start end s)

Return the substring of s from start to end.

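It uses subseq, with some differences: end can be t or nil (meaning the end of the string), out-of-bounds indices are clamped instead of signaling an error, a negative end counts from the end of the string, and a start bigger than end gives the empty string.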

Examples:

  (is "abcd" (substring 0 t "abcd") "t denotes the end of the string")
  (is "abcd" (substring 0 nil "abcd") "nil too")
  (is "abcd" (substring 0 100 "abcd") "end can be too large")
  (is "abc" (substring 0 -1 "abcd") "end can be negative. Counts from the end.")
  (is "" (substring 0 -100 "abcd") "end can be negative and too low")
  (is "" (substring 100 1 "abcd") "start can be too big")
  (is "abcd" (substring -100 4 "abcd") "start can also be too low")
  (is "" (substring 2 1 "abcd") "start is bigger than end")
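A sketch that reproduces the behaviour shown by these tests on top of subseq (hypothetical my-substring). How a negative start is handled beyond the "too low" case is an assumption: here it also counts from the end.

```lisp
(defun my-substring (start end s)
  "SUBSEQ with forgiving bounds, mirroring the tests above."
  (let* ((len (length s))
         ;; END: T or NIL mean the end of S; a negative END counts from the end.
         (end (cond ((or (eq end t) (null end)) len)
                    ((minusp end) (max 0 (+ len end)))
                    (t (min end len))))
         ;; START: clamped to [0, len] (negative START is an assumption).
         (start (if (minusp start) (max 0 (+ len start)) (min start len))))
    (if (>= start end)
        ""
        (subseq s start end))))
```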

s-first (s)

Return the first letter of s.

Examples:

  (s-first "foobar") ;; => "f"
  (s-first "") ;; => ""

s-last (s)

Return the last letter of s.

s-rest (s)

Return the rest substring of s.

Examples:

  (s-rest "foobar") ;; => "oobar"
  (s-rest "") ;; => ""

s-nth (n s)

Return the nth letter of s.

Examples:

  (s-nth 3 "foobar") ;; => "b"
  (s-nth 3 "") ;; => ""

You could also use

(elt "test" 1)
;; => #\e
(string (elt "test" 1))
;; => "e"
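The s- functions differ from elt in two ways: they return strings, and they return "" instead of signaling an error. A sketch of that behaviour (hypothetical my- names, not the library's source):

```lisp
(defun my-s-nth (n s)
  "The Nth character of S as a one-character string, or \"\" if N is out of bounds."
  (if (< -1 n (length s))
      (string (char s n))
      ""))

(defun my-s-first (s) (my-s-nth 0 s))
(defun my-s-last (s) (my-s-nth (1- (length s)) s))
(defun my-s-rest (s) (if (plusp (length s)) (subseq s 1) ""))
```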

shorten (len s &key ellipsis)

If s is longer than len, truncate it and add an ellipsis at the end (... by default). s is cut down to len minus the length of the ellipsis (3 by default).

Optionally, give an :ellipsis keyword argument. Also set it globally with *ellipsis*.

(shorten 8 "hello world")
;; => "hello..."
(shorten 3 "hello world")
;; => "..."
(shorten 8 "hello world" :ellipsis "-")
;; => "hello w-"
(let ((*ellipsis* "-"))
  (shorten 8 "hello world"))
;; => "hello w-"
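The arithmetic is: keep len minus the ellipsis length characters, then append the ellipsis. A sketch with a hypothetical *my-ellipsis* global:

```lisp
(defvar *my-ellipsis* "..."
  "Hypothetical global default for the :ellipsis key argument.")

(defun my-shorten (len s &key (ellipsis *my-ellipsis*))
  "If S is longer than LEN, cut it so that it ends with ELLIPSIS and has LEN characters."
  (if (<= (length s) len)
      s
      (concatenate 'string
                   (subseq s 0 (max 0 (- len (length ellipsis))))
                   ellipsis)))
```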

To a fixed length

fit (len s)

Fit this string to the given length: if it is too long, shorten it (showing the ellipsis); if it is too short, pad it.

As such, it accepts the same key arguments as str:shorten and str:pad: ellipsis, pad-side, pad-char.

CL-USER> (str:fit 10 "hello" :pad-char "+")
"hello+++++"

CL-USER> (str:fit 10 "hello world" :ellipsis "…")
"hello wor…"
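In other words, fit chooses between shortening and padding. A compact sketch of that decision (hypothetical my-fit; it only pads on the right, a simplification of the pad-side option):

```lisp
(defun my-fit (len s &key (ellipsis "...") (pad-char #\Space))
  "Return a string of exactly LEN characters: S shortened or right-padded."
  (let ((n (length s)))
    (cond ((> n len)                    ; too long: shorten, keep the ellipsis
           (concatenate 'string
                        (subseq s 0 (max 0 (- len (length ellipsis))))
                        ellipsis))
          ((< n len)                    ; too short: pad on the right
           (concatenate 'string s
                        (make-string (- len n)
                                     :initial-element (character pad-char))))
          (t s))))
```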

If, like me, you want to print a list of data as a table, see cl-ansi-term and cl-ascii-table:

CL-USER> (ql:quickload "cl-ansi-term")
CL-USER> (term:table '(("name" "age" "email")
              ("me" 7 "some@blah")
              ("me" 7 "some@with-some-longer.email"))
             :column-width '(10 4 20))
+---------+---+-------------------+
|name     |age|email              |
+---------+---+-------------------+
|me       |7  |some@blah          |
+---------+---+-------------------+
|me       |7  |some@with-some-l(…)|
+---------+---+-------------------+
CL-USER> (ql:quickload "cl-ascii-table")
CL-USER> (let ((table (ascii-table:make-table '("Id" "Name" "Amount") :header "Infos")))
  (ascii-table:add-row table '(1 "Bob" 150))
  (ascii-table:add-row table '(2 "Joe" 200))
  (ascii-table:add-separator table)
  (ascii-table:add-row table '("" "Total" 350))
  (ascii-table:display table))

.---------------------.
|        Infos        |
+----+-------+--------+
| Id | Name  | Amount |
+----+-------+--------+
|  1 | Bob   |    150 |
|  2 | Joe   |    200 |
+----+-------+--------+
|    | Total |    350 |
+----+-------+--------+
NIL

To and from lists

words (s)

Return the list of words in s, delimited by whitespace.

unwords (strings)

Join the list of strings with a space.

lines (s &key omit-nulls)

Split string by newline character and return list of lines.

A terminal newline character does not result in an extra empty string (new in v0.14, October 2019).

unlines (strings)

Join the list of strings with a newline character.
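These four functions are close to split and join on a fixed character. A self-contained sketch (hypothetical names; the whitespace handling of my-words is simplified):

```lisp
(defun split-at-char (char s)
  "Split S at every CHAR, keeping empty strings."
  (loop for start = 0 then (1+ pos)
        for pos = (position char s :start start)
        collect (subseq s start pos)
        while pos))

(defun my-lines (s)
  "Split S on newlines; a terminal newline adds no empty string."
  (let ((parts (split-at-char #\Newline s)))
    (if (and (rest parts) (string= (car (last parts)) ""))
        (butlast parts)
        parts)))

(defun my-unlines (strings)
  (format nil "~{~a~^~%~}" strings))

(defun my-words (s)
  "Whitespace-delimited words of S, without empty strings."
  (let ((spaced (substitute-if #\Space
                               (lambda (c) (member c '(#\Tab #\Newline #\Return)))
                               s)))
    (remove "" (split-at-char #\Space spaced) :test #'string=)))

(defun my-unwords (strings)
  (format nil "~{~a~^ ~}" strings))
```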

split (separator s &key omit-nulls limit start end regex)

Split into substrings. If omit-nulls is non-nil, zero-length substrings are omitted.

By default, metacharacters are treated as normal characters. If regex is not nil, then separator is treated as regular expression.

(split "+" "foo++bar") ;; => ("foo" "" "bar")
(split #\+ "foo++bar") ;; => ("foo" "" "bar")
(split "+" "foo++bar" :omit-nulls t) ;; => ("foo" "bar")

(split "[,|;]" "foo,bar;baz") ;; => ("foo,bar;baz")
(split "[,|;]" "foo,bar;baz" :regex t) ;; => ("foo" "bar" "baz")
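Without :regex, the separator is a literal. A sketch of that non-regex mode with search (hypothetical my-split); note that it keeps the trailing empty string, like str:split.

```lisp
(defun my-split (separator s &key omit-nulls)
  "Split S at each literal occurrence of SEPARATOR (a non-empty string or a character)."
  (let* ((sep (string separator))
         (len (length sep)))
    (assert (plusp len) () "SEPARATOR must not be empty.")
    (let ((parts (loop for start = 0 then (+ pos len)
                       for pos = (search sep s :start2 start)
                       collect (subseq s start pos)
                       while pos)))
      (if omit-nulls
          (remove "" parts :test #'string=)
          parts))))
```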

cl-ppcre has an inconsistency: when the separator appears at the end of the string, it doesn't return a trailing empty string. str:split does, since v0.14 (October 2019).

rsplit (separator s &key limit regex)

Similar to split, but split from the end. In particular, the result differs from split when a :limit is provided; in more obscure cases, it can also differ when there are several ways to split the string.

(rsplit "/" "/var/log/mail.log" :limit 2) ;; => ("/var/log" "mail.log")
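A sketch of splitting from the end with a literal separator, using search with :from-end (hypothetical my-rsplit; regex mode is not covered). Once limit minus one separators have been consumed, the remaining left part is kept whole.

```lisp
(defun my-rsplit (separator s &key limit)
  "Split S at the literal SEPARATOR, working from the end; at most LIMIT parts."
  (let ((sep (string separator))
        (parts '())
        (end (length s)))
    (assert (plusp (length sep)) () "SEPARATOR must not be empty.")
    (loop
      (let ((pos (unless (and limit (>= (1+ (length parts)) limit))
                   (search sep s :from-end t :end2 end))))
        (cond (pos
               ;; Keep the piece after this separator, then look further left.
               (push (subseq s (+ pos (length sep)) end) parts)
               (setf end pos))
              (t
               ;; No more separators (or LIMIT reached): the rest is one part.
               (push (subseq s 0 end) parts)
               (return parts)))))))
```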

As noted for split above, str:split keeps the trailing empty string that cl-ppcre:split drops:

(cl-ppcre:split " " "a b c ")
("a" "b" "c")

(str:split " " "a b c ")
("a" "b" "c" "")

split-omit-nulls (separator s &key regex)

Equivalent to split with :omit-nulls t. It exists because this is a common pattern, and it can be clearer than an option coming after many parentheses.

To and from files

from-file (filename)

Read the file and return its content as a string.

Example: (str:from-file "path/to/file.txt").

It accepts an :external-format key argument: if nil, the system default is used. It can be set to e.g. :utf-8.

But you might just call uiop's uiop:read-file-string directly.

There is also uiop:read-file-lines.

to-file (filename s)

Write the string s to the file filename. If the file does not exist, create it; if it already exists, replace it.

Options:

Return the string written to the file.
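Both functions can be sketched with with-open-file. The key arguments and their defaults below are assumptions chosen to match the behaviour described above (create if missing, replace if present), not necessarily str's exact options.

```lisp
(defun my-to-file (filename s &key (if-exists :supersede)
                                   (if-does-not-exist :create)
                                   (external-format :default))
  "Write S to FILENAME, creating or replacing it by default; return S."
  (with-open-file (out filename :direction :output
                                :if-exists if-exists
                                :if-does-not-exist if-does-not-exist
                                :external-format external-format)
    (write-string s out)))

(defun my-from-file (filename &key (external-format :default))
  "Return the contents of FILENAME as a string."
  (with-open-file (in filename :external-format external-format)
    (let* ((buffer (make-string (file-length in)))
           ;; FILE-LENGTH may count bytes, so keep only what was actually read.
           (n (read-sequence buffer in)))
      (subseq buffer 0 n))))
```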

Predicates

emptyp (s)

True if s is nil or the empty string:

  (emptyp nil) ;; => T
  (emptyp "")  ;; => T
  (emptyp " ") ;; => NIL

See also str:non-empty-string-p, which adds a stringp check.

blankp (s)

True if s is empty or only contains whitespace characters.

(blankp "") ;; => T
(blankp " ") ;; => T
(emptyp " ") ;; => NIL

See also str:non-blank-string-p.
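A sketch of both predicates for strings (hypothetical my- names; the whitespace set is an assumption, and non-string sequences are not handled):

```lisp
(defun my-emptyp (s)
  "True if S is NIL or a zero-length string."
  (or (null s)
      (and (stringp s) (zerop (length s)))))

(defun my-blankp (s)
  "True if S is empty or only contains whitespace characters."
  (or (my-emptyp s)
      (every (lambda (c) (member c '(#\Space #\Tab #\Newline #\Return #\Page)))
             s)))
```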

starts-with-p (start s &key ignore-case)

True if s starts with the substring start, nil otherwise. Case-sensitive by default: use :ignore-case t (or bind str:*ignore-case*) to ignore case.

(starts-with-p "foo" "foobar") ;; => T
(starts-with-p "FOO" "foobar") ;; => NIL
(starts-with-p "FOO" "foobar" :ignore-case t) ;; => T

Calls string= or string-equal depending on the case, with their :start and :end delimiters.

ends-with-p (end s &key ignore-case)

True if s ends with the substring end. Case-sensitive by default, like starts-with-p.

(ends-with-p "bar" "foobar") ;; => T

end can be a string or a character.
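Following the description above (string= or string-equal with bounds), both predicates fit in a few lines. A sketch with hypothetical names:

```lisp
(defun sketch-starts-with-p (start s &key ignore-case)
  "True if S starts with START."
  (let ((n (length start)))
    (and (<= n (length s))
         (funcall (if ignore-case #'string-equal #'string=)
                  start s :end2 n))))

(defun sketch-ends-with-p (end s &key ignore-case)
  "True if S ends with END (a string or a character)."
  (let* ((end (string end))
         (pos (- (length s) (length end))))
    (and (>= pos 0)
         (funcall (if ignore-case #'string-equal #'string=)
                  end s :start2 pos))))
```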

containsp (substring s &key (ignore-case nil))

Return true if s contains substring, nil otherwise. Ignore the case with :ignore-case t (don't ignore by default).

Based on a simple call to the built-in search (which returns the position of the substring).
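The search call only needs a case-insensitive character test to honour :ignore-case. A sketch (hypothetical my-containsp) that turns the position into T or NIL:

```lisp
(defun my-containsp (substring s &key ignore-case)
  "T if S contains SUBSTRING, else NIL."
  ;; SEARCH returns a position or NIL; we normalise it to T/NIL.
  (if (search substring s :test (if ignore-case #'char-equal #'char=))
      t
      nil))
```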

s-member (list s &key (ignore-case *ignore-case*) (test #'string=))

Return T if s is a member of list. Do not ignore case by default.

NOTE: s-member's arguments' order is the reverse of CL's member.

If :ignore-case or *ignore-case* are not nil, ignore case (using string-equal instead of string=).

Unlike CL's member, s-member returns T or NIL, instead of the tail of LIST whose first element satisfies the test.
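A sketch of these rules on top of CL's member (hypothetical my-s-member): note the reversed argument order and the T/NIL result instead of a list tail.

```lisp
(defun my-s-member (list s &key ignore-case (test #'string=))
  "T if S is in LIST (compared with TEST, or STRING-EQUAL when IGNORE-CASE), else NIL."
  (if (member s list :test (if ignore-case #'string-equal test))
      t
      nil))
```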

prefixp and suffixp (items s)

Return s if all items start (or end) with it.

See also uiop:string-prefix-p (prefix s), which returns t if prefix is a prefix of s, and uiop:string-enclosed-p (prefix s suffix), which returns t if s begins with prefix and ends with suffix.
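A sketch of both predicates with every (hypothetical my- names): they return s itself, not t, when all items match.

```lisp
(defun my-prefixp (items s)
  "Return S if every string in ITEMS starts with S, else NIL."
  (let ((n (length s)))
    (when (every (lambda (item)
                   (and (<= n (length item))
                        (string= s item :end2 n)))
                 items)
      s)))

(defun my-suffixp (items s)
  "Return S if every string in ITEMS ends with S, else NIL."
  (let ((n (length s)))
    (when (every (lambda (item)
                   (and (<= n (length item))
                        (string= s item :start2 (- (length item) n))))
                 items)
      s)))
```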

wrapped-in-p (start/end s) NEW in March, 2023

Does s start and end with start/end?

If true, return s. Otherwise, return nil.

Example:

(str:wrapped-in-p "/" "/foo/")  ;; => "/foo/"
(str:wrapped-in-p "/" "/foo")  ;; => nil

See also: UIOP:STRING-ENCLOSED-P (prefix s suffix).
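A sketch of the check in standard Common Lisp (hypothetical my-wrapped-in-p). Whether a string equal to start/end alone (e.g. "/") counts as wrapped is not specified above; this sketch accepts it.

```lisp
(defun my-wrapped-in-p (start/end s)
  "Return S if it starts and ends with START/END (a string or a character), else NIL."
  (let* ((w (string start/end))
         (n (length w)))
    (when (and (<= n (length s))
               (string= w s :end2 n)
               (string= w s :start2 (- (length s) n)))
      s)))
```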

Case

Functions to change case: camel-case, snake-case,...

We use cl-change-case (go thank him and star the repo!). We adapt these functions to also accept symbols and characters (like the built-in casing functions). The functions also return nil when the argument is nil.

The available functions are:

no-case (s &key replacement)
camel-case (s &key merge-numbers)
dot-case
header-case
param-case
pascal-case
path-case
sentence-case
snake-case
swap-case
title-case
constant-case

More documentation and examples are in the cl-change-case documentation.

downcase, upcase, capitalize (s), fixing a built-in surprise.

The functions str:downcase, str:upcase and str:capitalize return a new string. They call the built-in string-downcase, string-upcase and string-capitalize respectively, but they fix something surprising. When the argument is nil, the built-ins return "nil" or "NIL" or "Nil", a string. Indeed, they work on anything:

(string-downcase nil) ;; => "nil" the string !
(str:downcase nil) ;; nil

(string-downcase :FOO) ;; => "foo"

downcasep, upcasep (s)

These functions return t if the given string contains at least one letter and all its letters are lowercase or uppercase, respectively.

(is (downcasep " a+,. ") t "downcasep with one letter and punctuation is true.")
(is (downcasep " +,. ") nil "downcasep with only punctuation or spaces is false")
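A sketch of these predicates (hypothetical my- names). It reads "all its letters are lowercase" as "no uppercase letter", which is an assumption that holds for cased scripts.

```lisp
(defun my-downcasep (s)
  "T if S has at least one letter and no uppercase letter."
  (and (some #'alpha-char-p s)
       (notany #'upper-case-p s)))

(defun my-upcasep (s)
  "T if S has at least one letter and no lowercase letter."
  (and (some #'alpha-char-p s)
       (notany #'lower-case-p s)))
```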

alphap, lettersp (s)

alphap returns t if s contains at least one character and all characters are alpha (as in "^[a-zA-Z]+$").

lettersp works for unicode letters too.

(is (alphap "abcdeé") nil "alphap is nil with accents")
(is (lettersp "éß") t "lettersp is t with accents and ß")

alphanump, lettersnump (s)

alphanump returns t if s contains at least one character and all characters are alphanumeric (as in ^[a-zA-Z0-9]+$).

lettersnump also works on unicode letters (as in ^[\\p{L}a-zA-Z0-9]+$).
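The ASCII versions compare against explicit character ranges, while the Unicode versions can rely on alpha-char-p. A sketch (hypothetical names; alpha-char-p's treatment of non-ASCII letters is implementation-dependent, SBCL returns true for letters such as é and ß):

```lisp
(defun ascii-letter-p (c)
  (or (char<= #\a c #\z) (char<= #\A c #\Z)))

(defun my-alphap (s)
  "Like ^[a-zA-Z]+$: non-empty and only ASCII letters."
  (and (plusp (length s)) (every #'ascii-letter-p s)))

(defun my-lettersp (s)
  "Non-empty and only letters, including non-ASCII ones."
  (and (plusp (length s)) (every #'alpha-char-p s)))

(defun my-alphanump (s)
  "Like ^[a-zA-Z0-9]+$."
  (and (plusp (length s))
       (every (lambda (c) (or (ascii-letter-p c) (char<= #\0 c #\9))) s)))
```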

ascii-p (char/s)

Return t if the character / string is an ASCII character / is composed of ASCII characters.

An ASCII character has a char-code less than 128.
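Following that definition, a sketch that accepts either a character or a string (hypothetical my-ascii-p):

```lisp
(defun my-ascii-p (char/s)
  "True if CHAR/S is a character with code below 128, or a string made only of such characters."
  (flet ((ascii-char-p (c) (< (char-code c) 128)))
    (if (characterp char/s)
        (ascii-char-p char/s)
        (every #'ascii-char-p char/s))))
```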

digitp (s)

Returns t if s contains at least one character and all characters are numerical (as for digit-char-p).

has-alpha-p, has-letters-p, has-alphanum-p (s)

Return t if s has at least one alpha, letter, or alphanumeric character, respectively (as with alphanumericp).
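"All characters" maps to every and "at least one" maps to some. A sketch of digitp and two of the has- predicates (hypothetical names; has-alpha-p is left out since its exact ASCII/Unicode behaviour isn't stated here):

```lisp
(defun my-digitp (s)
  "T if S is non-empty and only contains digits."
  (and (plusp (length s))
       (every #'digit-char-p s)))

(defun my-has-letters-p (s)
  (some #'alpha-char-p s))

(defun my-has-alphanum-p (s)
  (some #'alphanumericp s))
```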

Others

replace-first (old new s &key regex)

Replace the first occurrence of old by new in s.

By default, metacharacters are treated as normal characters. If regex is not nil, then old is treated as regular expression.

(replace-first "a" "o" "faa") ;; => "foa"
(replace-first "fo+" "frob" "foofoo bar" :regex t) ;; => "frobfoo bar"

Uses cl-ppcre:regex-replace but quotes the user input to not treat it as a regex (if regex is nil).
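For the default, non-regex case, the behaviour can be sketched without cl-ppcre using search and concatenate (hypothetical my-replace-first; regex mode is not covered):

```lisp
(defun my-replace-first (old new s)
  "Replace the first literal occurrence of OLD in S by NEW."
  (let ((pos (search old s)))
    (if pos
        (concatenate 'string
                     (subseq s 0 pos)
                     new
                     (subseq s (+ pos (length old))))
        s)))
```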

replace-all (old new s &key regex)

Replace all occurrences of old by new in s.

By default, metacharacters are treated as normal characters. If regex is not nil, old is treated as regular expression.

(replace-all "a" "o" "faa") ;; => "foo"
(replace-all "fo+" "frob" "foofoo bar" :regex t) ;; => "frobfrob bar"

Uses cl-ppcre:regex-replace-all but quotes the user input to not treat it as a regex (if regex is nil).

If you only need to replace one character with another, you can use the built-in substitute:

(substitute #\+ #\Space "foo bar baz")
;; => "foo+bar+baz"
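The literal (non-regex) mode of replace-all can likewise be sketched with search: copy the text between matches and write new at each match (hypothetical my-replace-all).

```lisp
(defun my-replace-all (old new s)
  "Replace every literal occurrence of the non-empty string OLD in S by NEW."
  (assert (plusp (length old)) () "OLD must not be empty.")
  (with-output-to-string (out)
    (loop with len = (length old)
          for start = 0 then (+ pos len)
          for pos = (search old s :start2 start)
          do (write-string s out :start start :end pos) ; text before the match
          when pos do (write-string new out)
          while pos)))
```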

replace-using (replacement-list s &key regex)

Replace all associations given by pairs in a replacement-list and return a new string.

The replacement-list alternates a string to replace (case sensitive) and its replacement. By default, metacharacters in the string to replace are treated as normal characters. If regex is not nil, the strings to replace are treated as regular expressions.

Example:

(replace-using (list "%phone%" "987")
               "call %phone%")
;; => "call 987"

(replace-using (list "fo+" "frob"
                       "ba+" "Bob")
                 "foo bar"
                 :regex t)
;; => "frob Bobr"
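A sketch of the non-regex mode, assuming the pairs are applied one after another, left to right (an assumption; the examples above are consistent with it). The names are hypothetical.

```lisp
(defun literal-replace-all (old new s)
  "Replace every literal occurrence of the non-empty string OLD in S by NEW."
  (assert (plusp (length old)) () "OLD must not be empty.")
  (with-output-to-string (out)
    (loop with len = (length old)
          for start = 0 then (+ pos len)
          for pos = (search old s :start2 start)
          do (write-string s out :start start :end pos)
          when pos do (write-string new out)
          while pos)))

(defun my-replace-using (replacement-list s)
  "Apply the OLD/NEW pairs of REPLACEMENT-LIST to S, one pair after another."
  (loop for (old new) on replacement-list by #'cddr
        do (setf s (literal-replace-all old new s)))
  s)
```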

remove-punctuation (s &key replacement)

Remove the punctuation characters from s, replace them with replacement (defaults to a space) and collapse runs of whitespace.

(str:remove-punctuation "I say: - 'Hello, world?'") ;; => "I say Hello world"

Use str:no-case to remove punctuation and return the string as lower-case.

prefix (list-of-strings)

(renamed from common-prefix in v0.9)

Find the common prefix between strings.

Example: (str:prefix '("foobar" "foozz")) ;; => "foo"

Uses the built-in mismatch, which returns the position at which the strings fail to match.

Return a string, or nil when the input is the empty list.

suffix (list-of-strings)

Find the common suffix between strings.
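Both functions can be sketched by reducing the list with mismatch; with :from-end t, mismatch gives the position where the common suffix starts. Hypothetical names, not the library's source.

```lisp
(defun my-prefix (strings)
  "Longest common prefix of STRINGS, or NIL for an empty list."
  (when strings
    (reduce (lambda (a b) (subseq a 0 (or (mismatch a b) (length a))))
            strings)))

(defun my-suffix (strings)
  "Longest common suffix of STRINGS, or NIL for an empty list."
  (when strings
    (reduce (lambda (a b) (subseq a (or (mismatch a b :from-end t) 0)))
            strings)))
```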

count-substring (substring s &key start end)

Count the non-overlapping occurrences of substring in s. Use start and end to count only the occurrences between those positions.

Examples:

(count-substring "abc" "abcxabcxabc")
;; => 3
(count-substring "abc" "abcxabcxabc" :start 3 :end 7)
;; => 1
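Non-overlapping counting means resuming the search right after each match. A sketch with search bounded by :start2 and :end2 (hypothetical my-count-substring):

```lisp
(defun my-count-substring (substring s &key (start 0) end)
  "Count the non-overlapping occurrences of SUBSTRING in S between START and END."
  (assert (plusp (length substring)) () "SUBSTRING must not be empty.")
  (loop with len = (length substring)
        for pos = (search substring s :start2 start :end2 end)
          then (search substring s :start2 (+ pos len) :end2 end)
        while pos
        count t))
```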

s-assoc-value (alist key)

Returns the value of a cons cell in alist with key key, when key is a string. The second return value is the cons cell, if any was matched.

The arguments are in the opposite order of cl:assoc's, but are consistent with alexandria:assoc-value (and str).

(s-assoc-value '(("hello" . 1)) "hello")
;; 1
;; ("hello" . 1)

(alexandria:assoc-value '(("hello" . 1)) "hello")
;; NIL
(alexandria:assoc-value '(("hello" . 1)) "hello" :test #'string=)
;; 1
;; ("hello" . 1)

(assoc "hello" '(("hello" . 1)))
;; NIL
(assoc "hello" '(("hello" . 1)) :test #'string=)
;; ("hello" . 1)
(cdr *)
;; 1
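Put together, s-assoc-value amounts to assoc with a string= test, returning both the value and the cell. A sketch (hypothetical my-s-assoc-value):

```lisp
(defun my-s-assoc-value (alist key)
  "Value associated with the string KEY in ALIST, and the matching cons as a second value."
  (let ((cell (assoc key alist :test #'string=)))
    (values (cdr cell) cell)))
```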

Macros

string-case

A case-like macro that works with strings (CL case's test function is eql, and that isn't enough for strings).

Example:

(str:string-case "hello"
  ("foo" 1)
  (("hello" "test") 5)
  (nil (print "input is nil"))
  (otherwise (print "none of the previous forms matched.")))

You might also like pattern matching. The example below with trivia is very similar:

(trivia:match "hey"
  ("hey" (print "it matched"))
  (otherwise :nothing))

Note that there is also http://quickdocs.org/string-case/.
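Since case compares keys with eql, a string version can expand into cond with string= tests. A minimal sketch of the idea (hypothetical my-string-case, not str's implementation), supporting only the clause shapes of the example above: a string, a list of strings, nil, and otherwise/t.

```lisp
(defmacro my-string-case (expr &body clauses)
  "Evaluate the body of the first clause whose key(s) are STRING= to EXPR."
  (let ((val (gensym "VAL")))
    `(let ((,val ,expr))
       (cond
         ,@(loop for (keys . body) in clauses
                 collect
                 (cond
                   ;; Default clause.
                   ((member keys '(t otherwise))
                    `(t ,@body))
                   ;; A NIL key matches a NIL input.
                   ((null keys)
                    `((null ,val) ,@body))
                   ;; A string, or a list of strings.
                   (t
                    `((and (stringp ,val)
                           (member ,val ',(if (listp keys) keys (list keys))
                                   :test #'string=))
                      ,@body))))))))
```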

match (experimental) · new in Feb, 2024

A COND-like macro to match substrings and bind variables to matches. Regular expressions are allowed for matches.

_ is a placeholder that is ignored.

THIS MACRO IS EXPERIMENTAL and might break in future releases.

Example:

(str:match "a 1 b 2 d"
  (("a " x " b " y " d") ;; => matched
   (+ (parse-integer x) (parse-integer y)))
  (t
   'default-but-not-for-this-case)) ;; default branch
;; => 3

(str:match "a 1 b c d"
  (("a 2 b" _ "d") ;; => not matched
   (print "pass"))
  (("a " _ " b c d") ;; => matched
   "here we go")
  (t 'default-but-not-for-this-case)) ;; default branch
;; => "here we go"

Match with regular expressions:

(str:match "123 hello 456"
 (("\\d+" s "\\d+")
   s)
 (t "nothing"))
;; => " hello "

Changelog

Before:

(str:split " " "a b c ")
("a" "b" "c")  ;; like cl-ppcre:split

Now:

(str:split " " "a b c ")
("a" "b" "c" "")

Dev and test

Regression testing is implemented with fiveam.

Main test suite

Either use

  (asdf:test-system :str)

or load the test package str.test and then

  (fiveam:run! 'test-str:str)

Specific test suite

  (fiveam:run! 'test-str:replace-functions)

Test suite names:

Specific test

  (fiveam:run! 'test-str::downcase) ;; (test symbols are unexported)

Test when defined

First, set:

(setf fiveam:*run-test-when-defined* t)

Then each test runs right after it is defined or compiled, for example with C-c C-c in Emacs.

See also

Inspired by the famous Emacs Lisp library s.el.