vindarel / cl-str

Modern, simple and consistent Common Lisp string manipulation library.
https://vindarel.github.io/cl-str/
MIT License
305 stars 37 forks source link

Ensure functions #76

Closed vindarel closed 1 year ago

vindarel commented 2 years ago

Example:

  (is (ensure-starts-with "/" "/abc") "/abc")

Feedback welcome (especially for naming).

vindarel commented 2 years ago

@tdrhq @charJe Hello, feedback welcome for the naming of these new functions :) ensure-starts-with, ensure-enclosed-by

charJe commented 2 years ago

I like it. If you want to make the names shorter, maybe ensure-start?

alphapapa commented 2 years ago

The words "prefix" and "suffix" are more concise and descriptive than "starts-with" and "ends-with". "Enclosed-by" seems awkward; "wrapped-in" feels more natural to me, and is probably more accessible to non-native English speakers.

vindarel commented 2 years ago

Thanks. Indeed, I hesitated with ensure-prefix. But I judged that the prefix & suffix related functions are a bit special on their own (for one, they don't accept the usual parameters with the full string at the end), and I thought that auto-completing str:start… would give more accurate / predictive results.

I see your update: wrapped-in sounds good, thanks. (tempted by ensure-wrapped-in !)

charJe commented 2 years ago

Why would the ensure- functions not have the full string at the end?

vindarel commented 2 years ago

Why would the ensure- functions not have the full string at the end?

sorry, I badly expressed myself: the existing functions with "prefix" have a different signature:

(defun prefix (items) 

(defun common-prefix (items)

(defun prefix? (items prefix)  ;; <-- not S, as a full word 

so using "start-with…" would feel more consistent.

charJe commented 2 years ago

(ensure-starts-with "/" "/abc") isnt "/" considered the full string. It is not the last argument. This is confusing to me because starts-with-p has the full string as the last argument.

Disregard, I was confused, all is well.

tdrhq commented 2 years ago

Could you clarify a typical use case for these functions? It could help with the naming.

In particular just calling it prefix and suffix makes me ask, what happens in this case: (ensure-starts-with "bar" "/ba")

Just from the name, should that give us "/bar" or "/babar"? The code clearly suggests it's the latter, but without knowing the use case the function name (and even documentation) feels a little ambiguous.

Many of your test-cases seem to suggest single character prefixes, maybe that's where the use case lies?

Another naming direction you could with is something like concat-if-required (not that exactly, but something of that manner)

mdbergmann commented 2 years ago

What's the difference with starts-with-p. I'd prefer the common CL naming scheme for predicates.

charJe commented 2 years ago

The ensure- functions return a string that has the specified prefix or suffix, appened if necessary.

vindarel commented 2 years ago

My typical use-case if for slashes with URIs:

(str:ensure-starts-with "/" "foo/bar") ;; => /foo/bar/ (edited for middle / )
(str:ensure-enclosed-by "/" "9278-uuid-123") ;; => "/9278-uuid-123/"

in that case, that's a poor man URI handling when I don't want (or need) to reach for a proper lib like Quri.

@tdrhq

 (str:ensure-starts-with "bar" "/ba")
"bar/ba"
tdrhq commented 2 years ago

@vindarel Oh sorry I meant (str:ensure-starts-with "/ba" "bar"). The shortest way to "ensure" that "bar" starts with "/ba" is to just prepend a "/"

tdrhq commented 2 years ago

I agree with your use case btw, I can see myself using that frequently. Just not sure about the semantics or usefulness of multi-character prefixes/suffixes.

tdrhq commented 2 years ago

I was trying to figure out real world situations where multi-character prefixs and suffixes come up, this is the best I got:

(str:ensure-ends-with "/index.html" "https://example.com/") --> I would expect this to return "https://example.com/index.html" not "https://example.com//index.html".

(str:ensure-starts-with "https://" "www.example.com") -> both behaviours would probably be reasonable

vindarel commented 2 years ago

yeah, good catch for the multi-character prefixes/suffixes. I didn't have such a use-case, so gotta give them more thought.

For this use-case though: (str:ensure-ends-with "/index.html" "https://example.com/") I am thinking about another function, to join several elements as in a URI, to add a "/" if necessary and to avoid double slashes: (str:join-as-uri "http://example.com/" "/index.html") -> the right thing, no double "/" (and for more use cases, we refer to Quri).

You might be on to something with this use case:

(str:ensure-starts-with "https://" "www.example.com")  ;; = logical
(str:ensure-starts-with "https://" "https://www.example.com")  ;; = nothing to do here
;; but here:
(str:ensure-starts-with "https:// "http://www.example.com") ;; => should it give "https://www.example.com" ?

if it should be the default behavior, another function or a keyword argument, I'll see. But I didn't meet the use case, our examples are URI-specific, so I'll probably stay simple with the current behavior.

tdrhq commented 2 years ago

(str:ensure-starts-with "https:// "http://www.example.com") ;; => should it give "https://www.example.com" ?

This one should definitely not be part of cl-str. My URI examples were mostly for "first-pass", "hacky" implementations. It's convenient to just to string handling when you're just prototyping something.

PS. I went through my code base trying to find where this pattern shows up, the trailing slash shows up in several locations. The exact pattern of checking if it's a suffix and only then appending happens only at one place: https://github.com/screenshotbot/screenshotbot-oss/blob/main/src/util/store.lisp#L46 (copied from my internal repo), but there are several other places where I just do (format nil "~a/" var), and ensure-ends-with would read a lot better. They're all related to the "/" as far as I can tell.

vindarel commented 1 year ago

I renamed the functions to ensure-start, ensure-end, ensure-wrapped-in, I added wrapped-in-p.

I noticed a difference in a hedge case:

(str:starts-with-p nil "rst")
NIL
CL-USER> (str:wrapped-in-p nil "rst")
"rst"

I didn't add something for (str:join-as-uri "http://example.com/" "/index.html") in this PR, which I'd have the need for. But maybe I should just use Quri. BTW, off-topic for this PR, but this library seems to do what I want: be able to join URIs, but keep manipulating strings, because hey sometimes it's more convenient / quicker:

CL-USER > (url-parse "google.com" :path "/index.html")
http://google.com/index.html
;; and then:
CL-USER > (url-parse * :scheme "https")
https://google.com/?s=common+lisp

https://github.com/massung/url (not on QL)

With Quri… ok, not so bad.

CL-USER> (quri:merge-uris (quri:uri "/index.html") (quri:uri "https://en.wikipedia.org/wiki/URL"))
#<QURI.URI.HTTP:URI-HTTPS https://en.wikipedia.org/index.html>
CL-USER> (format nil "~a" *)
"https://en.wikipedia.org/index.html"

There's also a commit that says "Let merge-uris accept string parameters." "As it did in version 0.4.0." and this was many months ago, looks like I'm lagging behind)

kilianmh commented 1 year ago

The idea of ensure function is nice.

One more option could be having one function with start, end, wrapped-in as keyword parameter, where either wrapped-in or start and/or end can be supplied. E.g.

(defun ensure (s &key start end wrapped-in)

Or with different parameter-names

(defun ensure (s &key prefix suffix wrapped-in)

vindarel commented 1 year ago

Good idea. Something like

(defun ensure (s &key start end wrapped-in)
  (cond
    (wrapped-in
     (ensure-wrapped-in wrapped-in s))
    ((and start end)
     (ensure-start start (ensure-end end s)))
    (start
     (ensure-start start s))
    (end
     (ensure-end end s)))
    (t
     s)))

in that case, a ":start" key is weird and too different than its usual meaning, so probably ":prefix" is better. This leads again to my hesitation to use ensure-prefix instead of ensure-start (because we have starts-with-p and the existing "*prefx" functions are a bit unusual in their signature. But "ensure-start" may not be has explicit has "ensure-starts-with" which is too long.). Shall we vote? :D

vindarel commented 1 year ago

re. naming, I now tend to ensure-prefix/suffix, along with (ensure s :prefix … :suffix …) @kilianmh thoughts?

kilianmh commented 1 year ago

Yes, prefix/suffix makes more sense, even if you look up dictionary: prefix / suffix.

Here I have an exemplary implementation of partial prefix adding

  (defun ensure-prefix (prefix s)
    (labels ((add-prefix (remaining-prefix)
               (cond ((emptyp remaining-prefix)
                      (concat prefix s))
                     ((starts-with-p remaining-prefix s)
                      (concat (subseq prefix 0 (1+ (length remaining-prefix)))
                              s))
                     (t
                      (add-prefix (s-rest remaining-prefix))))))
      (if (or (null prefix)
              (starts-with-p prefix s))
          s
          (add-prefix (s-rest prefix)))))

(ensure-prefix "abc" "c3235") ;=> "abc3235"

We could implement ensure-suffix either similiarly or reuse ensure-prefix (less efficient, but less code)

  (defun ensure-suffix (suffix s)
    (reverse
     (ensure-prefix (reverse suffix)
                    (reverse s))))
vindarel commented 1 year ago

Alright, thanks, let's name them properly and merge quick, I have code using these functions now!

(ensure-prefix "abc" "c3235") ;=> "abc3235"

but this is too weird, I need a strong rationale to add this, but definitely not in the "ensure" functions nor in this PR. I questioned the possibility, but it was discarded by the discussion. For URLs, there are URL handling libraries.