vindarel / cl-str

Modern, simple and consistent Common Lisp string manipulation library.
https://vindarel.github.io/cl-str/
MIT License
305 stars 37 forks source link

Have replace-* work with characters. Add a compiler macro for this. #97

Open K1D77A opened 1 year ago

K1D77A commented 1 year ago

Add 3 aliases for the replace- called substitute- because cl-user:replace is destructive.

K1D77A commented 1 year ago

I was also going to ask why such heavy use of ppcre is used when many functions could be replaced with plain CL?

kilianmh commented 1 year ago

Hi @K1D77A, thank you for your pull request!

I appreciate the suggestions of setting alias functions and adding support for characters in the replace-* functions.

Here are some comments:

  1. Rename all replacement functions to substitute-*, while the replace-* functions should be alias (considered deprecated). Additionally, nsubstitute-* functions should be defined with define-modify-macro. This approach should align better with the cl standard? @vindarel

  2. Currently, the compiler macros cause sbcl 2.3.5 compilation to crash due to infinite inlining. In my opinion, compiler macros will not work here because we need to either check the type of old & new at runtime, or alternatively, we can simply coerce old & new to strings using (string ...).

  3. In my opinion, you are right that we should use standard cl-functions if possible. However, it might sometimes make sense to use cl-ppcre, if we give the regex option to the users #63

K1D77A commented 1 year ago
(define-compiler-macro substitute-first (&whole form &rest args)
  (declare (ignore form))
  (destructuring-bind (old new string)
      args 
    (if (and (constantp old)
             (characterp old)
             (constantp new)
             (characterp new))
        `(%substitute-first-character ,old ,new ,string)
        `(%substitue-first-string ,old ,new ,string))))

(defun %substitute-first-character (old-char new-char string)
  (substitute new-char old-char string :test #'char= :count 1))

(defun substitute-first (old new s)
  "Substitute the first occurrence of `old` by `new` in `s`. Arguments are not regexs."
  (if (and (characterp old)
           (characterp new))
      (%substitute-first-character old new s)
      (%substitute-first-string (string old) (string new) s)))

(defun %substitute-first-string (old new s)
  (let* ((ppcre:*allow-quoting* t)
         (old (concatenate 'string "\\Q" old))) ;; treat metacharacters as normal.
    ;; We need the (list new): see !52
    (ppcre:regex-replace old s (list new))))

Does this still cause infinite inlining for you?

vindarel commented 1 year ago

Add 3 aliases for the replace- called substitute- because cl-user:replace is destructive.

good catch, but is it really an issue though? The replace verb is less CL-standard-compliant but more… standard in general and discoverable. Isn't it enough to trust cl-str? (accordingly, it is documented nowhere that replace-all returns a new string).

The goal of cl-str was not to make a better standard.

It is much simpler to coerce old and new to a string, did you have an extra motivation for the compiler macro? (super-optimizing?)

Thanks for sending a PR (and for help in the review).

kilianmh commented 1 year ago

From cl-ppcre documentation (Hints, comments, performance considerations):

CL-PPCRE uses compiler macros to pre-compile scanners at load time if possible. This happens if the compiler can determine that the regular expression (no matter if it's a string or an S-expression) is constant at compile time and is intended to save the time for creating scanners at execution time (probably creating the same scanner over and over in a loop). Make sure you don't prevent the compiler from helping you. For example, a definition like this one is usually not a good idea:

(defun regex-match (regex target)
  ;; don't do that!
  (scan regex target))

Accordingly, the old parameter should be stringified at run-time only if it is not a constant (= neither string nor character). Passing the compile-time known old string, or (stringified!) character, in a compiler macro to cl-ppcre:regex-replace should allow the creation of scanners at load time and therefore improve run time performance for these specific cases.