vindarel / cl-str

Modern, simple and consistent Common Lisp string manipulation library.
https://vindarel.github.io/cl-str/
MIT License

Idea: add string match for easily handling sub-string. [merged, looking for feedback] #113

Open ccqpein opened 8 months ago

ccqpein commented 8 months ago

I was wondering whether cl-str could "pattern match" strings, like the match/case constructs in some other languages.

So I wrote my own version (see the example below). What do you think? Does it fit cl-str's purpose? I checked the docs and there is already a string-case. Should I change the name of the macro? Thanks!

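(defun expand-match-branch (str block patterns forms)
  "Expand one STR-MATCH branch into a regex test against STR.
On a match, bind the pattern symbols and return from BLOCK with FORMS."
  (case patterns
    ;; Default branch. CASE does not evaluate its keys, so they must be
    ;; unquoted: 'otherwise would be read as the list (quote otherwise)
    ;; and would never match.
    ((t otherwise) `(progn ,@forms))
    (t (loop with regex = '("^")
            and vars = '()
            for x in patterns
            do (cond ((stringp x)
                      ;; Strings go into the regex as-is.
                      (push x regex))
                     ((symbolp x)
                      ;; Each symbol becomes one capture group.
                      (push "(.*)" regex)
                      (push x vars))
                     (t (error "only symbol and string allowed in patterns")))
            finally (push "$" regex)
            finally (return (let ((whole-str (gensym))
                                  (regs (gensym)))
                              `(multiple-value-bind (,whole-str ,regs)
                                   (cl-ppcre:scan-to-strings
                                    ,(apply #'str:concat (reverse regex))
                                    ,str)
                                 (declare (ignore ,whole-str))
                                 (when ,regs
                                   (let ,(reverse vars)
                                     ,@(loop for ind from 0 below (length vars)
                                             collect `(setf ,(nth ind (reverse vars))
                                                            (elt ,regs ,ind)))
                                     (return-from ,block
                                       (progn ,@forms)))))))))))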

(defmacro str-match (str &rest match-branches)
  (let ((block-sym (gensym)))
    `(block ,block-sym
       ,@(loop for statement in match-branches
               collect (expand-match-branch
                        str
                        block-sym
                        (nth 0 statement)
                        (cdr statement))))))

CL-USER> (macroexpand-1 '(str-match sss
                          (("a" b "c") (parse-integer b))
                          (("a" x "c" y "b") (print (parse-integer x)) (print (parse-integer y)) (list (parse-integer x) (parse-integer y)))
                          (t (print "aa"))))
(BLOCK #:G415
  (MULTIPLE-VALUE-BIND (#:G416 #:G417)
      (CL-PPCRE:SCAN-TO-STRINGS "^a(.*)c$" SSS)
    (DECLARE (IGNORE #:G416))
    (WHEN #:G417
      (LET (B)
        (SETF B (ELT #:G417 0))
        (RETURN-FROM #:G415 (PROGN (PARSE-INTEGER B))))))
  (MULTIPLE-VALUE-BIND (#:G418 #:G419)
      (CL-PPCRE:SCAN-TO-STRINGS "^a(.*)c(.*)b$" SSS)
    (DECLARE (IGNORE #:G418))
    (WHEN #:G419
      (LET (X Y)
        (SETF X (ELT #:G419 0))
        (SETF Y (ELT #:G419 1))
        (RETURN-FROM #:G415
          (PROGN
           (PRINT (PARSE-INTEGER X))
           (PRINT (PARSE-INTEGER Y))
           (LIST (PARSE-INTEGER X) (PARSE-INTEGER Y)))))))
  (PROGN (PRINT "aa")))
T
CL-USER> (str-match "a1c5b"
(("a" b "c") (parse-integer b))
(("a" x "c" y "b") (print (parse-integer x)) (print (parse-integer y)) (list (parse-integer x) (parse-integer y)))
(t (print "aa"))
)

1 
5 
(1 5)
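
The pattern-to-regex translation inside expand-match-branch can be looked at on its own. The sketch below uses a hypothetical helper, pattern->regex, which is not part of cl-str. It assumes the same rules as the macro above: strings are copied into the regex verbatim and every symbol becomes a greedy (.*) group.

```lisp
;; Hypothetical helper for illustration only; not part of cl-str.
(defun pattern->regex (pattern)
  "Translate PATTERN, a list of strings and symbols, into an anchored regex.
Returns two values: the regex string and the symbols in capture order."
  (let ((parts '())
        (vars '()))
    (dolist (item pattern)
      (cond ((stringp item)
             ;; Strings are used as regex fragments, unescaped.
             (push item parts))
            ((symbolp item)
             ;; Each symbol contributes one capture group.
             (push "(.*)" parts)
             (push item vars))
            (t (error "Only symbols and strings are allowed: ~S" item))))
    (values (format nil "^~{~A~}$" (reverse parts))
            (reverse vars))))

(pattern->regex '("a" x "c" y "b"))
;; => "^a(.*)c(.*)b$", (X Y)
```

Because the strings are not escaped, a pattern string can itself contain regex syntax.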

vindarel commented 8 months ago

Nice, that is pretty interesting.

With some indentation the snippet becomes


(str-match "a1c5b"
           (("a" b "c")
            (parse-integer b))
           (("a" x "c" y "b")
            (print (parse-integer x))
            (print (parse-integer y))
            (list (parse-integer x) (parse-integer y)))
           (t (print "aa")))

so by using &body instead of &rest we get this indentation:


(str-match "a1c5b"
  (("a" b "c")
   (parse-integer b))
  (("a" x "c" y "b")
   (print (parse-integer x))
   (print (parse-integer y))
   (list (parse-integer x) (parse-integer y)))
  (t (print "aa")))
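
As a small illustration of the &body point: in a macro lambda list, &body collects arguments exactly like &rest and only serves as a hint to editors and pretty-printers that the remaining forms are a body. The two macros below are hypothetical and exist only to show that the collected arguments are the same.

```lisp
;; Hypothetical macros for illustration only.
(defmacro clauses-with-rest (form &rest clauses)
  "Return FORM and CLAUSES unevaluated, collected with &REST."
  `(list ',form ',clauses))

(defmacro clauses-with-body (form &body clauses)
  "Same as CLAUSES-WITH-REST, but collected with &BODY."
  `(list ',form ',clauses))

;; Both calls return the same list; only editor indentation differs.
(clauses-with-rest "a1c5b" (("a" b "c") b) (t nil))
(clauses-with-body "a1c5b"
  (("a" b "c") b)
  (t nil))
```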

Would you not use the Trivia library for pattern matching? It probably does this, and more.

What are users going to ask for pattern matching features after we introduce this one?

> like some other languages' match case.

What are your favourite examples?

(and yes "string-match" might be better)

ccqpein commented 8 months ago

> so by using &body instead of &rest we get this indentation:

Nice catch!

> Would you not use the Trivia library for pattern matching? It probably does this, and more.

I am going to check it now.

ccqpein commented 8 months ago

I checked Trivia. It looks good for pattern matching on lists, for example:

(trivia:match '(1 2 3)
  ((list* 1 x _)
   x)
  ((list* _ x)
   x)) ;; => 2

But I ran into an issue when I tried the string pattern. I am not sure whether it is because I am using SBCL (maybe because of this?).

Besides, I can match a whole string, like this:

(trivia:match "a1c5b" ("a1c5b" 1))
;; or
(trivia:match "ab" ((vector #\a #\b) 1))

but not this:

(trivia:match "a1c5b" ((string "a1c" "5b") 1))

So it looks like I can only bind characters, not substrings as in my proposal.

vindarel commented 7 months ago

Let's use and try this macro. I'm interested in everybody's feedback.

A stupid test: I match like your example, but I don't use the matching variable, so I get style warnings:

(match "a1c5b"
       (("a" i "c")
        (print "got axc"))
       (("a" x "c" y "b")
        (print "got axcyb"))
       (t (print "default")))
;; =>
;; ;   The variable I is assigned but never read.
;; (and for x and y)

Would it be possible to avoid the warnings? Using a _ placeholder?
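
One possible way to get rid of these warnings, sketched under assumptions: bind each variable directly to its register, declare the bindings ignorable, and create no binding at all for a symbol named _. The macro name bind-registers is hypothetical, and this is not necessarily how PR #114 handles it.

```lisp
;; Hypothetical sketch; the actual fix in cl-str may differ.
(defmacro bind-registers ((&rest vars) registers &body body)
  "Bind each symbol in VARS to the element of REGISTERS at the same index.
Symbols named _ are skipped; the other bindings are declared ignorable."
  (let ((regs (gensym "REGS"))
        (named (remove "_" vars :key #'symbol-name :test #'string=)))
    `(let* ((,regs ,registers)
            ,@(loop for var in vars
                    for i from 0
                    ;; Compare by name so that _ works from any package.
                    unless (string= (symbol-name var) "_")
                      collect `(,var (elt ,regs ,i))))
       (declare (ignorable ,regs ,@named))
       ,@body)))

;; I and Y are bound but unused, _ is never bound.
(bind-registers (i _ y) #("1" "skip" "5")
  (print "got a match"))
```

In this sketch a _ still consumes a register position, so the generated regex can keep one capture group per symbol.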

ccqpein commented 7 months ago

Yes, I just tried it on my side. I will open a PR soon.

ccqpein commented 7 months ago

Opened PR #114.

vindarel commented 7 months ago

I tried this more on an AOC problem (day 19), and OMG this match macro felt so powerful. Easier and faster than searching for the right regexp.

vindarel commented 7 months ago

Another quick test:

(str::match "123 hello 456"
             (("\\d+" s "\\d+")
              s)
             (t "nothing"))
;; => " hello 45"

I didn't expect to see "45". The first number regex was correctly matched, not the second?

(str::match "123 hello 456"
             (("\\d+" s "\\d*")
              s)
             (t "nothing"))
;; => " hello 456"

Here I didn't expect "456".

ccqpein commented 7 months ago

@vindarel I just figured out that fixing this requires a non-greedy regex for the captured part. I fixed it in the latest commit. Good catch!
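
The difference can be reproduced with cl-ppcre directly. The sketch below assumes that the capture group changes from the greedy (.*) to the non-greedy (.*?), which is one reading of the fix and not a quote from the commit. It also assumes Quicklisp is available to load cl-ppcre, and first-register is a hypothetical helper.

```lisp
;; Assumption: Quicklisp is installed. Any other way of loading cl-ppcre works too.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-ppcre" :silent t))

;; Hypothetical helper for illustration only.
(defun first-register (regex string)
  "Return the first capture group of REGEX matched against STRING, or NIL."
  (multiple-value-bind (match registers)
      (cl-ppcre:scan-to-strings regex string)
    (when match
      (aref registers 0))))

;; Greedy: (.*) takes as much as it can and leaves only "6" for the final \d+.
(first-register "^\\d+(.*)\\d+$" "123 hello 456")   ; => " hello 45"
;; Greedy with \d*: the final part may be empty, so (.*) takes everything.
(first-register "^\\d+(.*)\\d*$" "123 hello 456")   ; => " hello 456"
;; Non-greedy: (.*?) takes as little as it can, so the digits go to \d+ or \d*.
(first-register "^\\d+(.*?)\\d+$" "123 hello 456")  ; => " hello "
(first-register "^\\d+(.*?)\\d*$" "123 hello 456")  ; => " hello "
```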

ccqpein commented 7 months ago

PR #114 is merged. I am not sure whether we should keep this idea issue open for potential future changes. I leave that decision to the repo owner.