ruricolist / serapeum

Utilities beyond Alexandria
MIT License
428 stars 42 forks source link

Maybe add TAILP* #118

Open phoe opened 2 years ago

phoe commented 2 years ago

I can't find a utility that can tell me if any two lists share structure anywhere.

TAILP only works correctly if you provide it the concrete tail of the list. Running TAILP for every element of the other list mostly achieves the goal but has quadratic time complexity, which is painful, plus one needs to remember to handle dotted lists.

I propose adding a TAILP* operator which uses a hash table for detection of shared structure and therefore can do this in linear time and linear memory. Oh, and it also handles circularities.

A quick and hasty implementation:

(defun tailp* (object-1 object-2)
  (let ((hash-table (make-hash-table :test #'eq)))
    ;; Populate the hash table with objects from the first list.
    (typecase object-1
      (atom (setf (gethash object-1 hash-table) :first))
      (cons (loop for cons on object-1
                  when (gethash cons hash-table)
                    ;; For cyclic lists, return gracefully.
                    return nil
                  else
                    ;; Mark cons cell as visited.
                    do (setf (gethash cons hash-table) :first)
                  when (atom (cdr cons))
                    ;; For proper and dotted lists, remember the last CDR.
                    do (setf (gethash (cdr cons) hash-table) :first))))
    ;; Check if any list structure of the second list is in the hashtable.
    (typecase object-2
      (atom (when (eq (gethash object-2 hash-table) :first) object-2))
      (cons (loop for cons on object-2
                  for result = (gethash cons hash-table)
                  when (eq result :first)
                    ;; We found the tail, return it.
                    return cons
                  when (eq result :second)
                    ;; If we hit a cycle in this list, there is no match.
                    return nil
                  else
                    ;; Mark cell as visited.
                    do (setf (gethash cons hash-table) :second)
                  when (atom (cdr cons))
                    ;; For proper and dotted lists, check the last CDR.
                    when (gethash (cdr cons) hash-table)
                      return (cdr cons))))))

Some quick and hasty REPL tests:

SERAPEUM> (tailp* (list 1 2 3) (list 1 2 3))
NIL

SERAPEUM> (tailp* (list* 1 2 3) (list* 1 2 3))
3

SERAPEUM> (let ((shared (list 3 4 5))) 
            (tailp* (list* 1 2 shared) (list* 1 2 shared)))
(3 4 5)

SERAPEUM> (tailp* (list* 1 2 3) (list* 3))
3

SERAPEUM> (tailp* (list* 3) (list* 1 2 3))
3

SERAPEUM> (setf *print-circle* t)
T

SERAPEUM> (let ((cycle (make-circular-list 3 :initial-element 0)))
            (tailp* (list* 1 2 cycle) (list* 1 2 cycle)))
#1=(0 0 0 . #1#)

SERAPEUM> (let ((cycle-1 (make-circular-list 3 :initial-element 0))
                (cycle-2 (make-circular-list 3 :initial-element 0)))
            (tailp* (list* 1 2 cycle-1) (list* 1 2 cycle-2)))
NIL

How does this sound? I can clean this up, add more tests, and make a PR if this sounds like a good addition.

ruricolist commented 2 years ago

In the flurry of activity I overlooked this issue. If you're still interested this seems like a good idea, although it could use a more descriptive name (common-tail-p?).

phoe commented 2 years ago

Sure, that works. I suggested tailp* because of the sorta-obvious similarity with the standard tailp, but naming is the hard part.

I'll submit this as a PR eventually.