ruricolist / serapeum

Utilities beyond Alexandria
MIT License
420 stars 41 forks source link

SAME #88

Closed lukego closed 3 years ago

lukego commented 3 years ago

I'm wondering if this list function SAME is worth submitting a PR for?

Usage:

(same #'length '((1 2 3) (a b c) (foo bar baz)))
=> T`

which seems more intention-revealing than other formulations like:

(= 1 (count (distinct) list :key key))  ; more wordy
(apply #'eql (mapcar #'key list))       ; error when less than two elements

Maybe worth adopting?

(defun same (key-fn list &key (test #'eql))
  "Return true if KEY-FN returns a matching value for each element of LIST.

TEST is used to check for a match.

Example:
    (same #'length lists)
  ≡ (= 1 (count-if (distinct) lists :key #'length))"
  (or (null list)
      (loop with v = (funcall key-fn (first list))
            for x in (rest list)
            always (funcall test v (funcall key-fn x)))))
phoe commented 3 years ago

I think it's a meaningfully valid sequence function, therefore it should accept a sequence rather than just a list.

lukego commented 3 years ago

I think it's a meaningfully valid sequence function, therefore it should accept a sequence rather than just a list.

Can anyone think of a nice formulation of a general sequence version? (Drop the LOOP I suppose?)

phoe commented 3 years ago
(defun same (sequence &key (key #'identity) (test #'eql))
  (or (emptyp sequence)
      (let ((value (funcall key (elt sequence 0))))
        (every (lambda (x) (funcall test value (funcall key x)))
               sequence))))

It might look a bit more elegant than @dkochmanski's version but it is also more wasteful, because it makes the redundant comparison of the first element with the first element. @dkochmanski's version will be more efficient because of that.

Still, I'd propose to move key into the keyword argument zone, in case someone wants to call (same '(t t t t t)) which looks meaningful to me with the standard defaults of :key #'identity :test #'eql.

lukego commented 3 years ago

How should empty list be handled? I made SAME return T for that case but this makes the comment wrong i.e. you'd need (<= 1 (count-if ...)) to match that case.

phoe commented 3 years ago

That's the nasty corner case. One data point is that alexandria:extremum, which is somewhat similar in nature, returns nil when called with an empty sequence.

(Offtopic: it also has start and end keyword arguments; maybe they should apply here, too?...)

ruricolist commented 3 years ago

I think it should probably return t for consistency with every.

There is an internal macro, do-subseq, that makes it easy to handle start and end arguments for any sequence.

svspire commented 3 years ago

I like phoe's answer. Here's a similar one using #'reduce that doesn't evaluate the first element twice.

The semantics here are very different from typical uses of #'reduce, but I find myself using the (reduce #'funcall ...) construct more and more lately to solve problems where the first value requires special treatment.

(defun same (fn seq &key (test #'eql))
  (let ((samevalue nil))
    (labels ((first-fn (element)
               (setf samevalue (funcall fn element))
               #'next-fn)
             (next-fn (element)
               (cond ((funcall test samevalue (funcall fn element))
                      #'next-fn)
                     (t (return-from same nil)))))
      (declare (dynamic-extent samevalue
                               (function first-fn)
                               (function next-fn)))
      (eql #'next-fn ; this line is not strictly necessary but it ensures that 
                     ;  the returned value is oneof {T,nil} and it provides extra
                     ;  insurance that #'next-fn -- which is declared dynamic --
                     ;  is not accidentally retained by the caller.
           (reduce #'funcall seq :initial-value #'first-fn)))))

#|
(same #'numberp '(a 3 4 5)) ;--> nil

(same #'numberp '(3 4 5 6)) ;--> t

(same #'numberp '(3 4 5 a)) ;--> nil

(same #'numberp nil) ;--> nil. Same of any null sequence is nil. Easily changeable of course.

(same #'numberp t) ;--> error, of course

(same #'null nil) ;--> nil

(same #'symbolp nil) ;--> nil

(same #'numberp '(1)) ;--> t

(same #'symbolp '(a)) ;--> t

(same #'numberp '(a)) ;--> t. Exercise: This is correct, but why?

(same #'oddp '(2 4 6 8 10)) ;--> t. Same exercise.

(same #'oddp '(2 4 6 8 9 10)) ;--> nil

(same #'null '(nil nil nil nil nil)) ;--> t

(same #'symbolp '(a b c d e f)) ;--> t

(same #'symbolp `(a b c d e f ,pi)) ;--> nil

(same #'numberp #(a 3 4 5)) ;--> nil. Works on vectors too.

(same #'numberp #(3 4 5 6)) ;--> t
|#
lukego commented 3 years ago

Here's another implementation using Serapeum's handy internal do-subseq macro that @ruricolist mentioned above:

(defun same (key-fn seq &key (test #'eql) (start 0) end)
  "Return true if KEY-FN returns the same value for any/all members of LIST."
  (let (init val)
    (do-subseq (item seq t :start start :end end)
      (if (null init)
          (setf val (funcall key-fn item) init t)
          (unless (funcall test val (funcall key-fn item))
            (return-from same nil))))))
fiddlerwoaroof commented 3 years ago

Here's another implementation that (for me, at least) is a bit nicer because it's less imperative:

(defun same (key-fn seq &key (test #'eql) (start 0) end)
  (= 1
     (length
      (serapeum:nub (subseq seq start end)
                    :test test
                    :key key-fn))))

Alternatively, this version avoids the call to length and (to my mind) reads a bit better:

(defun same (key-fn seq &key (test #'eql) (start 0) end)
  (null (cdr (serapeum:nub (map 'list key-fn
                                (subseq seq start end))
                           :test test))))
lukego commented 3 years ago

@fiddlerwoaroof How do these work though? My first impression is that nub will call remove-duplicates that will (on SBCL) make a full copy of the input list (which was already copied by subseq) and then do an O(n^2) destructive update of that copied list. Is that right?

That would probably be fine for the vast majority of uses but for me it violates the principle of least surprise a bit because this naturally feels like a function that would take O(n) time and O(1) space i.e. iterating through the list once and only picking up state from the first element. So I'd feel compelled to document this being an O(n^2) time and O(n) space function and then that complicated docstring would lose some of the benefits of the concise implementation.

ruricolist commented 3 years ago

To answer the larger question, this is definitely worth a PR. For me a way to simplify comparing the value of the same slot/accessor in multiple instances would be immediately useful.

I have no strong feelings on whether the key function should be a required argument, if there are arguments either way.

(same #'length '((1 2 3) (a b c) (foo bar baz)))
(same '((1 2 3) (a b c) (foo bar baz)) :key #'length)

;; No duplicates?
(same (distinct) xs)
(same xs :key (distinct))

;; Compare multiple slots at once.
(same (juxt #'foo-x #'foo-y #'foo-z) (list a b c) :test #'equal)
(same (list a b c) :key (juxt #'foo-x #'foo-y #'foo-z) :test #'equal)
lukego commented 3 years ago

I'll make a PR now. I already added the function to my local copy of Serapeum.

I'm proposing that KEY be a positional argument. I think that (same #'length objects) reads well, like (same-length objects), and that (same objects :key #'length) is a little more work to brain-parse.

Also, if you want to compare sameness under identity there are already good options e.g. (apply #'eql* objects) or (<= 1 (count-if (distinct) objects)).

lukego commented 3 years ago

@ruricolist Sweet examples with juxt! That's one of those functions I've read about a few times but haven't started using yet.

fiddlerwoaroof commented 3 years ago

@lukego I actually didn't look into the implementation of nub, and it turns out that it's not the function I wanted. What I intended was a function that just deduplicates consecutive duplicates (e.g. this one: https://github.com/fiddlerwoaroof/data-lens/blob/master/lens.lisp#L86-L91 ). This preserves O(n) runtime.