janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License
3.45k stars 223 forks source link

Proposal: Allow mapcat et al to accept multiple iterable arguments #1159

Closed primo-ppcg closed 1 year ago

primo-ppcg commented 1 year ago

Of the functions that return a result after applying a mapping, which includes map, mapcat, keep, count, some and all, the only one which allows multiple iterable arguments is map. This can result in some awkward forms, such as

(count identity (map = a b))

which would be easier read and understood if written as

(count = a b)

not to mention more efficient, if it could avoid a double iteration.

The current implementation of map is non-trivial. It includes special-casing and unrolling for the sake of efficiency. In order to apply the same iteration logic to the other functions, I believe this would first need to be generalized.

The special-cased unrolling is regular enough that it can be generated with a macro:

(defmacro- map-n
  `Generates efficient map logic for a specific number of
  iterable beyond the first.`
  [n res f ind inds]
  ~(do
    (def ,(seq [k :range [0 n]] (symbol 'ind k)) ,inds)
    ,;(seq [k :range [0 n]] ~(var ,(symbol 'key k) nil))
    (each x ,ind
      ,;(seq [k :range [0 n]]
        ~(if (= nil (set ,(symbol 'key k) (next ,(symbol 'ind k) ,(symbol 'key k)))) (break)))
      (array/push ,res (,f x ,;(seq [k :range [0 n]] ~(in ,(symbol 'ind k) ,(symbol 'key k))))))))

(defn map
  `Map a function over every value in a data structure and
  return an array of the results.`
  [f ind & inds]
  (def res @[])
  (def ninds (length inds))
  (case ninds
    0 (each x ind (array/push res (f x)))
    1 (map-n 1 res f ind inds)
    2 (map-n 2 res f ind inds)
    3 (map-n 3 res f ind inds)
    4 (map-n 4 res f ind inds)
    (do
      (def iter-keys (array/new-filled ninds))
      (def call-buffer (array/new-filled ninds))
      (var done false)
      (each x ind
        (forv i 0 ninds
          (let [old-key (in iter-keys i)
                ii (in inds i)
                new-key (next ii old-key)]
            (if (= nil new-key)
              (do (set done true) (break))
              (do (set (iter-keys i) new-key) (set (call-buffer i) (in ii new-key))))))
        (if done (break))
        (array/push res (f x ;call-buffer)))))
  res)

The expansion of this is essentially identical to the current implementation of map, with a few minor differences. Most notably, in the fallback the call-buffer is over-written in place, rather than appending to and clearing it every iteration. By my testing, this results in a performance increase of approximately 15%. Also, one iterable argument is required always, rather than checking and producing an error.

By moving this all into a template, we can swap out the aggregation logic, and use the same template for every mapping function:

(defmacro- map-aggregator
  `Aggregation logic for various map functions.`
  [maptype res val]
  (case maptype
    :map ~(array/push ,res ,val)
    :mapcat ~(array/concat ,res ,val)
    :keep ~(if (def y ,val) (array/push ,res y))
    :count ~(if ,val (++ ,res))
    :some ~(if (def y ,val) (do (set ,res y) (break)))
    :all ~(if (def y ,val) nil (do (set ,res y) (break)))))

(defmacro- map-n
  `Generates efficient map logic for a specific number of
  iterable beyond the first.`
  [n maptype res f ind inds]
  ~(do
    (def ,(seq [k :range [0 n]] (symbol 'ind k)) ,inds)
    ,;(seq [k :range [0 n]] ~(var ,(symbol 'key k) nil))
    (each x ,ind
      ,;(seq [k :range [0 n]]
        ~(if (= nil (set ,(symbol 'key k) (next ,(symbol 'ind k) ,(symbol 'key k)))) (break)))
      (map-aggregator ,maptype ,res (,f x ,;(seq [k :range [0 n]] ~(in ,(symbol 'ind k) ,(symbol 'key k))))))))

(defmacro- map-template
  `Template for functions which apply a mapping to multiple
  iterable arguments and aggregate the results.`
  [maptype res f ind inds]
  ~(do
    (def ninds (length ,inds))
    (case ninds
      0 (each x ,ind (map-aggregator ,maptype ,res (,f x)))
      1 (map-n 1 ,maptype ,res ,f ,ind ,inds)
      2 (map-n 2 ,maptype ,res ,f ,ind ,inds)
      3 (map-n 3 ,maptype ,res ,f ,ind ,inds)
      4 (map-n 4 ,maptype ,res ,f ,ind ,inds)
      (do
        (def iter-keys (array/new-filled ninds))
        (def call-buffer (array/new-filled ninds))
        (var done false)
        (each x ,ind
          (forv i 0 ninds
            (let [old-key (in iter-keys i)
                  ii (in ,inds i)
                  new-key (next ii old-key)]
              (if (= nil new-key)
                (do (set done true) (break))
                (do (set (iter-keys i) new-key) (set (call-buffer i) (in ii new-key))))))
          (if done (break))
          (map-aggregator ,maptype ,res (,f x ;call-buffer)))))))

Each of these six functions can then be defined in terms of this template:

(defn map
  [f ind & inds]
  (def res @[])
  (map-template :map res f ind inds)
  res)

(defn mapcat
  [f ind & inds]
  (def res @[])
  (map-template :mapcat res f ind inds)
  res)

(defn keep
  [pred ind & inds]
  (def res @[])
  (map-template :keep res pred ind inds)
  res)

(defn count
  [pred ind & inds]
  (var res 0)
  (map-template :count res pred ind inds)
  res)

(defn some
  [pred ind & inds]
  (var res nil)
  (map-template :some res pred ind inds)
  res)

(defn all
  [pred ind & inds]
  (var res true)
  (map-template :all res pred ind inds)
  res)

I believe that this change would make the language more expressive, and help to alleviate some of the awkwardness that arises when attempting to aggregate multiple iterables.