hpcugent / vsc-base

Basic Python libraries used by UGent's HPC group
Other
14 stars 51 forks source link

add MonoidConcat class #330

Closed lexming closed 2 years ago

lexming commented 2 years ago

This PR adds a variation of the Monoid class specifically optimized to concatenate lists.

Monoid is currently used by vsc-filesystems to concatenate lists of filesystem/quota attributes and it is painfully slow. To the point where some scripts such as dquota become unusable.

The issue is that the reduce method used in Monoid does not scale. The table below shows the time needed to inject into a MonoidDict an increasing amount of lists with the current mappend used in vsc-filesystems, some other alternative mappends and the new MonoidConcat class (without reduce):

  1. Monoid with current mappend used by vsc-filesystems:
    l = lambda xs, ys: xs + ys
  2. Monoid with alternative mappend with list.extend:
    def l(xs, ys):
        new_xs = list(xs)
        new_xs.extend(ys)
        return new_xs
  3. Monoid with alternative mappend with list.append:
    def l(xs, ys):
        zs = []
        list(map(zs.append, xs))
        list(map(zs.append, ys))
        return zs
  4. Monoid with alternative mappend with itertools.chain:
     l = lambda xs, ys: list(itertools.chain(xs, ys))
  5. MonoidConcat without reduce
Nr of records 1: a + b (s) 2: extend (s) 3: append (s) 4: chain (s) 5: MonoidConcat (s)
100 0.0227 0.0469 0.0512 0.0502 0.0155
1000 0.3881 0.4753 1.9757 0.9320 0.1381
10000 30.4001 29.3562 171.8019 63.3293 1.3504
20000 122.6388 116.9318 -- -- 2.3829
lexming commented 2 years ago

Not sure what happened with Jenkins, but running tox on my side works fine: https://gist.github.com/53be3bff4225c96598eb2ddcfe3889a7

stdweird commented 2 years ago

what is the full code for your timings? in particular why not use list(itertools.chain(*list_of_all_lists)) instead of using monid at all?

lexming commented 2 years ago

Superseded by https://github.com/hpcugent/vsc-filesystems/pull/84