johnnycdu / google-collections

Automatically exported from code.google.com/p/google-collections
Apache License 2.0
0 stars 0 forks source link

Req: Abstraction of MultiMap to handle arbitrary op and basecase (identity-under-op) #41

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I frequently need to find the max / total / avg value for each key, across
a set of key / value pairs.  Here is a static method to accomplish this for
max:

    public static void mapPutMax(HashMap<String, Integer> map, String key, int
newVal) {
        int max = newVal;
        if (map.containsKey(key))
            max = Math.max(map.get(key), max);
        map.put(key, max);
    }

In the same way that a MultiMap greatly facilitates a pattern similar to
the above when adding an entry to a Map<String, List> (i.e. creating a new
empty ArrayList if the key doesn't yet exist in the map, else appending to
the existing list), it would be great if there were an abstraction of
MultiMap, called say MapBuilder, which allowed you to define an add
operation and an identity under the operation when no value yet exists for
a given key.  Something like the following:

    public class MapBuilder<K, V, Velem> {
        // Can't just extend HashMap<K, V> because we need put(K, Velem) not
put(K, V)
        private HashMap<K, V> underlyingMap = new HashMap<K, V>();
        private MapBuilderOp<V, Velem> op;

        public MapBuilder() {
            throw new IllegalArgumentException("Cannot use default constructor");
        }

        public MapBuilder(MapBuilderOp<V, Velem> op) {
            this.op = op;
        }

        public V getValOrIdentity(K key) {
            return (underlyingMap.containsKey(key) ? underlyingMap.get(key) :
this.op.newIdentityUnderOp());
        }

        public V get(K key) {
            return underlyingMap.get(key);
        }

        public V put(K key, Velem val) {
            V newVal = this.op.applyLeftAssoc(getValOrIdentity(key), val);
            return underlyingMap.put(key, newVal);
        }

        public Set<K> keySet() {
            return underlyingMap.keySet();
        }
    }

This would be used as follows: 

    MapBuilder<String, Integer, Integer> map1 = new Main().new
MapBuilder<String, Integer, Integer>(new MapBuilderOp<Integer, Integer>() {
        @Override
        public Integer applyLeftAssoc(Integer oldVal, Integer newVal) {
            return oldVal * newVal;
        }
        @Override
        public Integer newIdentityUnderOp() {
            return 1;
        }
    });
    map1.put("a", 2);
    map1.put("b", 2);
    map1.put("a", 10);
    map1.put("b", 1);
    map1.put("a", 7);
    for (String k : map1.keySet())
        System.out.println(k + " " + map1.get(k));

    MapBuilder<String, ArrayList<Integer>, Integer> map2 = new Main().new
MapBuilder<String, ArrayList<Integer>, Integer>(new
MapBuilderOp<ArrayList<Integer>, Integer>() {
        @Override
        public ArrayList<Integer> applyLeftAssoc(ArrayList<Integer> oldList,
Integer newVal) {
            // This is a problem, we should really just return a copy of oldList
with newVal appended, but
            // doing this in-place is much faster.  Perhaps implement different
method for in-place
            // modification?
            oldList.add(newVal);
            return oldList;
        }
        @Override
        public ArrayList<Integer> newIdentityUnderOp() {
            return new ArrayList<Integer>();
        }
    });

    map2.put("a", 2);
    map2.put("b", 2);
    map2.put("a", 10);
    map2.put("b", 1);
    map2.put("a", 7);
    for (String k : map2.keySet())
        System.out.println(k + ": " + Join.join(", ", map2.get(k)));

(There could also be a version of MapBuilder with type <K, V>, for when V
== Velem, that will simplify usage in the case of map1 above.)

You can see that in the case of map2, we actually have an implementation of
a MultiMap; in the case of map1, we can use different identities and
operations according to what sort of math we are applying, for example:

ident   op   action
  1     *    prod
  0     +    sum
 null   fn    max   , where fn = (oldVal == null ? newVal :
Math.max(oldVal, newVal)

This therefore represents a useful abstraction of MultiMap to a much
broader range of use cases.

Thanks!

Original issue reported on code.google.com by luke.hutch on 28 Nov 2007 at 11:05

GoogleCodeExporter commented 9 years ago
Oops, forgot the interface:

    public interface MapBuilderOp<V, Velem> {
        /** Left-associative application of this operation */
        public V applyLeftAssoc(V lVal, Velem rVal);
        /** Create a new identity object */
        public V newIdentityUnderOp();
    }

Original comment by luke.hutch on 28 Nov 2007 at 11:06

GoogleCodeExporter commented 9 years ago
PS I should say that applyLeftAssoc is really an example of foldl, for those 
with
Haskell experience :-)  It might make sense to look extensively at Haskell's 
prelude,
and cover foldr, the various zip* functions, 'map', etc. -- as inevitably the 
need
will come up for these other routines, but they are currently being 
hacked-around on
an as-needed basis by Java programmers -- just as Map<K, ? extends Collection> 
was
hacked around prior to the Google Collections Library's existence.

Original comment by luke.hutch on 28 Nov 2007 at 11:21

GoogleCodeExporter commented 9 years ago
My first instinct is that if you had a Map whose get() method auto-creates a 
"zero
value" (using your supplied creation function) whenever the value is missing, 
you'd
have everything you need, don't you think?  In your example, you wouldn't use 
Integer
as your value type, but some custom class of yours that has the methods you 
need like
setIfGreater().  Now, probably I'm wrong about what you're really trying to do, 
but
please explain why - thanks.

It's interesting that Java's ThreadLocal bothered to support this 
"autovivification"
but no collections do.

Original comment by kevin...@gmail.com on 28 Nov 2007 at 11:51

GoogleCodeExporter commented 9 years ago
No, the point is that you often need to perform an arbitrary operation on the
existing value in a map, if present, or do something else if not present.  
MultiMaps
create new lists if no value present for the key, and store the new value in 
the new
list, else append to the old list at that key.

The first code example I gave does something very similar, but with the max 
operation
-- and yet MultiMaps can't compute the max of values with a given key, they are 
too
specialized to work on lists.

The MapBuilder code I showed can do both MultiMaps and compute max, as well as
product, sum, avg, anything else for which you can define an operation, and an
identity for the operation.  Therefore MapBuilder is an abstraction of 
MultiMaps, and
is more powerful than MultiMaps (it can be used to simply implement MultiMaps 
as well
as many other things).

In general this concept is known as foldl in the functional programming world, 
and
once you realize that MultiMaps are in some sense a foldl after grouping k/v 
pairs by
key, then it follows that there may be some other really powerful utility 
classes
created, by looking at other functional programming paradigms, that extend
collections in ways as powerful as the leap from Map->MultiMap.

Just defining a get() that creates a zero-value rather than returning null if 
there
is no value for the key is not a solution to the bigger picture.

Original comment by luke.hutch on 29 Nov 2007 at 12:08

GoogleCodeExporter commented 9 years ago
I'm not sure why I should prefer your

mapBuilder.put("a", 2)

over my

map.get("a").performSomeOperation(2)

To me, the second is far more simple and clear, requiring only something so 
simple as
an "autovivifying map".

What am I missing?

Original comment by kevin...@gmail.com on 29 Nov 2007 at 12:21

GoogleCodeExporter commented 9 years ago
Also, I don't see Multimap as a special case of what you describe at all. A 
Multimap
is  just a collection; a bunch of key-value pairs, just like a Map but without 
the
constraint that two entries can't have the same key.  And it has some useful 
view
collections, just as Map does.

What you're talking about is completely different; it only seems similar to a
Multimap if you're thinking in terms of how Multimaps are *implemented*.

Original comment by kevin...@gmail.com on 29 Nov 2007 at 12:32

GoogleCodeExporter commented 9 years ago
Your example works for mutable objects, so if map is an autovivifying map from 
string
to list, you can do

map.get("a").add("new val")

However if these are immutable objects, then you need to do something like the
following, e.g. for a map from string to integer:

map.put("a", Math.max(map.get("a"), newVal))

What I am proposing is something like

map = MapBuilder<Integer>.createMaxMap();  // Where this is defined similarly 
to map1
above
map.put("a", 10);
map.put("a", 1); // etc.

I think this notation is a lot clearer -- you just create a map whose put 
operation
applies a binary op, and uses the identity if only one operand is available -- 
but
the reason why I suggest it is because it will cut down on an enormous amount of
boilerplate code that I have been writing every time I want to use a map with a
non-trivial valuespace, or where I want to do more than just .add() in the
traditional sense.

Original comment by luke.hutch on 29 Nov 2007 at 12:33

GoogleCodeExporter commented 9 years ago
...and the only win you get from using autovivifying maps for the example 

map.put("a", Math.max(map.get("a"), newVal))

is avoiding one test:

val = map.get("a");
map.put("a", Math.max(val == null ? identity : val, newVal))

This is still helpful, and enough of a reason to have an autovivifying get(), 
but the
real win is avoiding the sort of boilerplate I was talking about originally:

    public static void mapPutMax(HashMap<String, Integer> map, String key, int
newVal) {
        int max = newVal;
        if (map.containsKey(key))
            max = Math.max(map.get(key), max);
        map.put(key, max);
    }

Here's the ArrayList version for comparison -- this pattern is all over my code 
and
other people's, and it's tedious to write all the time:

    public static void mapPutMax(HashMap<String, ArrayList<Integer>> map, String key,
int newVal) {
        ArrayList<Integer> list = map.get(key);
        if (list == null)
            list = new ArrayList<Integer>();
        list.add(newVal);
        map.put(key, list);
    }

Avoiding this second code snippet is (if I understand right) one of the very 
reasons
MultiMaps exist.  However this follows the same sort of pattern as the first 
snippet
-- suggesting that there is a useful generalization here.  Basic group theory
suggests the answer lies in realizing that both operations have in common an
associative binary operation and an identity under that operation.

Original comment by luke.hutch on 29 Nov 2007 at 12:40

GoogleCodeExporter commented 9 years ago
Just saw comment #6.  I understand your abstract definition of MultiMaps, and 
that
they cover a bigger picture than this by providing view collections etc.  This 
is not
however so much about how MultiMaps are implemented as it is about how to get 
values
into them in a convenient way... I'm suggesting an aspect of the interface, not 
the
implementation.  This is completely backwards-compatible with the current 
interface,
and just provides a unification between maps and multimaps.  It also makes
value/value maps as useful as multimaps for cases where you want to do more 
than just
put/overwrite.

What I'm suggesting would probably fit better in the interface of Map itself -- 
in
fact you could define the standard Map.put(k, v) semantics (putting or 
overwriting
depending on whether a value is already present) using this same MapBuilder 
model, by
defining the identity as null, and the operation to ignore the old value, and 
put the
new value.

Original comment by luke.hutch on 29 Nov 2007 at 12:52

GoogleCodeExporter commented 9 years ago
"Your example works for mutable objects... however if these are immutable 
objects..."

Then why use the immutable objects, if by switching to mutable ones that you 
craft
you can avoid all of this "MapBuilder" complexity?  For example this is what I 
do in
HashMultiset.

I'm sorry but I don't think that something like your MapBuilder belongs in this 
library.

Original comment by kevin...@gmail.com on 29 Nov 2007 at 1:10

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 29 Nov 2007 at 1:13