haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers
316 stars 178 forks source link

Enum k => EnumMap k v as generatisation of IntMap v ? #210

Open jwaldmann opened 8 years ago

jwaldmann commented 8 years ago

Sometimes I want IntMap performance, but Enum k typing for more expressive source code.

This can be done with a heap of boilerplate code: https://gitlab.imn.htwk-leipzig.de/waldmann/pure-matchbox/blob/master/src/Matchbox/Map.hs - and I hope that ghc will make sure it has no run-time cost.

Could (something like) this become part of containers?

Or does this have a totally different solution (with associated types perhaps) that should not be part of containers?

treeowl commented 8 years ago

I like this idea. Could you please write up a proposal and send it to libraries@haskell.org? On Apr 26, 2016 7:14 AM, "jwaldmann" notifications@github.com wrote:

Sometimes I want IntMap performance, but Enum k typing for more expressive source code.

This can be done with a heap of boilerplate code: https://gitlab.imn.htwk-leipzig.de/waldmann/pure-matchbox/blob/master/src/Matchbox/Map.hs

  • and I hope that ghc will make sure it has no run-time cost.

Could (something like) this become part of containers?

Or does this have a totally different solution (with associated types perhaps) that should not be part of containers?

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/haskell/containers/issues/210

jwaldmann commented 8 years ago

Will do.

nomeata commented 8 years ago

I think I have defined such a map before, and I am sure many have.

But I am sceptical about the Enum type class. It happens to have fromEnum and toEnum, but also many other functions that are unrelated to the task at hand. Furthermore, there are instances that a definitely not suitable to be used with such an IntLikeMap, in particular Double. I see two way out:

Also note that in some applications one might only have a fooToInt function for keys, and is content with using only the methods of Data.IntMap that consume a key, but not functions like keys or mapWithKey. So maybe, the type class IntLike should have a superclass ToInt with the a -> Int method, and the methods of IntMap using only that functionality should have that constraint?

So the design space is not small. I would advocate putting this into a package of its own with a module name that does not clash with containers, and find good names and test and refine the API and get people to use it, with the plan to possibly merge it into containers if it proves its merit.

treeowl commented 8 years ago

I like your general IntLike class, but I think the method names are wrong for the same reason I dislike the equivalents in Enum. I'd much prefer toInt and fromInt. Another idea would be to split these:

class ToInt a where toInt :: a -> Int class FromInt a where fromInt :: Int -> a

One reason to split these is the single-method optimization; the other is that some types may only support one or the other.

What laws are you hoping for? Are Word8 and Bool Int-like? On May 1, 2016 3:17 PM, "Joachim Breitner" notifications@github.com wrote:

I think I have defined such a map before, and I am sure many have.

But I am sceptical about the Enum type class. It happens to have fromEnum and toEnum, but also many other functions that are unrelated to the task at hand. Furthermore, there are instances that a definitely not suitable to be used with such an IntLikeMap, in particular Double. I see two way out:

  • Replace Enum a to Coercible Int a. This would give better performance guarantees, and ensure that no invalid instances are given, and users who have newtype Something = Something Int could use the IntLikeMap with Something keys without additional boiler plate code.
  • Replace Enum a with a new class IntLike a with just methods toIntLike and fromIntLike, and specify that this class is meant for types than can injectively embedded into Int. A bit more expressive and general, but requires more boiler plate code. There could be a default implementation in terms of Coercible, so that for newtypes the instance is simple.

Also note that in some applications one might only have a fooToInt function for keys, and is content with using only the methods of Data.IntMap that consume a key, but not functions like keys or mapWithKey. So maybe, the type class IntLike should have a superclass ToInt with the a -> Int method, and the methods of IntMap using only that functionality should have that constraint?

So the design space is not small. I would advocate putting this into a package of its own with a module name that does not clash with containers, and find good names and test and refine the API and get people to use it, with the plan to possibly merge it into containers if it proves its merit.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/haskell/containers/issues/210#issuecomment-216065449

nomeata commented 8 years ago

but I think the method names are wrong

I payed no attention to naming yet, so feel free to come up with better ones.

I’m not sure if FromInt makes sense for types that have no ToInt, and if it does not, then ToInt should be a superclass of FromInt.

The law for ToInt would be that toInt is an injective function, i.e. toInt a = toInt b ⟹ a = b.

The law for FromInt would be that fromInt (toInt a) = a. I think it would be ok to have fromInt only defined on the image of toInt, at least for the use case we have in minds.

wrengr commented 8 years ago

If using new classes in lieu of Enum, another option to consider is the classes from prelude-safeenum. (I'm not saying they're necessarily the right idea here; just another option to consider.)

If all we're doing is converting to and from Int, with injectivity and invertability as the only laws, then why not just use the HashMap of unordered-containers? The key reason to use IntMap with Enum (or similar) is that it maintains the ordering of elements; so we'd also want toInt to preserve relative ordering of the type.

jwaldmann commented 8 years ago

"preserve relative ordering" - Yes, that could be important for my application, as it might reduce runtime for intersections, because disjointness of maps can be detected early. There is no shortcut for intersecting hashmaps?

nomeata commented 8 years ago

I don’t think that preserving relative ordering (with regard to the Ord instance of the data type) is required for efficient intersection of disjoint IntMaps. The only benefit is that toAscList is as efficient as toList; sometimes nice to have, but probably very important here.

jwaldmann commented 8 years ago

Intersection worst case is at least linear in both inputs, even if they represent disjoint sets of keys.

Example: one map has [0,2 .. N] as keys, other map has [1,3 .. N+1] - you certainly need to visit both completely. (Well, unless the root branches on the last bit of the key, but then the example can be modified.)

If there is some order in the tree/trie structure, then we could get lucky in case one key set is [ .. N/2] and the other is [N/2+1 .. ], this can be detected by rightmost_key(a) < leftmost_key(b), and the Data.Map algorithm does this, cf. https://github.com/haskell/containers/issues/177, and I assume Data.IntMap does it as well.

For Hashmaps, there never is such a "lucky case"?

I now made the Map implementation switchable in my benchmark https://gitlab.imn.htwk-leipzig.de/waldmann/containers-benchmark/blob/master/src/Matchbox/Map.hs and it is definitely slower with HashMap, altough I have not profiled this in-depth (would need -auto-all on the libraries).

For the built-in test case (running main with no arguments)

Data.IntMap  :   6.7 sec
Data.Map     :   9.7 sec
Data.HashMap : 200 sec

I am definitely seeing M.intersectionWith as a hotspot (https://gitlab.imn.htwk-leipzig.de/waldmann/containers-benchmark/blob/master/src/Matchbox/Relation.hs#L208)

wrengr commented 8 years ago

Note that HashMap does have different performance tradeoffs compared to IntMap just due to the internal representation of the tree. Which is why I'm thinking of adding a non-hashed version of HashMap to a future release (i.e., an AMT (array mapped trie) in lieu of the HAMT (hashed AMT)).

IntMaps are binary trees internally, so they're cheap to build/change but require more pointer chasing to read from. Whereas HashMaps use arrays internally to help flatten the tree, meaning they are far better behaved re cache and pointer chasing, but they're also more expensive to build/change.

jwaldmann commented 8 years ago

Yes. I am partially aware of these tradeoffs. The text of your message should be atop the docs for containers.

amigalemming commented 8 years ago

I simply use the package: https://hackage.haskell.org/package/enummapset

amigalemming commented 8 years ago

I think that Enum is the right abstraction. It is enough to define fromEnum and toEnum, the other functions are optimizations. instance Enum Double is just wrong, because there is no sensible fromEnum implementation.

tysonzero commented 5 years ago

I would agree that Enum is most likely conceptually the right abstraction for this. I do however think Enum is in a pretty bad place right now, with the Double/Float instances and the lack of an Ord superclass, plus the inevitably partial succ and pred.

I would be much happier to use it if Haskell/base had a proper ordering hierarchy with Eq => Ord, Ord => Enum, Ord => Bounded and a bunch of other classes like lattices and partial orderings.

treeowl commented 5 years ago

Enum can't decide whether it's for things that look sufficiently like Int (e.g., Word8), or for things supporting successors and predecessors (e.g., Integer). I don't think it's the right way.

tysonzero commented 5 years ago

IMO Enum should mean graded totally ordered set without the non-negative rank requirement.

So in other words it should be a bijection from the Integers, there should be super-classes that deal with just the graded part or just the pred/succ part.

If the above were the case then I think Enum would make a lot of sense.

I realize that Integer and Int aren't the same, so maybe SmallEnum or something similar would be needed that specifies that Int is big enough for all values to be represented.

rhendric commented 1 year ago
  • Replace Enum a to Coercible Int a.

This approach is being used in the wild at https://github.com/ejconlon/int-like/blob/master/src/IntLike/Map.hs

But I would love to see containers' existing data IntMap a migrated to data IntCoercibleMap i a and type IntMap = IntCoercibleMap Int. The containers API is much richer than the one exposed by int-like: more functions, and both strict and lazy versions. It'd be less maintenance overall to define everything once for IntCoercibleMap than to maintain API parity with a newtype wrapper in a different package. And consumers could choose to write functions targeting IntCoercibleMap and they'd be usable with IntMap as well. I think the case is stronger for this than with the Enum version of the proposal, as it's so much less intrusive to coerce the specialized IntMap functions to general IntCoercibleMap ones than it is to thread fromEnum and toEnum everywhere. IntSet too, mutatis mutandis.

I'm not holding my breath or anything; just saying I'd love to see it.

treeowl commented 1 year ago

I wouldn't want that getting in the internals (with the possible exception of traversals). Would you be interested in giving it a go?

rhendric commented 1 year ago

Either your first sentence is agreeing with me that the Coercible approach is better than throwing the Enum functions around, or you're offering a refinement of my proposal about which I need some more detail to properly understand. :smile: Either way, I'd be happy to contribute a PR that moves things forward if you think the idea is worth a shot.

Bodigrim commented 1 year ago

There is https://hackage.haskell.org/package/enummapset, offering quite extensive API.

amigalemming commented 1 year ago

There is https://hackage.haskell.org/package/enummapset, offering quite extensive API.

That's what I said in 2016: https://github.com/haskell/containers/issues/210#issuecomment-228585953

rhendric commented 1 year ago

And large as it is, it still suffers from not being in containers.

‘This is a fully general argument against things not being in core libraries,’ you say? Well, maybe it is. But it's a much stronger argument IMO when the thing we're discussing being in core libraries is identical up to coercion to something that's already in core libraries.