Open jwaldmann opened 8 years ago
I like this idea. Could you please write up a proposal and send it to libraries@haskell.org? On Apr 26, 2016 7:14 AM, "jwaldmann" notifications@github.com wrote:
Sometimes I want IntMap performance, but Enum k typing for more expressive source code.
This can be done with a heap of boilerplate code: https://gitlab.imn.htwk-leipzig.de/waldmann/pure-matchbox/blob/master/src/Matchbox/Map.hs
- and I hope that ghc will make sure it has no run-time cost.
Could (something like) this become part of containers?
Or does this have a totally different solution (with associated types perhaps) that should not be part of containers?
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/haskell/containers/issues/210
Will do.
I think I have defined such a map before, and I am sure many have.
But I am sceptical about the Enum
type class. It happens to have fromEnum
and toEnum
, but also many other functions that are unrelated to the task at hand. Furthermore, there are instances that a definitely not suitable to be used with such an IntLikeMap
, in particular Double
. I see two way out:
Enum a
to Coercible Int a
. This would give better performance guarantees, and ensure that no invalid instances are given, and users who have newtype Something = Something Int
could use the IntLikeMap
with Something
keys without additional boiler plate code.Enum a
with a new class IntLike a
with just methods toIntLike
and fromIntLike
, and specify that this class is meant for types than can injectively embedded into Int
. A bit more expressive and general, but requires more boiler plate code. There could be a default implementation in terms of Coercible
, so that for newtypes the instance is simple.Also note that in some applications one might only have a fooToInt
function for keys, and is content with using only the methods of Data.IntMap
that consume a key, but not functions like keys
or mapWithKey
. So maybe, the type class IntLike
should have a superclass ToInt
with the a -> Int
method, and the methods of IntMap
using only that functionality should have that constraint?
So the design space is not small. I would advocate putting this into a package of its own with a module name that does not clash with containers, and find good names and test and refine the API and get people to use it, with the plan to possibly merge it into containers
if it proves its merit.
I like your general IntLike class, but I think the method names are wrong for the same reason I dislike the equivalents in Enum. I'd much prefer toInt and fromInt. Another idea would be to split these:
class ToInt a where toInt :: a -> Int class FromInt a where fromInt :: Int -> a
One reason to split these is the single-method optimization; the other is that some types may only support one or the other.
What laws are you hoping for? Are Word8 and Bool Int-like? On May 1, 2016 3:17 PM, "Joachim Breitner" notifications@github.com wrote:
I think I have defined such a map before, and I am sure many have.
But I am sceptical about the Enum type class. It happens to have fromEnum and toEnum, but also many other functions that are unrelated to the task at hand. Furthermore, there are instances that a definitely not suitable to be used with such an IntLikeMap, in particular Double. I see two way out:
- Replace Enum a to Coercible Int a. This would give better performance guarantees, and ensure that no invalid instances are given, and users who have newtype Something = Something Int could use the IntLikeMap with Something keys without additional boiler plate code.
- Replace Enum a with a new class IntLike a with just methods toIntLike and fromIntLike, and specify that this class is meant for types than can injectively embedded into Int. A bit more expressive and general, but requires more boiler plate code. There could be a default implementation in terms of Coercible, so that for newtypes the instance is simple.
Also note that in some applications one might only have a fooToInt function for keys, and is content with using only the methods of Data.IntMap that consume a key, but not functions like keys or mapWithKey. So maybe, the type class IntLike should have a superclass ToInt with the a -> Int method, and the methods of IntMap using only that functionality should have that constraint?
So the design space is not small. I would advocate putting this into a package of its own with a module name that does not clash with containers, and find good names and test and refine the API and get people to use it, with the plan to possibly merge it into containers if it proves its merit.
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/haskell/containers/issues/210#issuecomment-216065449
but I think the method names are wrong
I payed no attention to naming yet, so feel free to come up with better ones.
I’m not sure if FromInt
makes sense for types that have no ToInt
, and if it does not, then ToInt
should be a superclass of FromInt
.
The law for ToInt
would be that toInt
is an injective function, i.e. toInt a = toInt b ⟹ a = b
.
The law for FromInt
would be that fromInt (toInt a) = a
. I think it would be ok to have fromInt
only defined on the image of toInt
, at least for the use case we have in minds.
If using new classes in lieu of Enum
, another option to consider is the classes from prelude-safeenum. (I'm not saying they're necessarily the right idea here; just another option to consider.)
If all we're doing is converting to and from Int
, with injectivity and invertability as the only laws, then why not just use the HashMap
of unordered-containers
? The key reason to use IntMap
with Enum
(or similar) is that it maintains the ordering of elements; so we'd also want toInt
to preserve relative ordering of the type.
"preserve relative ordering" - Yes, that could be important for my application, as it might reduce runtime for intersections, because disjointness of maps can be detected early. There is no shortcut for intersecting hashmaps?
I don’t think that preserving relative ordering (with regard to the Ord
instance of the data type) is required for efficient intersection of disjoint IntMap
s. The only benefit is that toAscList
is as efficient as toList
; sometimes nice to have, but probably very important here.
Intersection worst case is at least linear in both inputs, even if they represent disjoint sets of keys.
Example: one map has [0,2 .. N] as keys, other map has [1,3 .. N+1] - you certainly need to visit both completely. (Well, unless the root branches on the last bit of the key, but then the example can be modified.)
If there is some order in the tree/trie structure, then we could get lucky in case one key set is [ .. N/2] and the other is [N/2+1 .. ], this can be detected by rightmost_key(a) < leftmost_key(b), and the Data.Map algorithm does this, cf. https://github.com/haskell/containers/issues/177, and I assume Data.IntMap does it as well.
For Hashmaps, there never is such a "lucky case"?
I now made the Map implementation switchable in my benchmark https://gitlab.imn.htwk-leipzig.de/waldmann/containers-benchmark/blob/master/src/Matchbox/Map.hs and it is definitely slower with HashMap, altough I have not profiled this in-depth (would need -auto-all on the libraries).
For the built-in test case (running main with no arguments)
Data.IntMap : 6.7 sec
Data.Map : 9.7 sec
Data.HashMap : 200 sec
I am definitely seeing M.intersectionWith
as a hotspot (https://gitlab.imn.htwk-leipzig.de/waldmann/containers-benchmark/blob/master/src/Matchbox/Relation.hs#L208)
Note that HashMap
does have different performance tradeoffs compared to IntMap
just due to the internal representation of the tree. Which is why I'm thinking of adding a non-hashed version of HashMap
to a future release (i.e., an AMT (array mapped trie) in lieu of the HAMT (hashed AMT)).
IntMap
s are binary trees internally, so they're cheap to build/change but require more pointer chasing to read from. Whereas HashMap
s use arrays internally to help flatten the tree, meaning they are far better behaved re cache and pointer chasing, but they're also more expensive to build/change.
Yes. I am partially aware of these tradeoffs. The text of your message should be atop the docs for containers.
I simply use the package: https://hackage.haskell.org/package/enummapset
I think that Enum
is the right abstraction. It is enough to define fromEnum
and toEnum
, the other functions are optimizations. instance Enum Double
is just wrong, because there is no sensible fromEnum
implementation.
I would agree that Enum
is most likely conceptually the right abstraction for this. I do however think Enum
is in a pretty bad place right now, with the Double
/Float
instances and the lack of an Ord
superclass, plus the inevitably partial succ
and pred
.
I would be much happier to use it if Haskell/base had a proper ordering hierarchy with Eq => Ord, Ord => Enum, Ord => Bounded
and a bunch of other classes like lattices and partial orderings.
Enum
can't decide whether it's for things that look sufficiently like Int
(e.g., Word8
), or for things supporting successors and predecessors (e.g., Integer
). I don't think it's the right way.
IMO Enum
should mean graded totally ordered set without the non-negative rank requirement.
So in other words it should be a bijection from the Integers, there should be super-classes that deal with just the graded part or just the pred/succ part.
If the above were the case then I think Enum
would make a lot of sense.
I realize that Integer and Int aren't the same, so maybe SmallEnum
or something similar would be needed that specifies that Int
is big enough for all values to be represented.
- Replace
Enum a
toCoercible Int a
.
This approach is being used in the wild at https://github.com/ejconlon/int-like/blob/master/src/IntLike/Map.hs
But I would love to see containers
' existing data IntMap a
migrated to data IntCoercibleMap i a
and type IntMap = IntCoercibleMap Int
. The containers
API is much richer than the one exposed by int-like
: more functions, and both strict and lazy versions. It'd be less maintenance overall to define everything once for IntCoercibleMap
than to maintain API parity with a newtype wrapper in a different package. And consumers could choose to write functions targeting IntCoercibleMap
and they'd be usable with IntMap
as well. I think the case is stronger for this than with the Enum
version of the proposal, as it's so much less intrusive to coerce
the specialized IntMap
functions to general IntCoercibleMap
ones than it is to thread fromEnum
and toEnum
everywhere. IntSet
too, mutatis mutandis.
I'm not holding my breath or anything; just saying I'd love to see it.
I wouldn't want that getting in the internals (with the possible exception of traversals). Would you be interested in giving it a go?
Either your first sentence is agreeing with me that the Coercible
approach is better than throwing the Enum
functions around, or you're offering a refinement of my proposal about which I need some more detail to properly understand. :smile: Either way, I'd be happy to contribute a PR that moves things forward if you think the idea is worth a shot.
There is https://hackage.haskell.org/package/enummapset, offering quite extensive API.
There is https://hackage.haskell.org/package/enummapset, offering quite extensive API.
That's what I said in 2016: https://github.com/haskell/containers/issues/210#issuecomment-228585953
And large as it is, it still suffers from not being in containers
.
‘This is a fully general argument against things not being in core libraries,’ you say? Well, maybe it is. But it's a much stronger argument IMO when the thing we're discussing being in core libraries is identical up to coercion to something that's already in core libraries.
Sometimes I want
IntMap
performance, butEnum k
typing for more expressive source code.This can be done with a heap of boilerplate code: https://gitlab.imn.htwk-leipzig.de/waldmann/pure-matchbox/blob/master/src/Matchbox/Map.hs - and I hope that ghc will make sure it has no run-time cost.
Could (something like) this become part of containers?
Or does this have a totally different solution (with associated types perhaps) that should not be part of containers?