Marcono1234 closed this issue 2 years ago.
I think this use case is quite special; its efficiency relies on several assumptions. You need to find a mapping of your real domain into [0, n], where n is reasonably small but not too small (otherwise enum maps etc. are more efficient). Also, [0, n] should be filled quite densely (otherwise an int2int hash map would be more efficient).
I can see the use case for mapping e.g. an alphabet (on the other hand, I'm not sure you would need it there, since you could map it directly with an array). However, I think this is too specialized to be contained in a general-purpose library. In particular, the implementation is not usable at all for anything else: it's not that it would be slow, it would simply break (this does not mean that it is a bad thing to do, it's simply a specialized implementation!). So you can't really use it as a generic implementation of a Map. And, if I understand correctly, the main point of the implementation is to save space, so this would likely only pay off if you have thousands of these maps, all satisfying the assumptions (if the data is sparse, you could use hash maps; if the data is completely dense, you could use an array).
One intermediate solution, which is a bit less efficient under these assumptions but works in general, might be to implement an X2YMap which takes an X2ZFunction (together with its inverse) and a Z2YMap and transparently remaps keys. Combine this with my Nat2IntArrayMap and you pretty much get your behaviour (except for the bit set compression, but with no second array at all).
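The X2YMap idea can be sketched roughly as follows. This is a hypothetical illustration, not code from the thread. NatToIntArrayMap is a stand-in for the Nat2IntArrayMap mentioned above, whose actual code is not shown here. It assumes that absent entries are marked with a sentinel value rather than a second array or a bit set. The key mapping (the X2ZFunction) and its inverse are modelled with the JDK's ToIntFunction and IntFunction.

```java
import java.util.Arrays;
import java.util.function.IntFunction;
import java.util.function.ObjIntConsumer;
import java.util.function.ToIntFunction;

// Hypothetical stand-in for an int-keyed array map over [0, capacity).
// Absent entries are marked with a sentinel value, so no second array is needed.
final class NatToIntArrayMap {
    private final int[] values;
    private final int absent; // sentinel meaning "no entry"

    NatToIntArrayMap(int capacity, int absent) {
        this.values = new int[capacity];
        this.absent = absent;
        Arrays.fill(values, absent);
    }

    int capacity() { return values.length; }
    boolean containsKey(int key) { return values[key] != absent; }
    int get(int key) { return values[key]; } // returns the sentinel if absent
    void put(int key, int value) { values[key] = value; } // keys outside [0, capacity) throw
}

// Hypothetical X2YMap: remaps keys of type K to indices (the X2ZFunction),
// keeps the inverse for reporting keys back, and delegates storage (the Z2YMap).
final class RemappingObject2IntMap<K> {
    private final ToIntFunction<? super K> toIndex;
    private final IntFunction<? extends K> fromIndex;
    private final NatToIntArrayMap delegate;

    RemappingObject2IntMap(ToIntFunction<? super K> toIndex, IntFunction<? extends K> fromIndex,
                           int capacity, int absent) {
        this.toIndex = toIndex;
        this.fromIndex = fromIndex;
        this.delegate = new NatToIntArrayMap(capacity, absent);
    }

    boolean containsKey(K key) { return delegate.containsKey(toIndex.applyAsInt(key)); }
    int getInt(K key) { return delegate.get(toIndex.applyAsInt(key)); }
    void put(K key, int value) { delegate.put(toIndex.applyAsInt(key), value); }

    // The inverse function is what allows iterating over entries by key.
    void forEach(ObjIntConsumer<? super K> action) {
        for (int i = 0; i < delegate.capacity(); i++) {
            if (delegate.containsKey(i)) {
                action.accept(fromIndex.apply(i), delegate.get(i));
            }
        }
    }
}
```

For an alphabet, the mapping could be c -> c - 'a' with inverse i -> (char) ('a' + i) and a capacity of 26. The sentinel choice has a visible cost: storing the sentinel value itself is indistinguishable from having no entry. That is one reason the proposal tracks presence in a separate bit set instead.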
BTW: java.util.BitSet could be helpful for the implementation, especially for writing iterators :-)
Thanks for your extensive feedback!
otherwise enum maps etc. are more efficient
The proposed implementation could effectively act as an enum map. Its implementation is pretty similar to the standard Java EnumMap, which also uses a single pre-sized array for the values, and both have O(1) lookup time. The only overhead the proposed solution adds comes from the indirection through the KeyIndexer, but maybe that is negligible (has to be tested).
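The following minimal sketch makes the comparison with EnumMap concrete. It assumes a KeyIndexer that maps each supported key to a unique index in [0, size()). The KeyIndexer interface, EnumKeyIndexer and IndexedObject2IntMap are hypothetical names for illustration. The method names getInt and removeInt follow fastutil's Object2IntMap conventions, but the class does not implement that interface. Each lookup costs one indexOf call plus one array access, which is the indirection mentioned above. Presence is tracked in a manual long[] bit set.

```java
// Hypothetical indexer: maps each supported key to a unique index in [0, size()).
interface KeyIndexer<K> {
    int size();
    int indexOf(K key); // expected to throw for unsupported keys
    K keyAt(int index); // inverse mapping
}

// Indexer over all constants of an enum; the constants array is obtained once
// and can be shared by any number of maps.
final class EnumKeyIndexer<E extends Enum<E>> implements KeyIndexer<E> {
    private final E[] constants;

    EnumKeyIndexer(Class<E> type) { this.constants = type.getEnumConstants(); }

    public int size() { return constants.length; }
    public int indexOf(E key) { return key.ordinal(); }
    public E keyAt(int index) { return constants[index]; }
}

// One int[] for values plus a long[] bit set recording which indices are present.
final class IndexedObject2IntMap<K> {
    private final KeyIndexer<K> indexer;
    private final int[] values;
    private final long[] present;
    private int size;
    private int defaultReturnValue; // returned for absent keys (0 by default)

    IndexedObject2IntMap(KeyIndexer<K> indexer) {
        this.indexer = indexer;
        this.values = new int[indexer.size()];
        this.present = new long[(indexer.size() + 63) >>> 6]; // one bit per index
    }

    private boolean isSet(int i) { return (present[i >>> 6] & (1L << i)) != 0; }

    boolean containsKey(K key) { return isSet(indexer.indexOf(key)); }

    int getInt(K key) {
        int i = indexer.indexOf(key);
        return isSet(i) ? values[i] : defaultReturnValue;
    }

    int put(K key, int value) {
        int i = indexer.indexOf(key);
        int old = defaultReturnValue;
        if (isSet(i)) {
            old = values[i];
        } else {
            present[i >>> 6] |= 1L << i; // mark index i as present
            size++;
        }
        values[i] = value;
        return old;
    }

    int removeInt(K key) {
        int i = indexer.indexOf(key);
        if (!isSet(i)) {
            return defaultReturnValue;
        }
        present[i >>> 6] &= ~(1L << i); // clear the presence bit
        size--;
        return values[i];
    }

    int size() { return size; }
}
```

Because the indexer holds the constants array, several maps can share one EnumKeyIndexer instance instead of each copying values().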
since you could directly map it with an array
Manually maintaining an array would work, but you would have to repeat the boilerplate code for converting from key to index and vice versa. Additionally, you would have to write all methods (such as containsKey) yourself.
it would simply break
You are right; that is the main issue. If the map is created for enum constants [A, B, C] and you try to put a value for D, it would fail.
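A subset indexer shows where this failure would surface. In this hypothetical sketch, which is not from the thread, constants outside the chosen subset map to -1 internally and are rejected with an IllegalArgumentException. As a result, a map built on such an indexer only needs to be as large as the subset.

```java
import java.util.Arrays;

// Hypothetical indexer for a fixed subset of an enum's constants.
final class SubsetEnumIndexer<E extends Enum<E>> {
    private final int[] indexByOrdinal; // ordinal -> index in the subset, or -1
    private final E[] keys;             // index -> constant (the inverse mapping)

    @SafeVarargs
    SubsetEnumIndexer(Class<E> type, E... subset) {
        indexByOrdinal = new int[type.getEnumConstants().length];
        Arrays.fill(indexByOrdinal, -1);
        for (int i = 0; i < subset.length; i++) {
            int ordinal = subset[i].ordinal();
            if (indexByOrdinal[ordinal] != -1) {
                throw new IllegalArgumentException("Duplicate constant: " + subset[i]);
            }
            indexByOrdinal[ordinal] = i;
        }
        keys = subset.clone();
    }

    int size() { return keys.length; }

    int indexOf(E key) {
        int index = indexByOrdinal[key.ordinal()];
        if (index < 0) {
            // The "it would simply break" case: the key lies outside the map's domain.
            throw new IllegalArgumentException("Unsupported key: " + key);
        }
        return index;
    }

    E keyAt(int index) { return keys[index]; }
}
```

For the Character.UnicodeScript case described in the issue, such an indexer would give value arrays of length 18 instead of one slot per constant.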
BTW: java.util.BitSet could be helpful for implementation, especially for writing iterators :-)
BitSet contains logic for resizing, which is not necessary here, so I went for a manual bit set. But that is probably premature optimization; using BitSet would work here as well and would indeed simplify some tasks.
its simply a specialized implementation
I think this summarizes it pretty well. For my use case I even went for a more specialized (and simplified) implementation similar to your Nat2IntArrayMap, but I thought a more general solution (as proposed here) might be helpful.
If you don't mind, I would like to leave this issue open to see whether anyone else is interested in this. But I understand that such a map might be too specific to be provided by a general-purpose library.
Thanks for your extensive feedback!
Thanks for the extensive suggestion ;)
The proposed implementation could effectively act as an enum map. Its implementation is pretty similar to the standard Java EnumMap, which also uses a single pre-sized array for the values, and both have O(1) lookup time. The only overhead the proposed solution adds comes from the indirection through the KeyIndexer, but maybe that is negligible (has to be tested).
Indeed, the difference is negligible. I meant that if the data is sparse (e.g. you only consider a subset of the complete enum set), you'd save some space, since the value array isn't as large. But this difference would only matter if the data is really sparse and you have a ton of these maps. Conclusion: the difference is small, so there isn't a need except for this (very specific) case.
Manually maintaining an array would work, but you would have to repeat the boilerplate code for conversion from key to index and vice versa. Additionally you would have to write all methods (such as containsKey) yourself.
I meant having an implementation such as the linked Nat2IntArrayMap (compared to having a generic key-to-index mapping).
My point (summarizing the two replies above) is that use cases where an index mapping plus a bit set is superior to both (i) hashing and (ii) directly using the mapped index as the key are, I think, very rare.
Correct me if I am wrong: as I see it, the efficiency of this approach rests on the assumption that the set of keys (or rather, their mapped indices) your application encounters is (i) significantly smaller than the domain of the keys and (ii) known a priori. If (i) is not true, directly mapping into an array would be faster; if (ii) is not true, a hash-based approach is faster.
BitSet contains logic for resizing, which is not necessary here, so I went for a manual bit set. But that is probably premature optimization; using BitSet would work here as well and would indeed simplify some tasks.
I agree with both points. What I was aiming at is that BitSet has a non-trivial nextSetBit function, which would be useful for iterators.
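The following sketch illustrates the nextSetBit point. It shows how an iterator over the indices of present entries might look, assuming presence is tracked in a java.util.BitSet. The class name PresentIndexIterator is hypothetical. nextSetBit(from) returns the index of the first set bit at or after from, or -1 if there is none, so the iterator needs no manual bit arithmetic.

```java
import java.util.BitSet;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Iterates, in increasing order, over the indices whose presence bit is set.
final class PresentIndexIterator implements PrimitiveIterator.OfInt {
    private final BitSet present;
    private int next; // next index to return, or -1 when exhausted

    PresentIndexIterator(BitSet present) {
        this.present = present;
        this.next = present.nextSetBit(0); // -1 if no bit is set
    }

    @Override
    public boolean hasNext() {
        return next >= 0;
    }

    @Override
    public int nextInt() {
        if (next < 0) {
            throw new NoSuchElementException();
        }
        int current = next;
        // Advance to the following set bit; guard against overflow at Integer.MAX_VALUE.
        next = current == Integer.MAX_VALUE ? -1 : present.nextSetBit(current + 1);
        return current;
    }
}
```

A map iterator would combine each returned index with the indexer's inverse mapping to recover the key.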
but I thought a more general solution (as proposed here) might be helpful.
I see your point; I just do not think there is a common application where this would be the canonical solution. Generality often comes at a cost, even if it is sometimes hidden :-)
But I definitely agree: if more people are interested, my main concern is directly invalidated and it might make sense to include this.
What do you think about Object2PrimitiveMap classes which are array-based and directly allow the user to specify a mapping function / an indexer to calculate key indices? The use case for such a map would be:
This might sound somewhat similar to the existing ...CustomHashMap classes. The difference is that it can be implemented using a single primitive array for value storage and a bit vector for recording whether an entry is present, compared to the two arrays of the hash map (and the excess space due to the load factor).
Such a map would be well suited for enums, whose ordinal() value can be used as the index (see also #148). It would also easily allow subsets of enum constants or other custom singleton data types. In my concrete use case I implemented something similar to this to create maps of Character.UnicodeScript, which has 157 constants, but where the application only uses 18 of them.
The 'indexer' used to map keys to indices can be shared by multiple maps; this avoids unnecessary copies of the enum values() array for each map instance.
However, I am mainly interested in your feedback on this idea. I am not really familiar with efficient primitive data structures, so I am not sure how well such a data structure would perform or how well the JVM would handle it. You probably have more experience in this area.
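A rough estimate for the Character.UnicodeScript example (157 constants, 18 used) makes the space argument concrete. The assumptions are mine, not the thread's: int values, 4-byte references (compressed oops), object and array headers ignored, and a bit vector stored as 64-bit words. The hash map is assumed to have one key array and one value array, whose capacity c is the smallest power of two that is at least m/f for m entries and load factor f.

```latex
\begin{aligned}
\text{indexed map, } N = 18:\quad & 4N + 8\lceil N/64 \rceil = 72 + 8 = 80\ \text{bytes} \\
\text{hash map, } m = 18,\ f = 0.75:\quad & c = 2^{\lceil \log_2(m/f) \rceil} = 2^{\lceil \log_2 24 \rceil} = 32,\quad c\,(4 + 4) = 256\ \text{bytes} \\
\text{indexed map over all ordinals, } N = 157:\quad & 4N + 8\lceil N/64 \rceil = 628 + 24 = 652\ \text{bytes}
\end{aligned}
```

Under these assumptions the indexed layout is roughly three times smaller than the hash map. However, the absolute difference per map is small, which fits the point above that it only pays off when there are many such maps.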
Below is a sample implementation (not fully tested, but hopefully you get the idea); note, however, that I am not planning to submit a pull request:
Usage example: