Open pburka opened 7 years ago
That'd be nice. Wanna try a pull request? First we should fastutilize NavigableMap/Set tho (sigh).
Agreed. Without the NavigableMap interface it's not of much use to me. I'll see if I can make some progress on that.
We've got an implementation of this over in LensKit too, with few nice features:
long[]
storage is managed by a general KeyIndex
data structure that allows code reuse between sets and maps.long[]
and int[]
maps, so that if all keys of a map fit in integers, they are used to save space.Long2DoubleSortedArrayMap
s are immutable.We plan to soon relicense LensKit under the MIT license (just waiting for the last paperwork to clear), at which point our source code will be reusable in fastutil if desired.
I started working on this today by prototyping a NavigableSet template for fastutil. I ran into one complication: a number of NavigableSet methods can return null to indicate absence (e.g. ceiling(e) returns null if there is no element greater than or equal to e).
I believe the full set of affected methods is:
In Maps, you deal with this by adding a defaultReturnValue(). We could do the same thing in NavigableSet, but (a) it means adding additional state to each implementor, and (b) I'm not convinced that it's the best solution.
Since functions like floor() and ceiling() deal with inequalities, a user may, perhaps, prefer to receive MAX_VALUE as the 'absent' signal for one, and MIN_VALUE for the other.
How would you feel about a fastutil NavigableSet which supplemented Long ceiling(Long e) with long ceiling(long e, long ifNone), etc?
/peter
On Wed, Jun 21, 2017 at 10:55 AM Sebastiano Vigna notifications@github.com wrote:
That'd be nice. Wanna try a pull request? First we should fastutilize NavigableMap/Set tho (sigh).
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vigna/fastutil/issues/73#issuecomment-310104899, or mute the thread https://github.com/notifications/unsubscribe-auth/ADxoy7_o03NnZdkaQwQfv8KYkmVTPh4jks5sGS7egaJpZM4OBCOq .
I fully understand your problem. A default return value is sensible for a non-sorted operation such as contains(), but it's utterly counterintuitive when order is involved. I think your proposal is reasonable, but let's season it for a couple of days. I'll think about it.
I've prototyped NavigableSet, NavigableMap, and some helpers in NavigableSets in a fork of the repository at https://github.com/pburka/fastutil/tree/master/ https://github.com/pburka/fastutil/tree/master/drv
I chose to append 'OrDefault' to each of the APIs which could return null, as this seemed consistent with the new Java 8 API, Map.getOrDefault().
Take a look and tell me what you think.
/peter
On Sun, Jun 25, 2017 at 2:36 PM Sebastiano Vigna notifications@github.com wrote:
I fully understand your problem. A default return value is sensible for a non-sorted operation such as contains(), but it's utterly counterintuitive when order is involved. I think your proposal is reasonable, but let's season it for a couple of days. I'll think about it.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vigna/fastutil/issues/73#issuecomment-310919802, or mute the thread https://github.com/notifications/unsubscribe-auth/ADxoy1xqxLBT-ezkAKryfMvE5fcBXYSDks5sHqi5gaJpZM4OBCOq .
On 29 Jun 2017, at 02:24, Peter Burka notifications@github.com wrote:
Take a look and tell me what you think.
I'm a little behind schedule with this, but I'm working on it :).
Ciao,
seba
What about NoSuchElementException
? This would nicely apply to all the relevant methods. For pollFirst
and pollLast
one can easily check whether the exception would be thrown, for the other methods (ceiling
, etc.) something like hasLower
and hasHigher
would be necessary. Those can be defaulted to .tailMap(e).isEmpty()
.
This approach somehow conflicts with the "default return value" idea, but I am unsure which of the two is more desirable for the typical use cases. To be honest I'd even like to have this "no such element"-behavior for normal maps since it makes debugging a lot easier in some cases. With the advent of getOrDefault
I'd even go as far as deprecating default return values completely, since the fastutil-get essentially only is a m.getOrDefault(k, m.defaultReturnValue())
, but that is a separate issue.
How would you feel about a fastutil NavigableSet which supplemented Long ceiling(Long e) with long ceiling(long e, long ifNone), etc?
I don't think that this would be used too frequently, but I am not entirely sure about that. If you want to check if the returned value is contained in the set you still have to call something like contains(long e)
(except there is some domain knowledge).
One more idea would be to borrow from Optional and have something like .applyCeiling(long e, LongConsumer consumer)
but I fear this might be more heavy-weight than just checking with contains(long e)
I'm not sold on throwing NoSuchElementException
. First, it's inconsistent with the object versions, which return null. Secondly, it's inefficient. Either you need to probe the map twice (once to check for presence, and again to retrieve the element), or you need to handle an exception, which is usually inefficient.
I think it's particularly awkward to throw exceptions for pollFirst()
and pollLast()
, since those methods exist specifically to avoid the exceptions of first()
and last()
.
I suspect that users do have domain knowledge in most cases, allowing them to choose values which are either known to be out-of-set, or which are appropriate for the none case. For example, it may make sense to call floorOrDefault(key, Long.MIN_VALUE)
.
I do agree that deprecating the universal default return values might be a good idea. I think that getOrDefault
is easier to reason about than a data structure which carries around its own default value.
First, it's inconsistent with the object versions, which return null.
Well, depending on how you look at it. ceiling(Long).longValue()
also throws an exception when there is no such value. Similarly, the default return value mechanism is inconsistent with the object-versions. It's more about coming up with the most "intuitive" inconsistency.
Secondly, it's inefficient. Either you need to probe the map twice (once to check for presence, and again to retrieve the element), or you need to handle an exception, which is usually inefficient.
I agree on this, exceptions definitely can't be used as control flow (there was some nice article on that somewhere providing some numbers). I assumed a different kind of "domain knowledge" - one may know that, e.g., the ceiling of some value always is present. At least in my case (dealing with normal maps) I always know which keys are present, so a fail-fast get
would be nicer.
I propose the following idea: We could just adapt both approaches, as in we have a ceilingChecked(K e)
(or a similar name) which throws a NoSuchElementException
if there is no ceiling of e
and a ceilingOrDefault(K e, K default)
which returns default
in that case. One could even add a default ceiling(K e)
which forwards to ceiling(e, MAX)
, where MAX
is the particular domain's maximal value (which luckily exists for all primitives ...). For float
s and double
s using NaN
might be an option, too, but that would get awkward I think.
I think it's particularly awkward to throw exceptions for
pollFirst()
andpollLast()
, since those methods exist specifically to avoid the exceptions offirst()
andlast()
.
Agree - I'm not familiar at all with NavigableSet
s so I don't have a "feeling" for the use cases. In this case, it might be sensible to just not add a primitive version of this method, since it always will be awkward: Either you know that there is a first element, then you can call the primitive first()
or you don't know and calling the pollFirst()
simply is the fastest way to get both the information whether the first exists and, if so, which value it is (in a sense, one can think of the returned Long
as OptionalLong
)
Design and implementation could probably draw a lot from Guava's ImmutableSortedMap
(see RegularImmutableSortedSet
), which is implemented using a simple sorted array.
But the problem here is mutability. The original post says, indeed, that they are loading the map in order. All strategies I can think for a sorted map based on arrays will have either linear search or linear update time (assuming you won't have space for additional data on top of the array).
I stumbled upon this issue because I have a case where I want to sort a "regular" fastutil's ArrayMap. It would be nice if fastutil's ArrayMaps had methods like sortByKeys() and sortByValues(). Then, there can be an adapter class or method "viewSortedArrayMapAsNavigableMap()" that returns an unmodifiable NavigableMap
. If the underlying ArrayMap is updated (e. g. a new entry is added), the view is no longer correct, but that's a responsibility of the user.
I have an application where we need many sorted maps of
long
s toObject
s, and we wish to use a very space efficient implementation. We use a custom map backed by a sortedlong[]
and a correspondingObject[]
. This is very similar to fastutil'sLong2ObjectArrayMap
, except that it provides aNavigableMap
interface. Entries are found using a simple binary search. In our expected use case, keys are added to the map in order, so we're not worried about the performance of adding or removing arbitrary keys.Would it make sense to add this functionality to fastutil? e.g.
Long2ObjectSortedArrayMap