Closed afanasy closed 3 years ago
@jgonggrijp Thank you for the detailed and prompt response. I agree that
var faster = isSorted && !isFunction(iteratee);
looks much better. However, I'm failing to understand the main point here - why is it a breaking change and how the _.property
iteratees are able to produce not "one-to-one" values, unsuitable for the fast algorithm? Could you provide an example? And just as a side note, introducing Set
will "dramatically reduce the performance difference between the fast and the slow algorithms" only for smaller arrays. For larger arrays the fast algorithm O(n)
will still be much faster than the slow one O(n^2)
.
Thank you for sharing your doubts.
(...) I'm failing to understand the main point here - why is it a breaking change and how the
_.property
iteratees are able to produce not "one-to-one" values, unsuitable for the fast algorithm? Could you provide an example?
I think this part of the unittests illustrates the one-to-one issue quite well:
The discrepancy arises because the criterion the array was originally sorted by (magnitude) is not the same criterion as the array is being uniqued by (magnitude of the square). You can get into a similar situation with _.property
iteratees as follows. Let's say the array was sorted by the rank
property and is now being uniqued by the flavor
property:
var theArray = [{rank: 1, flavor: 'vanilla'}, {rank: 2, flavor: 'strawberry'}, {rank: 3, flavor: 'vanilla'}];
var result = _.uniq(theArray, true, 'flavor');
My "acceptable change" number 3 above rests on the semantic change of expecting users to only pass isSorted === true
when the sort criterion and the unique criterion are the same. Your confusion may stem from the assumption that this is already the case, but it isn't.
And just as a side note, introducing
Set
will "dramatically reduce the performance difference between the fast and the slow algorithms" only for smaller arrays. For larger arrays the fast algorithmO(n)
will still be much faster than the slow oneO(n^2)
.
The standard dictates that Set
must provide a faster-than-linear lookup mechanism. Generally, that's either O(k) with small k for hash tables or O(log n) for binary search trees. The slow algorithm then becomes O(nk) or O(n log n) overall, which is not quite as fast as the fast algorithm yet, but way faster than the current slow algorithm, especially for large arrays.
@jgonggrijp This all makes sense, yes, I assumed users know what they are doing, when passing the isSorted === true
, but that may not always be the case. And you are right about Set
, it should provide the boost from using internal uniqueness check. Thank you for taking the time to clarify my misunderstanding.
The faster algorithm on sorted array will not work for
_.uniq
with an iteratee, if the iteratee is not a one-to-one function. Because of that it was disabled for all iteratees, even for those producing one-to-one functions, such asString
andArray
iteratees (handled byproperty()
function). This PR introduces the code that checks the iteratee type and enables faster algorithm if possible.