jashkenas / underscore

JavaScript's utility _ belt
https://underscorejs.org
MIT License
27.3k stars 5.53k forks source link

array uniq unexpected results with sparse arrays #484

Closed ajax333221 closed 12 years ago

ajax333221 commented 12 years ago
        var arr=[];
        arr[2]=null;
        arr[3]=null;
        arr[8]=2;
        arr[10]=2;
        arr[11]=5;
        arr[14]=5;
        arr[16]=8;
        arr[19]=8;
        arr[26]=undefined;
        arr[29]=undefined;
        arr[33]="hi";

        alert(_.uniq(arr,true)); //outputs to "2,5,8,,hi"

Problem?: The first 'null' value from the beginning was removed

Notes: This also happens if you start with 'undefined' values (and probably more)

davidarkemp commented 12 years ago

Although you're passing the sorted flag for an array that's not sorted, there's clearly a problem, as even _.uniq(arr.sort(), true) produces the wrong result.

so does the following:

>       var arr=[];
        arr[2]=null;
        arr[3]=null;
        arr[8]=2;
        arr[10]=2;
        arr[12]="2";
        arr[11]=5;
        arr[14]=5;
        arr[16]=8;
        arr[19]=8;
        arr[26]=undefined;
        arr[29]=undefined;
        arr[33]="hi";

> _.uniq(arr.sort(), true) 
["2", 5, 8, "hi", null]

this seems to be because it's using the wrong comparison operator on line 377

ghost commented 12 years ago

@davidarkemp: According to section 15.4.4.11 of the ES 5.1 spec, the sorting behavior for sparse arrays is implementation-dependent, so it's quite possible (and permissible) that arr.sort() will produce inconsistent results. Furthermore, IE <= 8 treats undefined elements in arrays as elisions; as such, 26 in arr == false && 29 in arr == false. It's as if those values were never initialized: try executing _.uniq(arr) in IE <= 8 without passing the isSorted flag, and you'll see that the result is [null, 2, 5, 8, "hi"]; it should be [null, 2, 5, 8, undefined, "hi"].

@ajax333221: null is not included in the results because the sorted _.uniq algorithm does not expect to be invoked on sparse arrays. Let's walk through the first five steps of what will occur after _.uniq(arr, true) is called:

  1. Create an empty memo array (via _.reduce) to store the previously-enumerated values, and an empty result array to hold the unique values.
  2. The condition 0 == i is false, as arr is sparse and does not contain a value at i == 0. Continue the reduction.
  3. The first value of arr is initialized at i == 2 (hence, 0 == i is false).
  4. isSorted === true, so the comparison _.last(memo) != el is performed. You'd expect the result of this comparison to be true; it's actually false. memo is empty, so _.last(memo) == undefined, but arr[2] == null. undefined == null, so undefined != null is false.
  5. Since Result(4) is false, neither memo nor result are updated.

This is why result will be [2, 5, 8, undefined, "hi"] in all browsers except IE <= 8. The latter is subject to the sparse array bug that I mentioned above, so the (egregiously incorrect) result will be [2, 5, 8, "hi"].

The solution, unfortunately, is to avoid sparse arrays and arrays containing undefined values entirely...even if you don't pass the isSorted flag (_.uniq(arr)), you'll still receive inconsistent results in IE.

ajax333221 commented 12 years ago

I have created my own delDups, and it is notably faster than the underscore uniq()

http://jsperf.com/undersc-uniq

And best of all, it works like I expect...

I think they should consider rebuilding their function.

I am a newbie at JS, I have less than a year using it. So I am sure my function can be improved even more:

    function delDups(arr,bol){
        var i=0,ien=arr.length,prv,rtn=[];
        if(ien>0){
            if(ien<3||bol){
                for(;i<ien;i++){
                    if(i in arr){
                        if(arr[i]!==prv||rtn.length==0){
                            rtn.push(arr[i]);
                            prv=arr[i];
                        }
                    }
                }
            }else{
                for(;i<ien;i++){
                    if(i in arr&&!inArray(arr[i],rtn,true)){
                        rtn.push(arr[i]);
                    }
                }
            }
        }
        return rtn;
    }

The older function I had was this:

    function delDups(arr,bol){
        var i=0,ien=arr.length,rtn=[];
        if(ien>0){
            if(ien<3||bol){
                for(ien--;i<ien;i++){
                    if(arr[i]!==arr[i+1]){
                        rtn.push(arr[i]);
                    }
                }
                rtn.push(arr[i]);
            }else{
                rtn.push(arr[i]);
                for(i=1;i<ien;i++){
                    if(!inArray(arr[i],rtn,true)){
                        rtn.push(arr[i]);
                    }
                }
            }
        }
        return rtn;
    }

The second is like 1/3 faster, but it will fail on sparse arrays.

So you guys are free to use my functions, I am donating them to underscore because I love underscore

PS: if(!inArray(arr[i],rtn,true)) is exactly the same as rtn.indexOf(arr[i])==-1

ghost commented 12 years ago

Okay, I've submitted a pull request to correct the bug. I suspect that one of the reasons that _.uniq() is so slow is because it supports an optional iterator function, and effectively must keep track of two lists: one with the values modified by the iterator, and the other with the results from the original.

Incidentally, you can make your delDups function even more concise, if you want:

function delDups(arr, bol) {
  var i = 0, len = arr.length >>> 0, prv, rtn = [];
  if (len) {
    for (; i < len; i++) {
      if (i in arr && ((len < 3 || bol ? arr[i] !== prv || !rtn.length : !inArray(arr[i], rtn, true))) {
        rtn.push(prv = arr[i]);
      }
    }
  }
  return rtn;
}
ajax333221 commented 12 years ago

@kitcambridge it looks very neat.

ghost commented 12 years ago

@ajax333221 Whoops! Sorry about that; nice catch!

jdalton commented 12 years ago

@kitcambridge The appropriate response is:

Did I do that?
ghost commented 12 years ago

@jdalton Hangs head in shame.

jashkenas commented 12 years ago

Whoops -- wrong ticket number.

jashkenas commented 12 years ago

Merged the fix, though, earlier.