sestoft / C5

C5 generic collection library for C#/.NET
http://www.itu.dk/research/c5/
MIT License
1.03k stars 181 forks source link

Bug in ArrayBase<T> Makes ArrayList<T>.AddAll() Enter an Infinite Loop for Large Enumerables #45

Closed lundmikkel closed 7 years ago

lundmikkel commented 8 years ago

A subtle bug in ArrayBase<T> will cause an infinite loop when adding very large enumerables to array-based collections. The bug can be triggered by the following code:

var arrayList = new ArrayList<bool>();
var items = new bool[(1 << 30) + 1]; // 2^30 + 1 = 1,073,741,825
arrayList.AddAll(items);

The problem arises when the large array is added to the list. In order for ArrayList<T>, which inherits from ArrayBase<T>, to add the very large enumerable, it must first allocate an array big enough to contain all the enumerable's items. The expand() method in ArrayBase<T> is used to enlarge the underlying array:

protected virtual void expand(int newcapacity, int newsize)
{
    System.Diagnostics.Debug.Assert(newcapacity >= newsize);
    int newlength = array.Length;

     while (newlength < newcapacity) newlength *= 2;

     T[] newarray = new T[newlength];

     Array.Copy(array, newarray, newsize);
     array = newarray;
}

ArrayBase<T> stores its items in array, whose length is always a positive power of two. The method is called with the required capacity and the collection's current count – in the example above, newcapacity is 2^30 + 1 and newsize is 0. The method will take the current size of the array and double it until it is not less than the required capacity.

The problem arises when we double the integer newlength for long enough. When newlength is 2^30 and still less than newcapacity, it is doubled, giving an expected value of 2^31. The integer range in C#, however, is -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1), which causes newlength to overflow. Doubling the integer again makes it zero. The loop condition is now always true, since newlength remains zero even when doubled.

The easiest solution is probably to wrap the multiplication in checked:

protected virtual void expand(int newcapacity, int newsize)
{
    System.Diagnostics.Debug.Assert(newcapacity >= newsize);
    int newlength = array.Length;

     while (newlength < newcapacity)
         newlength = checked(newlength * 2);

     T[] newarray = new T[newlength];

     Array.Copy(array, newarray, newsize);
     array = newarray;
}

When the multiplication causes an overflow, an OverflowException is thrown. This does not fix the problem, but it does ensure that the problem does not go unnoticed.

ondfisk commented 8 years ago

Looks right - could you make a pull request fixing this? Thanks

abel406 commented 7 years ago

This issue is interesting. I have given it a thought and this looks in need of some rewriting. Maybe in the method that calls expand if we check that the new capacity is lower than int maximum value but greater than 1,073,741,824 (2^30) then the capacity should be increased by 100 000 and no multiplied by 2. Then if this new capacity exceeds 2,147,483,647 (2^31 - 1) then we should use an expand method that uses long instead of int. Dividing the actual method expand in two parts, check size and expand, where check size guarantees which expand to call the one with int and the one with long. This is just a thought.