qwertie / ecsharp

Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.
http://ecsharp.net
Other
172 stars 25 forks source link

Fast bulk operations with Span<T> #88

Open qwertie opened 5 years ago

qwertie commented 5 years ago

A long time ago it bothered me that IEnumerator<T> required two interface calls per iteration, a design choice that would cause a small but pervasive performance reduction app-wide. (Long ago I introduced Iterator<T> and IIterable<T>, which were like IEnumerator and IEnumerable but required just one call per iteration - but as most infrastructure was built on IEnumerator, they didn't provide enough value so I were removed them in mid-2013.)

When I was designing LLLPG, I knew that lexer performance was important, and to achieve it I thought it would be important to be able to access most characters from the input stream without any virtual calls. On the other hand, I didn't want to require an entire source file to be stored as a single contiguous array - it should be possible to "stream" a large file from disk, or to tokenize an IDE's gap buffer. To resolve the tension between performance and flexibility I introduced ICharSource, whose Slice method returned a UString structure (it used to be StringSlice but never mind that) which represents part of a string (a tuple of string, start, length - like the new Memory<T>):

public interface ICharSource : IListSource<char>
{
    new UString Slice(int startIndex, int length);
}

The lexer then uses Slice to read small blocks (currently 512 characters), which are cached in the lexer. Since UString is a struct, accessing individual characters involves no virtual calls (though it did still require a range check). Note that UString itself implements ICharSource, and a string converts implicitly to UString so lexers can easily accept a string input. If you're reading data from a file you can use StreamCharSource instead which also implements ICharSource.

Let's consider a more general problem: bulk operations in which data is moving from one collection to another. You can easily implement methods like this on any collection type using a foreach loop:

void AddRange(IReadOnlyCollection<T> e);
void InsertRange(int index, IReadOnlyCollection<T> s);

You might be able to accelerate a list-insert operation by shifting elements after index only once (instead of once per item inserted). What you can't do is accelerate access to the input collection.

Okay, so what's the solution? Obviously, it's some sort of bulk read access based on arrays. Two approaches come to mind: either the sink collection / client code asks to read a certain-size block of data (probably a fixed size), or the source collection chooses a block size itself on a per-block basis. The second choice may be faster in some cases, because the source collection may be divided into nodes, each having an array it can return without copying. The first choice may have a performance advantage too, as the sink collection may be divided into nodes and only the sink knows how much space is available in each node. A compromise: the sink specifies a maximum size and the source provides something that large or less.

Hence:

    // I don't think the compiler will accept "out T" here but there must be some way to 
    // upcast, e.g. using an extension method that returns a decorator...
    public interface IScanner<out T>
    {
        // Returns an empty span (only) when the end of collection is reached
        ReadOnlySpan<T> Read(int maxSize = int.MaxValue);
    }
    public interface IMScanner<T> : IScanner<T> // mutable version
    {
        // When the caller calls Read(), changes to the previous span are saved?
        Span<T> Read(int maxSize = int.MaxValue);
    }
    public interface IScannable<T>
    {
        IScanner<T> Scan();
    }
    public interface ICollectionEx<T> : ICollectionImpl<T>, IIsEmpty
    {
        void AddRange(IScannable<T> e);
    }
    ....

Here you can see I'm thinking of changing existing interfaces like ICollectionEx so that instead of supporting boring-old AddRange(IEnumerable<T>) it only supports AddRange(IScannable<T>). AddRange(IEnumerable<T>) cannot accelerate access very much, so there's little point in offering it directly on the interface (it may as well be an extension method).