joelverhagen / NCsvPerf

A test bench for various .NET CSV parsing libraries
https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers
MIT License

ReadDelimited which I think is fast #1

Closed: AnthonyLloyd closed this issue 3 years ago

AnthonyLloyd commented 3 years ago
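
using System.Collections.Generic;
using System.IO;
using System.Text;

// Reads delimited rows from the reader. Leading and trailing spaces are
// trimmed from every field, and blank lines are skipped.
public static IEnumerable<string[]> ReadDelimited(TextReader reader, char delimiter)
{
    const int QUOTE = 34, SPACE = 32, LF = 10, CR = 13, EOF = -1;
    var data = new List<string>();   // fields of the current row
    var chars = new StringBuilder(); // characters of the current field
    int b;
    // Keep looping at EOF while a row is still pending. The chars.Length check
    // covers a final single-field row that has no trailing newline.
    while ((b = reader.Read()) != EOF || data.Count != 0 || chars.Length != 0)
    {
        if (b == EOF || b == LF)
        {
            // End of row.
            if (data.Count != 0 || chars.Length != 0)
            {
                // Trim trailing spaces (b is reused as an index here).
                for (b = chars.Length - 1; b >= 0; b--)
                    if (chars[b] != SPACE) break;
                data.Add(chars.ToString(0, b + 1));
                chars.Clear();
                yield return data.ToArray();
                data.Clear();
            }
        }
        else if (b == delimiter)
        {
            // End of field: trim trailing spaces and store it.
            for (b = chars.Length - 1; b >= 0; b--)
                if (chars[b] != SPACE) break;
            data.Add(chars.ToString(0, b + 1));
            chars.Clear();
        }
        else if (b == QUOTE)
        {
            // Quoted section: read until the closing quote; "" is an escaped quote.
            while ((b = reader.Read()) != EOF && (b != QUOTE || reader.Peek() == QUOTE))
            {
                if (b == QUOTE) reader.Read(); // consume the second quote of the pair
                if (b != SPACE || chars.Length != 0) chars.Append((char)b);
            }
        }
        // Unquoted character: drop CR and leading spaces.
        else if (b != CR && (b != SPACE || chars.Length != 0)) chars.Append((char)b);
    }
}
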
joelverhagen commented 3 years ago

This looks awesome, I'll add it to the battery of tests. Thanks!

Is this your code? Is it available on nuget.org? I can drop it in by copying code and crediting you in this issue, as appropriate.

AnthonyLloyd commented 3 years ago

It's just something I've always used. It's not on NuGet yet. I may be interested in over-optimising it a bit, maybe on the buffering.

joelverhagen commented 3 years ago

Shouldn't the buffer in TextReader implementations (e.g. StreamReader) be enough? I've noticed other CSV implementations on top of TextReader have their own buffers, which seems a bit redundant. But then again, I just tested the perf and haven't thought deeply about it 😅.
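
To make the buffering question concrete, here is a rough sketch of the two styles: one calls TextReader.Read() once per character, the other pulls characters in blocks into its own char[] via TextReader.Read(char[], int, int). The class name, method names and the 4096 default are made up for illustration and are not taken from any of the libraries in the test bench. It only counts line feeds, so it says nothing by itself about which style is faster for real parsing.

```csharp
using System;
using System.IO;

public static class BufferSketch
{
    // Hypothetical helper: one Read() call per character, relying only on
    // whatever buffering the TextReader implementation does internally.
    public static int CountLinesPerChar(TextReader reader)
    {
        int count = 0, c;
        while ((c = reader.Read()) != -1)
            if (c == '\n') count++;
        return count;
    }

    // Hypothetical helper: the same count, but characters are copied in
    // blocks into a local buffer and scanned with a plain array loop.
    public static int CountLinesBlock(TextReader reader, int bufferSize = 4096)
    {
        var buffer = new char[bufferSize];
        int count = 0, read;
        // Read(char[], int, int) returns 0 once the reader is exhausted.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            for (int i = 0; i < read; i++)
                if (buffer[i] == '\n') count++;
        return count;
    }
}
```

Both methods should return the same count for the same input; any difference between them would be in how many calls go through the reader per character, which is the part worth measuring.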

AnthonyLloyd commented 3 years ago

Benchmarking it, it looks like it's not so good: twice as slow as string.Split. Not sure why. Makes you think you should keep your benchmarks around.
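
For reference, a string.Split baseline might look something like the sketch below. This is an assumption about the shape of such a baseline, not the exact code that was benchmarked; SplitBaseline, ReadWithSplit and CountFieldsTimed are hypothetical names. Note that string.Split does no quote handling or trimming, so it does less work per field than ReadDelimited and the comparison is not quite like for like. The Stopwatch helper gives only a rough number.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class SplitBaseline
{
    // Hypothetical baseline: one row per line, fields split on the delimiter.
    // No support for quoted fields; blank lines are skipped.
    public static IEnumerable<string[]> ReadWithSplit(TextReader reader, char delimiter)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;
            yield return line.Split(delimiter);
        }
    }

    // Rough timing helper: parses the same text repeatedly and reports the
    // total field count plus elapsed milliseconds.
    public static long CountFieldsTimed(string csv, int iterations, out long elapsedMs)
    {
        long fields = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            foreach (var row in ReadWithSplit(new StringReader(csv), ','))
                fields += row.Length;
        sw.Stop();
        elapsedMs = sw.ElapsedMilliseconds;
        return fields;
    }
}
```

Running the same helper shape against ReadDelimited on identical input would give a first impression of the gap, though a proper benchmark harness is the better tool for anything you want to keep around.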