joelverhagen / NCsvPerf

A test bench for various .NET CSV parsing libraries
https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers
MIT License

Data Mapping benchmark #14

Open jzabroski opened 3 years ago

jzabroski commented 3 years ago

See the discussion in #5: ML.NET doesn't have a low-level API, so some CSV readers need a higher-level benchmark in order to participate.

JoshClose commented 3 years ago

I started implementing FileHelpers and found that it only outputs class objects. There is no way to get a string field out of each row.

@joelverhagen If you want to figure out how you'd like to structure reading records, I could go in and add CsvHelper and FileHelpers (at the very least). I just need a template in the source, like the one you have for parsing.
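One possible shape for such a template, as a minimal sketch: each library adapter returns fully mapped objects rather than raw string fields, so object-only libraries like FileHelpers or ML.NET can plug in. `IRecordMapper<T>`, `SampleRecord`, and `NaiveMappingReader` are hypothetical names for illustration, not types from NCsvPerf or any library, and the naive adapter assumes input without quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical target type for a data-mapping benchmark.
public class SampleRecord
{
    public string Id { get; set; } = "";
    public string Version { get; set; } = "";
}

// Hypothetical contract: an adapter reads the whole input and returns mapped objects.
public interface IRecordMapper<T>
{
    List<T> GetRecords(TextReader reader);
}

// Naive baseline adapter: splits on commas (no quoting support) and maps fields by position.
public class NaiveMappingReader : IRecordMapper<SampleRecord>
{
    public List<SampleRecord> GetRecords(TextReader reader)
    {
        var records = new List<SampleRecord>();
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(',');
            records.Add(new SampleRecord { Id = fields[0], Version = fields[1] });
        }
        return records;
    }
}
```

A CsvHelper or FileHelpers adapter would implement the same interface with its own mapping, so the benchmark measures parsing plus object materialization end to end.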

jzabroski commented 3 years ago

Another CsvHelper pattern I use a ton is logging the row number in the CSV when importing it:

```csharp
using System.Diagnostics.CodeAnalysis;
using CsvHelper.Configuration;
using CsvHelper.TypeConversion; // needed for StringTypeConverter

namespace Infra.Chat.Services.Importers.Models
{
    // Maps CSV columns by header name onto ChatLine properties.
    public class ChatLineMap : ClassMap<ChatLine>
    {
        [SuppressMessage("ReSharper", "VirtualMemberCallInConstructor")]
        public ChatLineMap()
        {
            Map(x => x.Id).Name("Id");
            Map(x => x.GroupId).Name("GroupId");
            Map(x => x.UserName).Name("UserName").TypeConverter<StringTypeConverter>();
            Map(x => x.Message).Name("Message").TypeConverter<StringTypeConverter>();
            Map(x => x.LastModified).Name("LastModified");
            // Record the CSV row each object came from; any high-level benchmark ideally supports this.
            Map(x => x.LineNumber).ConvertUsing(row => row.Context.Row);
        }
    }
}
```

I realize this is specific, but it's a feature that makes a ton of sense: it lets end users trace back where the parser went wrong.
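To show why this helps, here is a minimal sketch without CsvHelper: a hand-rolled mapper that keeps the physical line number on each record and includes it in the error when a row fails to convert. `ChatRow` and `LineTrackingReader` are hypothetical names, and the reader assumes a header line plus simple comma-separated rows with no quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record mirroring a few ChatLine fields from the map above.
public class ChatRow
{
    public int Id { get; set; }
    public string UserName { get; set; } = "";
    public int LineNumber { get; set; }
}

public static class LineTrackingReader
{
    // Skips the header (line 1), then maps each data line to a ChatRow.
    public static List<ChatRow> Read(TextReader reader)
    {
        var rows = new List<ChatRow>();
        int lineNumber = 1;
        reader.ReadLine(); // header
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            string[] fields = line.Split(',');
            if (fields.Length < 2 || !int.TryParse(fields[0], out int id))
            {
                // Report the physical line so users can find the bad input.
                throw new FormatException($"Could not map line {lineNumber}: '{line}'");
            }
            rows.Add(new ChatRow { Id = id, UserName = fields[1], LineNumber = lineNumber });
        }
        return rows;
    }
}
```

For input `Id,UserName\n1,alice\nx,bob`, the failure message points at line 3, which is exactly the information an end user needs to fix the file.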