kevin-montrose / Cesil

Modern CSV (De)Serializer
MIT License
65 stars 3 forks source link

Consider defaulting RowEnding to Detect #42

Closed MarkPflug closed 3 years ago

MarkPflug commented 3 years ago

I was experimenting with Cesil after reading the performance blog post you wrote recently. It took me a while (longer than I'd like to admit) to realize that the reason that EnumerateAll was not producing any results was because the file I was trying to read used '\n' for newlines, but Options defaults to CarriageReturnLineFeed. For the sanity of future users, you might consider changing the default to Detect instead.

The dataset I was using was the JHU covid-19 dataset available here I was trying to bind it to the following class:

    public class CovidRecord
    {
        public int UID { get; set; }
        public string iso2 { get; set; }
        public string iso3 { get; set; }
        public int? code3 { get; set; }
        public float? FIPS { get; set; }
        public string Admin2 { get; set; }
        public string Province_State { get; set; }
        public string Country_Region { get; set; }
        public float? Lat { get; set; }
        public float? Long_ { get; set; }
        public string Combined_Key { get; set; }
    }

I have also authored a CSV library for .NET: Sylvan.Data.Csv. My focus was primarily performance, and I've put together some benchmarks for a handful of CSV libraries I could find. Sylvan exposes csv data a DbDataReader only, and doesn't provide any object binding capabilities, so comparing it to Cesil wouldn't exactly be apples to apples; as at least as far as I could tell Cesil only allows binding to objects.

kevin-montrose commented 3 years ago

Yeah, I've gone back and forth on defaults for RowEnding - technically it costs a bit more to do detect, but practically it seems like a common enough stumbling block to change.

There's one bit of subtlty, right now RowEnding is shared between read & write - but writing "Detect" makes no sense, so first I'd need to split RowEnding into two different options.

I'll try and get that in before the next update - though in the immediate future I'm mostly going to be working on source generator support.

MarkPflug commented 3 years ago

Writing "detect" could make sense if you interpret it to mean "platform default", so \r\n on windows and \n otherwise? That would probably match most users expectations too.