joelverhagen / NCsvPerf

A test bench for various .NET CSV parsing libraries
https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers
MIT License

Add ML.NET MLContext.Data.LoadFromTextFile<T> #5

Closed jzabroski closed 3 years ago

jzabroski commented 3 years ago

https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/load-data-ml-net#load-data-from-a-single-file

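using Microsoft.ML;
using Microsoft.ML.Data;

// Create MLContext
MLContext mlContext = new MLContext();

// Load data from a comma-separated file that has a header row
IDataView data = mlContext.Data.LoadFromTextFile<HousingData>("my-data-file.csv", separatorChar: ',', hasHeader: true);

public class HousingData
{
    // Column 0 maps to a single float
    [LoadColumn(0)]
    public float Size { get; set; }

    // Columns 1 through 3 map to a fixed-size vector of 3 floats
    [LoadColumn(1, 3)]
    [VectorType(3)]
    public float[] HistoricalPrices { get; set; }

    // Column 4 maps to the "Label" column
    [LoadColumn(4)]
    [ColumnName("Label")]
    public float CurrentPrice { get; set; }
}

jzabroski commented 3 years ago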

It would be cool to see how Microsoft's machine learning code performs here.

joelverhagen commented 3 years ago

I played around with this one a bit, and I agree it would be awesome to compare the performance. However, my current benchmark relies on the parsing library to give each row's data in unmapped form (e.g. string[]). It looks like this library does not have that granular of an API, or at least not one available in the public surface area.

This would be a cool library to test if and when I add a higher level "data mapping" benchmark. I'll leave this issue open for that time.
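To make the distinction concrete, here is a rough sketch of the two row shapes being discussed: an "unmapped" reader that hands back each row as a string[], and a "mapped" reader that converts each row into a typed object. This is not code from NCsvPerf or ML.NET. The names RowShapes, HousingRow, ReadUnmapped and ReadMapped are hypothetical, and the comma split is deliberately naive (no quoting or escaping), so treat it only as an illustration of the API shapes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical typed row, loosely mirroring the HousingData example above.
public sealed class HousingRow
{
    public float Size { get; set; }
    public float[] HistoricalPrices { get; set; } = Array.Empty<float>();
    public float CurrentPrice { get; set; }
}

public static class RowShapes
{
    // "Unmapped" form: each row comes back as its raw string fields.
    // Assumption: fields never contain commas, quotes, or line breaks.
    public static IEnumerable<string[]> ReadUnmapped(TextReader reader, bool hasHeader)
    {
        if (hasHeader)
        {
            reader.ReadLine(); // skip the header row
        }

        while (reader.ReadLine() is string line)
        {
            if (line.Length == 0)
            {
                continue; // ignore blank lines
            }

            yield return line.Split(',');
        }
    }

    // "Mapped" form: each row is converted into a typed object,
    // so the caller never sees the individual string fields.
    public static IEnumerable<HousingRow> ReadMapped(TextReader reader, bool hasHeader)
    {
        foreach (string[] fields in ReadUnmapped(reader, hasHeader))
        {
            yield return new HousingRow
            {
                Size = float.Parse(fields[0], CultureInfo.InvariantCulture),
                HistoricalPrices = fields
                    .Skip(1)
                    .Take(3)
                    .Select(f => float.Parse(f, CultureInfo.InvariantCulture))
                    .ToArray(),
                CurrentPrice = float.Parse(fields[4], CultureInfo.InvariantCulture),
            };
        }
    }
}
```

A library that only exposes something like ReadMapped cannot be dropped into a benchmark that expects ReadUnmapped-style output, which is why a separate "data mapping" benchmark would be the natural place for it.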

jzabroski commented 3 years ago

Seems do-able to add that benchmark. I'll give it a shot if that's ok with you.

joelverhagen commented 3 years ago

Done with https://github.com/joelverhagen/NCsvPerf/pull/38.