tspence / csharp-csv-reader

A lightweight, high performance, zero dependency, streaming CSV reading library for CSharp.
http://tedspence.com
Apache License 2.0

Infinite loop while reading file #68

Open AlainBartmanDilaw opened 1 month ago

AlainBartmanDilaw commented 1 month ago

Custom.csv

The following code enters an infinite loop on this line:

var lines = outlines.ToList();

The Custom.csv file is attached.

Uncommenting the line that sets BufferSize makes the code run correctly.

So far, the loop has been traced to CSVFile line 58.

using CSVFile;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace CSVFileLoader
{
    internal class Program
    {
        static void Main(string[] args)
        {
            var filename = @"Custom.csv";
            FileInfo f = new FileInfo(filename);

            Console.WriteLine($"File size = {f.Length}");

            CSVSettings settings = new CSVSettings
            {
                LineSeparator = "\n",
                FieldDelimiter = '\t',
                //BufferSize = (int)f.Length
            };

            var csv = CSVReader.FromFile(filename, settings);

            var outlines = csv.Lines();
            var lines = outlines.ToList();
            Console.WriteLine($"Real number of lines = {lines.Count}");

            // Count the number of fields on each line
            var fieldCountDictionary = new Dictionary<int, int>();
            foreach (var line in lines)
            {
                int fieldCount = line.Length; // Number of fields on this line
                if (fieldCountDictionary.ContainsKey(fieldCount))
                {
                    fieldCountDictionary[fieldCount]++;
                }
                else
                {
                    fieldCountDictionary[fieldCount] = 1;
                }
            }

            // Show number of rows by number of fields
            foreach (var kvp in fieldCountDictionary)
            {
                Console.WriteLine($"There are {kvp.Value} rows with {kvp.Key} fields");
            }
        }
    }
}

AlainBartmanDilaw commented 4 weeks ago

For information, the issue happens while reading line 1474 of the Custom.csv file. That record spans 3 lines, but I don't think that is the cause, because setting BufferSize to the file length bypasses the issue.
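
While narrowing this down, it may help to cap the enumeration so the program fails fast instead of hanging. The sketch below is a generic guard, not part of CSVFile: EnumerationGuard and WithLimit are hypothetical names, and it only helps if the reader keeps yielding records; if the loop spins inside the reader without yielding, the guard never fires.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationGuard
{
    // Hypothetical helper (not part of CSVFile): passes items through,
    // but throws as soon as more than maxItems have been produced.
    public static IEnumerable<T> WithLimit<T>(IEnumerable<T> source, int maxItems)
    {
        int count = 0;
        foreach (var item in source)
        {
            count++;
            if (count > maxItems)
            {
                throw new InvalidOperationException(
                    $"More than {maxItems} items were produced; the source may be looping.");
            }
            yield return item;
        }
    }
}
```

In the program above this would be used as EnumerationGuard.WithLimit(csv.Lines(), 100000).ToList(), where 100000 is an arbitrary cap chosen to be larger than the expected number of records.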

tspence commented 2 days ago

Thanks for this report! Checking. I've seen some weirdness around buffer sizes straddling UTF-8 characters; this may be one of those problems.
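
To illustrate the general kind of problem mentioned here (this is not the library's actual code, and the cause of this issue is not confirmed), the sketch below decodes the same UTF-8 bytes in fixed-size chunks in two ways. Utf8BoundaryDemo and its methods are hypothetical names. Decoding each chunk independently corrupts a character whose bytes are split across two chunks, while a stateful System.Text.Decoder carries the incomplete bytes over to the next call.

```csharp
using System;
using System.Text;

public static class Utf8BoundaryDemo
{
    // Decodes every chunk on its own. A multi-byte character split across
    // two chunks is replaced with U+FFFD replacement characters.
    public static string DecodeStateless(byte[] data, int chunkSize)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            sb.Append(Encoding.UTF8.GetString(data, offset, count));
        }
        return sb.ToString();
    }

    // Uses one Decoder for all chunks. The decoder remembers trailing bytes
    // of an incomplete character and completes it on the next call.
    public static string DecodeStateful(byte[] data, int chunkSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            int written = decoder.GetChars(data, offset, count, chars, 0);
            sb.Append(chars, 0, written);
        }
        return sb.ToString();
    }
}
```

With a chunk size at least as large as the input, both methods return the same text, which is at least consistent with the report that setting BufferSize to the file length avoids the problem.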